Hello everyone, I have a particular situation where I am trying to run an iterative MapReduce job: the output files of one iteration are the input files of the next, and the iteration stops when no new files are created in the output.
*Code snippet:*

int round = 0;
JobConf jobconf = new JobConf(new Configuration(), MyClass.class);
FileStatus[] directory;   // declared outside the loop so the while condition can see it
do {
    String old_path = "path_" + Integer.toString(round);
    round = round + 1;
    String new_path = "path_" + Integer.toString(round);
    FileInputFormat.addInputPath(jobconf, new Path(old_path));
    FileOutputFormat.setOutputPath(jobconf, new Path(new_path)); // these will eventually become directories containing multiple files
    jobconf.setMapperClass(MyMap.class);
    jobconf.setReducerClass(MyReduce.class);
    // Other code
    JobClient.runJob(jobconf);
    directory = fs.listStatus(new Path(new_path)); // fs is a FileSystem handle set up in the omitted code; checks for new files in the output directory
} while (directory.length != 0); // stop iterating only when no new files appear in the output path

The code runs smoothly in the first round: the new directory path_1 is created and the Reducer output files are added to it. The original path_0 was created by me beforehand and contains the relevant input files. The output files seem to have the correct data as per my Map/Reduce logic. However, the second round fails with the following exception.

*In 0.19 (fully distributed mode, on a cloud cluster):*

java.lang.IllegalArgumentException: Wrong FS: hdfs://cloud_hostname:9000/hadoop/tmp/hadoop/mapred/system/job_201106271322_9494/job.jar, expected: file:///
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:322)

*In 0.20.203 (pseudo-distributed mode, on my own machine):*

11/10/12 00:35:42 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:54310/hadoop-0.20.203.0/HDFS/mapred/staging/arko/.staging/job_201110120017_0002
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://localhost:54310/hadoop-0.20.203.0/HDFS/mapred/staging/arko/.staging/job_201110120017_0001/job.jar, expected: file:///
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:354)

It seems that Hadoop is unable to delete the staging file for the job. Can you suggest any reason why this might happen? Please help!

Thanks a lot in advance!

Warm regards,
Arko
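P.S. For reference, this is roughly how I intend the "any new output files?" check to work, written out as a self-contained helper (the class/method names and the part-file filter below are just illustrative, not my exact code):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OutputCheck {

    // Returns true if the given directory exists and contains at least one
    // non-empty reducer output file ("part-*"). The FileSystem is resolved
    // from the Path itself, so an hdfs:// path is checked against HDFS and
    // not against the local file system.
    public static boolean hasNewOutput(Configuration conf, Path dir) throws IOException {
        FileSystem fs = dir.getFileSystem(conf);
        if (!fs.exists(dir)) {
            return false;
        }
        for (FileStatus status : fs.listStatus(dir)) {
            if (status.getPath().getName().startsWith("part-") && status.getLen() > 0) {
                return true;
            }
        }
        return false;
    }
}

In the loop above, this would take the place of the fs.listStatus(...) / directory.length check, e.g. something like hasNewOutput(jobconf, new Path(new_path)).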