Hello everyone, I have a particular situation where I am trying to run an iterative MapReduce job: the output files of one iteration are the input files of the next, and the iteration stops when no new files are created in the output.
*Code snippet:*

int round = 0;
JobConf jobconf = new JobConf(new Configuration(), MyClass.class);
FileStatus[] directory;   // declared outside the loop so the while condition can see it
do {
    String old_path = "path_" + Integer.toString(round);
    round = round + 1;
    String new_path = "path_" + Integer.toString(round);
    FileInputFormat.addInputPath(jobconf, new Path(old_path));
    FileOutputFormat.setOutputPath(jobconf, new Path(new_path)); // these will eventually become directories containing multiple files
    jobconf.setMapperClass(MyMap.class);
    jobconf.setReducerClass(MyReduce.class);
    // Other code
    JobClient.runJob(jobconf);
    directory = fs.listStatus(new Path(new_path)); // fs is a FileSystem handle set up in the omitted code; checks for new files in the output directory
} while (directory.length != 0); // stop iterating only when no new files appear in the output path

The code runs smoothly in the first round: the new directory path_1 is created and the Reducer output files are added to it. The original path_0 was created by me beforehand and contains the relevant input files. The output files seem to have the correct data as per my Map/Reduce logic. However, the second round fails with the following exception.

*In 0.19 (fully distributed mode, on a cloud cluster):*

java.lang.IllegalArgumentException: Wrong FS: hdfs://cloud_hostname:9000/hadoop/tmp/hadoop/mapred/system/job_201106271322_9494/job.jar, expected: file:///
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:322)

*In 0.20.203 (pseudo-distributed mode, on my own machine):*

11/10/12 00:35:42 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:54310/hadoop-0.20.203.0/HDFS/mapred/staging/arko/.staging/job_201110120017_0002
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://localhost:54310/hadoop-0.20.203.0/HDFS/mapred/staging/arko/.staging/job_201110120017_0001/job.jar, expected: file:///
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:354)

It seems that Hadoop is unable to delete the staging file for the job. Can you suggest any reason why this might happen? Please help!

Thanks a lot in advance!

Warm regards,
Arko
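P.S. For reference, this is roughly how I intend the "any new output files?" check to work, written out as a self-contained helper (the class/method names and the part-file filter below are just illustrative, not my exact code):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OutputCheck {

    // Returns true if the given directory exists and contains at least one
    // non-empty reducer output file ("part-*"). The FileSystem is resolved
    // from the Path itself, so an hdfs:// path is checked against HDFS and
    // not against the local file system.
    public static boolean hasNewOutput(Configuration conf, Path dir) throws IOException {
        FileSystem fs = dir.getFileSystem(conf);
        if (!fs.exists(dir)) {
            return false;
        }
        for (FileStatus status : fs.listStatus(dir)) {
            if (status.getPath().getName().startsWith("part-") && status.getLen() > 0) {
                return true;
            }
        }
        return false;
    }
}

In the loop above, this would take the place of the fs.listStatus(...) / directory.length check, e.g. something like hasNewOutput(jobconf, new Path(new_path)).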