Hi,

After several days of investigation, I finally got the wordcount-nopipe example in the hadoop-0.19.1 package to run. However, I had to make some changes, and I am not sure they are the proper/correct way to do it. Could anyone please help verify?

1. I start the job with the -inputformat argument set to "org.apache.hadoop.mapred.pipes.WordCountInputFormat".

2. Since the example uses the C++ RecordReader/Writer, no Java-based RecordReader/Writer should be used. However, pipes failed with an error like "RecordReader defined while not needed". After checking org.apache.hadoop.mapred.pipes.Submitter.java, I found this code snippet:

    if (results.hasOption("-inputformat")) {
      setIsJavaRecordReader(job, true);
      job.setInputFormat(getClass(results, "-inputformat", job, InputFormat.class));
    }

So it seems that specifying -inputformat also enables the Java RecordReader, which caused the error above. I commented out the line "setIsJavaRecordReader(job, true);" and the example then ran. Is this the proper way to make the wordcount-nopipe example work? I also see a commented-out line in the code, "//cli.addArgument("javareader", false, "is the RecordReader in Java");". Should this line be uncommented so that the Java RecordReader can be disabled from the command line?

3. It seems that wordcount-nopipe only works when the input/output is given as a local URI such as file:///home/... Is this true?

Thanks,
Jianmin
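
P.S. To make question 2 more concrete, here is a minimal sketch of the two behaviours I am comparing. It is not the real Submitter code: the class PipesOptionsSketch and its two methods are hypothetical names made up for illustration, and a plain Map stands in for the parsed command line. The first method mirrors how I read the snippet above (-inputformat implies a Java RecordReader); the second assumes a separate "-javareader" option, as hinted at by the commented-out line, were the only thing that enables it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the option handling in Submitter; not Hadoop code.
public class PipesOptionsSketch {

    // Behaviour as I read it today: giving -inputformat turns the Java RecordReader on.
    public static boolean javaReaderCurrent(Map<String, String> opts) {
        return opts.containsKey("-inputformat");
    }

    // Assumed alternative: only an explicit -javareader option turns it on,
    // so -inputformat can be combined with a C++ RecordReader.
    public static boolean javaReaderProposed(Map<String, String> opts) {
        return opts.containsKey("-javareader");
    }

    public static void main(String[] args) {
        Map<String, String> opts = new HashMap<String, String>();
        opts.put("-inputformat", "org.apache.hadoop.mapred.pipes.WordCountInputFormat");

        // With only -inputformat given, the two readings disagree.
        System.out.println("current:  " + javaReaderCurrent(opts));
        System.out.println("proposed: " + javaReaderProposed(opts));
    }
}
```

With only -inputformat given, the first method reports that the Java RecordReader is on and the second reports that it is off, which is the behaviour wordcount-nopipe seems to need.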