[ https://issues.apache.org/jira/browse/HADOOP-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557830#action_12557830 ]
Phu Hoang commented on HADOOP-2574:
-----------------------------------

On WordCount v1.0 above, there is another bug: line 3 should be

    import java.io.*;

instead of

    import java.io.Exception;

On WordCount v2.0, where local cache files are used, there are also bugs:

1. Lines 108 and 109 should be:

    conf.setInputPath(new Path(other_args.get(0)));
    conf.setOutputPath(new Path(other_args.get(1)));

2. If I run the program without the -skip argument, as in:

    ~/Hadoop/bin/hadoop jar ~phu/Hadoop/Examples/WordCount2/wordcount.jar org.myorg.WordCount -Dwordcount.case.sensitive=false WordCount2/input WordCount2/output

I get the following error message:

java.lang.NullPointerException
    at org.apache.hadoop.util.StringUtils.stringToPath(StringUtils.java:197)
    at org.apache.hadoop.filecache.DistributedCache.getLocalCacheFiles(DistributedCache.java:470)
    at org.myorg.WordCount$MapClass.configure(WordCount.java:33)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:36)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:188)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:831)
    at org.myorg.WordCount.run(WordCount.java:110)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.myorg.WordCount.main(WordCount.java:115)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:155)

Looking at DistributedCache.java, line 470, we see:

    return StringUtils.stringToPath(conf.getStrings("mapred.cache.localFiles"));

Here conf.getStrings returns null because no cache file was registered, and stringToPath then throws the NullPointerException. The example has to handle this case so that it does not fail when the -skip argument is omitted. When I do pass -skip patterns.txt, the program works.

Lastly, WordCount v1.0 works even if I do not use DFS and just access local input and output files. WordCount v2.0 does not work if I do not use DFS. The quickstart tutorial does not make clear which examples work in which modes (Standalone, Pseudo-Distributed, or Fully-Distributed). A reader could mistakenly assume that all examples work in all modes.

Phu

> bugs in mapred tutorial
> -----------------------
>
>                 Key: HADOOP-2574
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2574
>             Project: Hadoop
>          Issue Type: Bug
>          Components: documentation
>            Reporter: Doug Cutting
>             Fix For: 0.15.3, 0.16.0
>
>
> Sam Pullara sends me:
> {noformat}
> Phu was going through the WordCount example... lines 52 and 53 should have
> args[0] and args[1]:
> http://lucene.apache.org/hadoop/docs/current/mapred_tutorial.html
> The javac and jar command are also wrong, they don't include the directories
> for the packages, should be:
> $ javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d
> classes WordCount.java
> $ jar -cvf /usr/joe/wordcount.jar WordCount.class -C classes .
> {noformat}
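
One way to avoid the NullPointerException reported in the comment is to have the driver record whether -skip was given, and to have the task read the cached file list only when that flag is set. The sketch below illustrates the idea without depending on Hadoop: a plain Map stands in for the job configuration, and the flag name "wordcount.skip.patterns", the class name, and the method names are all assumptions made for illustration, not the tutorial's actual code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-alone sketch: a plain Map stands in for the job configuration, so
// nothing here is the Hadoop API. The flag key and method names are hypothetical.
class SkipGuardSketch {
    static final String CACHE_KEY = "mapred.cache.localFiles";
    static final String SKIP_FLAG = "wordcount.skip.patterns";

    // Driver side: pull "-skip <file>" out of the arguments and return the
    // remaining positional arguments (input path, output path).
    // In Hadoop the framework fills in the local cache key; the sketch
    // writes it directly to stay self-contained.
    static List<String> parseArgs(String[] args, Map<String, String> conf) {
        List<String> otherArgs = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if ("-skip".equals(args[i]) && i + 1 < args.length) {
                conf.put(CACHE_KEY, args[++i]);
                conf.put(SKIP_FLAG, "true");
            } else {
                otherArgs.add(args[i]);
            }
        }
        return otherArgs;
    }

    // Task side: consult the cache list only when the flag is set, and
    // treat a missing value as "no files" instead of dereferencing null.
    static String[] skipFiles(Map<String, String> conf) {
        if (!"true".equals(conf.get(SKIP_FLAG))) {
            return new String[0];
        }
        String raw = conf.get(CACHE_KEY);
        return raw == null ? new String[0] : raw.split(",");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        List<String> other = parseArgs(
            new String[] {"WordCount2/input", "WordCount2/output"}, conf);
        System.out.println("paths: " + other
            + ", skip files: " + skipFiles(conf).length);
    }
}
```

With this shape the positional arguments are always taken from the filtered list, which matches the other_args.get(0) and other_args.get(1) correction in item 1, and a run without -skip never touches the cache lookup.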