[ https://issues.apache.org/jira/browse/HADOOP-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557830#action_12557830 ]
Phu Hoang commented on HADOOP-2574:
-----------------------------------
In WordCount v1.0 above, there is another bug:
line 3 should be "import java.io.*;" instead of "import java.io.Exception;" (there is no java.io.Exception class).
In WordCount v2.0, which uses local cache files, there are also bugs:
1. Lines 108 and 109 should be:
conf.setInputPath(new Path(other_args.get(0)));
conf.setOutputPath(new Path(other_args.get(1)));
2. If I run the program without using the -skip argument, as in:

   ~/Hadoop/bin/hadoop jar ~phu/Hadoop/Examples/WordCount2/wordcount.jar org.myorg.WordCount -Dwordcount.case.sensitive=false WordCount2/input WordCount2/output

I get the following error message:
java.lang.NullPointerException
    at org.apache.hadoop.util.StringUtils.stringToPath(StringUtils.java:197)
    at org.apache.hadoop.filecache.DistributedCache.getLocalCacheFiles(DistributedCache.java:470)
    at org.myorg.WordCount$MapClass.configure(WordCount.java:33)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:36)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:188)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)

Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:831)
    at org.myorg.WordCount.run(WordCount.java:110)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.myorg.WordCount.main(WordCount.java:115)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
Looking at DistributedCache.java, line 470, we see:
return StringUtils.stringToPath(conf.getStrings("mapred.cache.localFiles"));
conf.getStrings() returns null here because no cache file was added to the job, so StringUtils.stringToPath() is handed null and throws. The example needs to handle this case so that it does not fail when the -skip argument is not used.
When I do pass -skip patterns.txt, the program works.
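One way to avoid the NullPointerException is to treat a missing property as "no cache files" instead of passing null on to stringToPath(). The sketch below models the job configuration as a plain Map so it runs without Hadoop; getStrings() here is a hypothetical stand-in that, matching the behavior described above, returns null when the property is unset, and the cache path in main() is made up. In the real configure() method the check would have to happen before calling DistributedCache.getLocalCacheFiles(), since the exception is thrown inside that call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the failure and a null-safe guard. A Map stands in
// for the Hadoop JobConf so the sketch runs without Hadoop on the classpath.
public class CacheFileGuard {

    // Stand-in for conf.getStrings(name): comma-separated values, or null
    // when the property is not set (e.g. when -skip was not used).
    static String[] getStrings(Map<String, String> conf, String name) {
        String value = conf.get(name);
        return (value == null) ? null : value.split(",");
    }

    // Null-safe lookup: an unset property yields an empty list instead of
    // being passed on to code that dereferences it.
    static List<String> localCacheFiles(Map<String, String> conf) {
        List<String> files = new ArrayList<>();
        String[] raw = getStrings(conf, "mapred.cache.localFiles");
        if (raw == null) {
            return files; // no cache files registered: nothing to load
        }
        for (String s : raw) {
            files.add(s.trim());
        }
        return files;
    }

    public static void main(String[] args) {
        Map<String, String> withoutSkip = new HashMap<>();
        Map<String, String> withSkip = new HashMap<>();
        // Hypothetical localized path for patterns.txt.
        withSkip.put("mapred.cache.localFiles", "/tmp/cache/patterns.txt");

        System.out.println(localCacheFiles(withoutSkip)); // prints []
        System.out.println(localCacheFiles(withSkip));    // prints [/tmp/cache/patterns.txt]
    }
}
```

An alternative with the same effect is to set a job flag only when -skip is given and read the cache files only when that flag is true; the null check above has the advantage of not depending on the caller remembering to set anything.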
Lastly, WordCount v1.0 works even if I do not use DFS and just access local input and output files, but WordCount v2.0 does not work without DFS. The quickstart tutorial does not make it clear which examples work in which modes (Standalone, Pseudo-Distributed, or Fully-Distributed), so one could mistakenly assume that all examples work in all modes.
Phu
> bugs in mapred tutorial
> -----------------------
>
> Key: HADOOP-2574
> URL: https://issues.apache.org/jira/browse/HADOOP-2574
> Project: Hadoop
> Issue Type: Bug
> Components: documentation
> Reporter: Doug Cutting
> Fix For: 0.15.3, 0.16.0
>
>
> Sam Pullara sends me:
> {noformat}
> Phu was going through the WordCount example... lines 52 and 53 should have
> args[0] and args[1]:
> http://lucene.apache.org/hadoop/docs/current/mapred_tutorial.html
> The javac and jar commands are also wrong: they don't include the directories
> for the packages. They should be:
> $ javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d classes WordCount.java
> $ jar -cvf /usr/joe/wordcount.jar -C classes .
> {noformat}
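Both argument fixes in this issue (args[0] and args[1] in v1.0, other_args.get(0) and other_args.get(1) in v2.0) come down to which list the input and output paths end up in. A minimal sketch, assuming v2.0's run() strips the -skip option and its file argument before using the remaining positional arguments; the class and method names are hypothetical, generic options such as -D are assumed to have been consumed by ToolRunner already, and the Hadoop calls are left out so the sketch runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: separates the "-skip <file>" option from the
// positional arguments, roughly the way WordCount v2.0's run() is assumed to.
public class ArgSplitter {

    // Result holder: the optional skip-patterns file plus the remaining arguments.
    static final class Parsed {
        final String skipFile;        // null when -skip was not given
        final List<String> otherArgs; // positional arguments, in order

        Parsed(String skipFile, List<String> otherArgs) {
            this.skipFile = skipFile;
            this.otherArgs = otherArgs;
        }
    }

    static Parsed split(String[] args) {
        String skipFile = null;
        List<String> otherArgs = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if ("-skip".equals(args[i])) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("-skip requires a file argument");
                }
                skipFile = args[++i]; // consume the option's value as well
            } else {
                otherArgs.add(args[i]);
            }
        }
        return new Parsed(skipFile, otherArgs);
    }

    public static void main(String[] argv) {
        String[] args = {"-skip", "patterns.txt", "WordCount2/input", "WordCount2/output"};
        Parsed p = split(args);
        // Here args[0] is "-skip", so the paths must come from otherArgs.
        System.out.println("input=" + p.otherArgs.get(0));  // prints input=WordCount2/input
        System.out.println("output=" + p.otherArgs.get(1)); // prints output=WordCount2/output
        System.out.println("skip=" + p.skipFile);           // prints skip=patterns.txt
    }
}
```

Collecting the positional arguments into a separate list keeps their indices stable no matter where -skip appears on the command line.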