[jira] Commented: (HADOOP-2574) bugs in mapred tutorial

Phu Hoang (JIRA) Thu, 10 Jan 2008 14:48:57 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557830#action_12557830
 ]


Phu Hoang commented on HADOOP-2574:
-----------------------------------

On WordCount v1.0 above, there is another bug:
line 3 should be import java.io.*; instead of import java.io.Exception;

On WordCount v2.0, where local cache files are used, there are also bugs:

1. Lines 108 and 109 should be:
    conf.setInputPath(new Path(other_args.get(0)));
    conf.setOutputPath(new Path(other_args.get(1)));

2. If I run the program without using the -skip argument, as in:

~/Hadoop/bin/hadoop jar ~phu/Hadoop/Examples/WordCount2/wordcount.jar org.myorg.WordCount -Dwordcount.case.sensitive=false WordCount2/input WordCount2/output

I get the following error message:

java.lang.NullPointerException
        at org.apache.hadoop.util.StringUtils.stringToPath(StringUtils.java:197)
        at org.apache.hadoop.filecache.DistributedCache.getLocalCacheFiles(DistributedCache.java:470)
        at org.myorg.WordCount$MapClass.configure(WordCount.java:33)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
        at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:36)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:188)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)

Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:831)
        at org.myorg.WordCount.run(WordCount.java:110)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.myorg.WordCount.main(WordCount.java:115)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)

Looking at DistributedCache.java, line 470, we see:
   return StringUtils.stringToPath(conf.getStrings("mapred.cache.localFiles"));

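conf.getStrings is returning null because "mapred.cache.localFiles" is never
set when no cache file is registered, and stringToPath then throws the
NullPointerException. So somehow we have to initialize or guard this so that
it does not throw an exception when the -skip argument is not used.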

When I do pass -skip patterns.txt, the program works.
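
One possible shape for such a guard, sketched outside Hadoop so it can be run
on its own. Everything here is a hypothetical stand-in: the class name
NullSafeCacheLookup, the Map used in place of the job configuration, and the
getStrings helper that mimics a lookup returning null for an unset key. It is
not the Hadoop implementation, only an illustration of treating "no cache
files" as an empty list instead of null.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the tutorial's mapper setup; not Hadoop code. */
public class NullSafeCacheLookup {

    /** Mimics a config lookup that returns null when the key was never set. */
    public static String[] getStrings(Map<String, String> conf, String key) {
        String value = conf.get(key);
        return (value == null) ? null : value.split(",");
    }

    /** Returns the cached file names, or an empty array when none were registered. */
    public static String[] localCacheFiles(Map<String, String> conf) {
        String[] files = getStrings(conf, "mapred.cache.localFiles");
        return (files == null) ? new String[0] : files;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // No -skip argument: nothing was registered, so the raw lookup yields null.
        System.out.println(localCacheFiles(conf).length);
        // With a registered file (assumed path), the lookup returns it unchanged.
        conf.put("mapred.cache.localFiles", "/tmp/patterns.txt");
        System.out.println(localCacheFiles(conf)[0]);
    }
}
```

In the tutorial code the analogous check would presumably have to happen
before DistributedCache.getLocalCacheFiles is called (or inside it), since the
trace above shows the exception being thrown from within that method.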

Lastly,
WordCount v1.0 works even if I do not use DFS, and just access local input and
output files.  WordCount v2.0 does not work if I do not use DFS.  The
quickstart tutorial does not make it clear which examples work in which
modes (Standalone, Pseudo-Distributed, or Fully-Distributed).  One could
mistakenly assume that all examples work in all modes.

Phu




> bugs in mapred tutorial
> -----------------------
>
>                 Key: HADOOP-2574
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2574
>             Project: Hadoop
>          Issue Type: Bug
>          Components: documentation
>            Reporter: Doug Cutting
>             Fix For: 0.15.3, 0.16.0
>
>
> Sam Pullara sends me:
> {noformat}
> Phu was going through the WordCount example... lines 52 and 53 should have 
> args[0] and args[1]:
> http://lucene.apache.org/hadoop/docs/current/mapred_tutorial.html
> The javac and jar command are also wrong, they don't include the directories 
> for the packages, should be:
> $ javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d 
> classes WordCount.java 
> $ jar -cvf /usr/joe/wordcount.jar WordCount.class -C classes .
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
