[ https://issues.apache.org/jira/browse/HADOOP-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469523 ]

Doug Cutting commented on HADOOP-964:
-------------------------------------

This is a serious bug, introduced by the new merge code.  Previously, 
comparators were only used in the child process: ReduceTaskRunner.prepare() 
only copied binary data before, but now it does some sorting too.  Hence, this 
logic should move into the child process.  (Architecturally, the goal is to 
keep user code out of long-running daemon processes.)

I think we should proceed as follows:

1. Add a unit test where the comparator is in the jar file.
2. Make a short-term fix that loads these classes into the TaskTracker.
3. Add another bug to move all comparator access into the child process.
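
To illustrate step 2, here is a minimal sketch of making classes from a job 
jar visible to a process whose own classpath lacks them.  The class 
JobJarClasses and its methods are hypothetical names for illustration, not 
Hadoop APIs, and the sketch assumes the job jar is available as a local file.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical helper for illustration; not part of Hadoop. */
public class JobJarClasses {

    /**
     * Builds a loader that delegates to the parent first and then
     * searches the job jar for anything the parent cannot find.
     */
    public static ClassLoader loaderFor(File jobJar, ClassLoader parent) {
        try {
            URL[] urls = { jobJar.toURI().toURL() };
            return new URLClassLoader(urls, parent);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad jar path: " + jobJar, e);
        }
    }

    /** Returns true if the named class is visible to the given loader. */
    public static boolean canLoad(String className, ClassLoader loader) {
        try {
            // 'false' means: locate the class but do not run its static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: JobJarClasses <job.jar> <class name>");
            return;
        }
        ClassLoader daemonLoader = JobJarClasses.class.getClassLoader();
        ClassLoader jobLoader = loaderFor(new File(args[0]), daemonLoader);
        System.out.println("daemon classpath sees it: " + canLoad(args[1], daemonLoader));
        System.out.println("job jar loader sees it:   " + canLoad(args[1], jobLoader));
    }
}
```

Running main with a job jar and the name of a comparator or key class inside 
it shows whether the class is found only through the jar loader, which is 
the situation described in this issue.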



> ClassNotFoundException in ReduceTaskRunner
> ------------------------------------------
>
>                 Key: HADOOP-964
>                 URL: https://issues.apache.org/jira/browse/HADOOP-964
>             Project: Hadoop
>          Issue Type: Bug
>          Components: scripts
>         Environment: windows xp and fedora core 6 linux, java 1.5.10...should 
> affect all systems
>            Reporter: Dennis Kubes
>            Priority: Critical
>             Fix For: 0.11.0
>
>         Attachments: classpath.patch, classpath2.path
>
>
> In the ReduceTaskRunner constructor (line 339), a sorter is created that 
> attempts to get the map output key and value classes from the configuration 
> object.  This happens before the TaskTracker$Child process is spawned off 
> into its own separate JVM, so here the classpath for the configuration is 
> the classpath that started the TaskTracker.  The current hadoop script 
> includes the hadoop jars, meaning that any hadoop writable type will be 
> found, but it doesn't include nutch jars, so any nutch writable type or any 
> other writable type will not be found and will throw a 
> ClassNotFoundException.
> I don't think it is a good idea to have a dependency on specific Nutch jars 
> in the Hadoop script, but it is a good idea to allow jars to be included if 
> they are in specific locations, such as the HADOOP_HOME where the nutch jar 
> resides.  I have attached a patch that adds any jars in the HADOOP_HOME 
> directory to the hadoop classpath.  This fixes the issues with getting 
> ClassNotFoundExceptions inside of Nutch processes.
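
The attached patch itself is not reproduced in this message and changes the 
hadoop shell script.  The idea it describes, appending every jar found in 
one directory to an existing classpath, can be sketched in Java as follows.  
HomeJarClasspath and withJars are hypothetical names, and the lambda syntax 
assumes a newer Java than the 1.5 release mentioned in the issue.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not the attached patch. */
public class HomeJarClasspath {

    /**
     * Returns baseClasspath followed by every *.jar file directly inside dir,
     * joined with the platform path separator.  A missing or unreadable
     * directory leaves the base classpath unchanged.
     */
    public static String withJars(String baseClasspath, File dir) {
        List<String> entries = new ArrayList<>();
        if (!baseClasspath.isEmpty()) {
            entries.add(baseClasspath);
        }
        // listFiles returns null when dir does not exist or is not a directory.
        File[] jars = dir.listFiles((d, name) -> name.endsWith(".jar"));
        if (jars != null) {
            Arrays.sort(jars); // deterministic order across platforms
            for (File jar : jars) {
                entries.add(jar.getPath());
            }
        }
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        // Assumes HADOOP_HOME is set in the environment; falls back to the current directory.
        String home = System.getenv("HADOOP_HOME");
        File dir = new File(home != null ? home : ".");
        System.out.println(withJars(System.getProperty("java.class.path"), dir));
    }
}
```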

