On Mon, Aug 22, 2011 at 11:31 PM, W.P. McNeill <bill...@gmail.com> wrote:
> What does HADOOP_CLASSPATH set in $HADOOP/conf/hadoop-env.sh do?
>
> This isn't clear to me from documentation and books, so I did some
> experimenting. Here's the conclusion I came to: the paths in
> HADOOP_CLASSPATH are added to the class path of the Job Client, but they are
> not added to the class path of the Task Trackers. Therefore if you put a JAR
> called MyJar.jar on the HADOOP_CLASSPATH and don't do anything to make it
> available to the Task Trackers as well, calls to MyJar.jar code from the
> run() method of your job work, but calls from your Mapper or Reducer will
> fail at runtime. Is this correct?

Yes, this is right.
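
If you want to repeat the experiment, a small probe can make the difference visible. The sketch below is only an illustration using the standard Java library: the class name ClasspathProbe and its locate() method are hypothetical, not part of Hadoop. Call it from run() and again from your Mapper's setup code, then compare what each JVM reports.

```java
import java.security.CodeSource;

public class ClasspathProbe {

    /**
     * Returns the location the named class would be loaded from in this JVM,
     * or null if this JVM cannot load it.
     */
    public static String locate(String className) {
        try {
            // 'false' means: load the class but do not run its static initializers.
            Class<?> c = Class.forName(className, false,
                    ClasspathProbe.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK classes have no code source; report that instead of a location.
            return src == null ? "(bootstrap class path)"
                               : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Pass fully qualified class names as arguments.
        for (String name : args) {
            String where = locate(name);
            System.out.println(name + " -> "
                    + (where == null ? "NOT VISIBLE" : where));
        }
    }
}
```

With MyJar.jar only on HADOOP_CLASSPATH, you would expect a class from that jar to resolve in the Job Client and to come back as null inside a task.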

> If it is, what is the proper way to make MyJar.jar available to both the Job
> Client and the Task Trackers?

You'll need to use the Distributed Cache. Alternatively, start the
TaskTrackers with the library already on their classpath, which the
launched task JVMs inherit. The latter approach is inflexible when it
comes to jar versioning.
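
A minimal sketch of the Distributed Cache route, assuming the old org.apache.hadoop.filecache.DistributedCache API and a jar that has already been uploaded to HDFS. The class name MyJob, the job name, and the path /libs/MyJar.jar are made up for illustration; the mapper, reducer, and input/output settings are left out.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJob extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();

        // Hypothetical HDFS path: the jar must already exist at this location.
        // This adds it to the classpath of the task JVMs, not of this client JVM.
        DistributedCache.addFileToClassPath(new Path("/libs/MyJar.jar"), conf);

        // Create the Job after updating conf, because Job takes a copy of it.
        Job job = new Job(conf, "my-job");
        job.setJarByClass(MyJob.class);

        // Set the mapper, reducer, and input/output paths here.

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new MyJob(), args));
    }
}
```

Because the job is started through ToolRunner, passing -libjars on the command line should be another way to ship the jar to the tasks. The Job Client still needs the jar on HADOOP_CLASSPATH if run() itself calls into it.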

-- 
Harsh J
