Short update on the issue: I tried to find a way to separate class path configurations by modifying the scripts in HADOOP_HOME/bin, but found that TaskRunner copies the class path setting from the parent process when starting a local task. So I do not see a way of putting less on a job's classpath without modifying Hadoop.
As that will be a real issue when running our jobs on Hadoop, I would like to propose changing TaskRunner so that it sets a class path specifically for M/R tasks. That class path could be defined in the scripts (as for the other processes) using a dedicated environment variable (e.g. HADOOP_JOB_CLASSPATH) and could default to the current VM's class path, preserving today's behavior. Is it ok to enter this as an issue?

Thanks,
Henning

On Friday, 17.09.2010, at 16:01 +0000, Allen Wittenauer wrote:
> On Sep 17, 2010, at 4:56 AM, Henning Blohm wrote:
>
> > When running map reduce tasks in Hadoop I run into classpath issues.
> > Contrary to previous posts, my problem is not that I am missing classes on
> > the Task's class path (we have a perfect solution for that) but rather find
> > too many (e.g. ECJ classes or jetty).
>
> The fact that you mention:
>
> > The libs in HADOOP_HOME/lib seem to contain everything needed to run
> > anything in Hadoop which is, I assume, much more than is needed to run a
> > map reduce task.
>
> hints that your perfect solution is to throw all your custom stuff in lib.
> If so, that's a huge mistake. Use distributed cache instead.
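P.S. The defaulting behavior I have in mind could be sketched in the bin scripts roughly like this. Note that HADOOP_JOB_CLASSPATH is a proposed variable, not one that exists today, and the jar path is only a stand-in for whatever the daemon's class path happens to be:

```shell
# Stand-in for the class path the parent daemon was started with.
CLASSPATH="${CLASSPATH:-/opt/hadoop/lib/demo.jar}"

# Proposed: use a task-specific class path if the admin set one,
# otherwise fall back to the daemon's full class path (today's behavior).
HADOOP_JOB_CLASSPATH="${HADOOP_JOB_CLASSPATH:-$CLASSPATH}"

echo "$HADOOP_JOB_CLASSPATH"
```

TaskRunner would then read HADOOP_JOB_CLASSPATH instead of inheriting the parent VM's class path wholesale.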
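P.P.S. For anyone following along, the distributed-cache route Allen suggests would look roughly like the following invocation (jar and class names are placeholders): the job's own dependencies are shipped with the job via -libjars, which GenericOptionsParser places on the task class path through the distributed cache, instead of dropping them into HADOOP_HOME/lib.

```shell
# Placeholders: myjob.jar, com.example.MyJob, mydep1.jar, mydep2.jar.
# -libjars ships the listed jars to the cluster and adds them to the
# class path of the map and reduce tasks via the distributed cache.
hadoop jar myjob.jar com.example.MyJob \
    -libjars mydep1.jar,mydep2.jar \
    input output
```

This keeps HADOOP_HOME/lib for Hadoop's own jars only, which is exactly the separation Allen is advocating.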