Short update on the issue: I tried to find a way to separate class path configurations by modifying the scripts in HADOOP_HOME/bin, but found that TaskRunner copies the class path setting from the parent process when starting a local task. So I do not see a way of putting less on a job's classpath without modifying Hadoop.
As that will be a real issue when running our jobs on Hadoop, I would like to propose changing TaskRunner so that it sets a class path specifically for M/R tasks. That class path could be defined in the scripts (as for the other processes) using a dedicated environment variable (e.g. HADOOP_JOB_CLASSPATH) and could default to the current VM's class path, preserving today's behavior. Is it ok to enter this as an issue?

Thanks,
Henning

On Friday, 17.09.2010, at 16:01 +0000, Allen Wittenauer wrote:
> On Sep 17, 2010, at 4:56 AM, Henning Blohm wrote:
>
> > When running map reduce tasks in Hadoop I run into classpath issues.
> > Contrary to previous posts, my problem is not that I am missing classes on
> > the Task's class path (we have a perfect solution for that) but rather find
> > too many (e.g. ECJ classes or jetty).
>
> The fact that you mention:
>
> > The libs in HADOOP_HOME/lib seem to contain everything needed to run
> > anything in Hadoop which is, I assume, much more than is needed to run a
> > map reduce task.
>
> hints that your perfect solution is to throw all your custom stuff in lib.
> If so, that's a huge mistake. Use distributed cache instead.
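P.S. The defaulting behavior I have in mind could be sketched in the bin scripts roughly like this. Note that HADOOP_JOB_CLASSPATH is a proposed variable, not one that exists today, and the jar path is only a stand-in for whatever the daemon's class path happens to be:

```shell
# Stand-in for the class path the parent daemon was started with.
CLASSPATH="${CLASSPATH:-/opt/hadoop/lib/demo.jar}"

# Proposed: use a task-specific class path if the admin set one,
# otherwise fall back to the daemon's full class path (today's behavior).
HADOOP_JOB_CLASSPATH="${HADOOP_JOB_CLASSPATH:-$CLASSPATH}"

echo "$HADOOP_JOB_CLASSPATH"
```

TaskRunner would then read HADOOP_JOB_CLASSPATH instead of inheriting the parent VM's class path wholesale.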
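P.P.S. For anyone following along, the distributed-cache route Allen suggests would look roughly like the following invocation (jar and class names are placeholders): the job's own dependencies are shipped with the job via -libjars, which GenericOptionsParser places on the task class path through the distributed cache, instead of dropping them into HADOOP_HOME/lib.

```shell
# Placeholders: myjob.jar, com.example.MyJob, mydep1.jar, mydep2.jar.
# -libjars ships the listed jars to the cluster and adds them to the
# class path of the map and reduce tasks via the distributed cache.
hadoop jar myjob.jar com.example.MyJob \
    -libjars mydep1.jar,mydep2.jar \
    input output
```

This keeps HADOOP_HOME/lib for Hadoop's own jars only, which is exactly the separation Allen is advocating.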