Re: Adding $CLASSPATH to Map/Reduce tasks
Maybe you can use:

  bin/hadoop jar -libjars ${your-depends-jars} your.mapred.jar args

See details: http://hadoop.apache.org/core/docs/r0.18.1/api/org/apache/hadoop/mapred/JobShell.html

On Thu, Sep 25, 2008 at 12:26 PM, David Hall [EMAIL PROTECTED] wrote:

> On Sun, Sep 21, 2008 at 9:41 PM, David Hall [EMAIL PROTECTED] wrote:
>> On Sun, Sep 21, 2008 at 9:35 PM, Arun C Murthy [EMAIL PROTECTED] wrote:
>>> On Sep 21, 2008, at 2:05 PM, David Hall wrote:
>>>> (New to this list)
>>>> Hi,
>>>> My research group is setting up a small (20-node) cluster. All of these machines are linked by NFS. We have a fairly entrenched codebase and development cycle, and in particular we'd like to be able to access user $CLASSPATHs in the forked JVMs run by the Map and Reduce tasks. However, TaskRunner.java (http://tinyurl.com/4enkg4) seems to disallow this by specifying its own.
>>>> Is there any easy way to trick Hadoop into making these visible? If not, if I were to submit a patch that would (optionally) add $CLASSPATH to the forked JVMs' classpath, would it be considered?
>>>> Thanks, David Hall
>>>
>>> Using jars on NFS for too many tasks might hurt if you have thousands of tasks, causing too much load. The better solution might be to use the DistributedCache:
>>> http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#DistributedCache
>>> Specifically:
>>> http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html#addArchiveToClassPath(org.apache.hadoop.fs.Path,%20org.apache.hadoop.conf.Configuration)
>>> http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html#addFileToClassPath(org.apache.hadoop.fs.Path,%20org.apache.hadoop.conf.Configuration)
>>> Arun
>>
>> Good point. I hadn't thought of that, but at the moment we're dealing with barrier to adoption rather than efficiency. We'll have to go back to PBS if we can't get users (read: picky PhD students) on board. I'd rather avoid that scenario... In the meantime, I think I figured out a hack that I'm going to try.
>> Thanks! -- David
>
> In case anyone's curious, the hack is to create a jar file with a manifest that has the Class-Path field set to all the directories and jars you want, put that jar in the lib/ folder of another jar, and pass that final jar in as the User Jar to a job. Works like a charm. :-)
> -- David
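The nested-jar hack quoted above can be sketched in plain Java using only the JDK's jar APIs. This is a minimal illustration, not the poster's actual build script; the NFS paths and file names (deps.jar, job.jar, /nfs/shared/...) are hypothetical examples.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class NestedJarHack {
    public static void main(String[] args) throws Exception {
        // Step 1: build deps.jar, an otherwise-empty jar whose manifest
        // Class-Path lists the NFS directories and jars to pick up
        // (hypothetical paths).
        Manifest mf = new Manifest();
        Attributes attrs = mf.getMainAttributes();
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.CLASS_PATH,
                  "/nfs/shared/classes/ /nfs/shared/lib/util.jar");
        File deps = new File("deps.jar");
        new JarOutputStream(new FileOutputStream(deps), mf).close();

        // Step 2: build job.jar with deps.jar nested under lib/.
        // Hadoop puts lib/* from the user jar on the task classpath,
        // and the nested manifest's Class-Path pulls in the rest.
        File job = new File("job.jar");
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(job))) {
            out.putNextEntry(new JarEntry("lib/deps.jar"));
            try (FileInputStream in = new FileInputStream(deps)) {
                in.transferTo(out);
            }
            out.closeEntry();
        }

        // Sanity check: prints whether the nested jar entry is present.
        try (JarFile jf = new JarFile(job)) {
            System.out.println(jf.getJarEntry("lib/deps.jar") != null);
        }
    }
}
```

The resulting job.jar would then be submitted as the User Jar in the usual way (e.g. bin/hadoop jar job.jar ...).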
Re: Adding $CLASSPATH to Map/Reduce tasks
Hi,

On Fri, Sep 26, 2008 at 10:50 AM, Samuel Guo [EMAIL PROTECTED] wrote:

> Maybe you can use:
>
>   bin/hadoop jar -libjars ${your-depends-jars} your.mapred.jar args
>
> See details: http://hadoop.apache.org/core/docs/r0.18.1/api/org/apache/hadoop/mapred/JobShell.html

Indeed. I was having the same issue trying to get a Lucene jar file into a running task. Despite what the docs say, it works with the "jar" option to the hadoop command. (The docs I read said it only worked with "job" and a couple of other commands; unfortunately I don't have a link to that page at the moment.)

Joe
Re: Adding $CLASSPATH to Map/Reduce tasks
On Fri, Sep 26, 2008 at 7:50 AM, Samuel Guo [EMAIL PROTECTED] wrote:

> Maybe you can use:
>
>   bin/hadoop jar -libjars ${your-depends-jars} your.mapred.jar args
>
> See details: http://hadoop.apache.org/core/docs/r0.18.1/api/org/apache/hadoop/mapred/JobShell.html

Most of our classes are not in jars. I suppose it wouldn't be too bad to tell ant to jar them up, but with the hack, it's easy enough not to bother.

-- David
Adding $CLASSPATH to Map/Reduce tasks
(New to this list)

Hi,

My research group is setting up a small (20-node) cluster. All of these machines are linked by NFS. We have a fairly entrenched codebase and development cycle, and in particular we'd like to be able to access user $CLASSPATHs in the forked JVMs run by the Map and Reduce tasks. However, TaskRunner.java (http://tinyurl.com/4enkg4) seems to disallow this by specifying its own.

Is there any easy way to trick Hadoop into making these visible? If not, if I were to submit a patch that would (optionally) add $CLASSPATH to the forked JVMs' classpath, would it be considered?

Thanks,
David Hall
Re: Adding $CLASSPATH to Map/Reduce tasks
On Sep 21, 2008, at 2:05 PM, David Hall wrote:

> (New to this list)
> Hi,
> My research group is setting up a small (20-node) cluster. All of these machines are linked by NFS. We have a fairly entrenched codebase and development cycle, and in particular we'd like to be able to access user $CLASSPATHs in the forked JVMs run by the Map and Reduce tasks. However, TaskRunner.java (http://tinyurl.com/4enkg4) seems to disallow this by specifying its own.
> Is there any easy way to trick Hadoop into making these visible? If not, if I were to submit a patch that would (optionally) add $CLASSPATH to the forked JVMs' classpath, would it be considered?
> Thanks, David Hall

Using jars on NFS for too many tasks might hurt if you have thousands of tasks, causing too much load. The better solution might be to use the DistributedCache:
http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#DistributedCache

Specifically:
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html#addArchiveToClassPath(org.apache.hadoop.fs.Path,%20org.apache.hadoop.conf.Configuration)
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html#addFileToClassPath(org.apache.hadoop.fs.Path,%20org.apache.hadoop.conf.Configuration)

Arun
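For readers of the archive: the DistributedCache calls referenced above are used during job setup, roughly as in the sketch below. This assumes the 0.18-era API (org.apache.hadoop.filecache.DistributedCache, JobConf) and hypothetical HDFS paths for jars that have already been uploaded; it is a configuration fragment, not a complete job driver.

```java
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class CacheClasspathSetup {
    // Call while configuring the job, before submission.
    public static void addDependencies(JobConf conf) throws java.io.IOException {
        // A plain jar already in HDFS (hypothetical path); it is shipped
        // to each node once and appended to the task classpath.
        DistributedCache.addFileToClassPath(new Path("/deps/util.jar"), conf);

        // An archive (zip/jar) that should be unpacked on the node and
        // have its root added to the task classpath.
        DistributedCache.addArchiveToClassPath(new Path("/deps/classes.zip"), conf);
    }
}
```

Compared with jars on NFS, this trades a one-time HDFS copy per node for avoiding thousands of tasks hitting the NFS server at once, which is the load concern Arun raises.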