Hello,

We are using Hadoop here at Stony Brook University to power the next-generation text analytics backend for www.textmap.com. We also have an NFS partition mounted on all machines of our 100-node cluster. I have found it much more convenient to store manually created files (e.g., configuration files) on the NFS partition and read them directly from my mappers and reducers, rather than copying them into HDFS every time they change, as DistributedCache requires. Is there a way to do the same for jars?
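To make the pattern concrete, here is a minimal sketch of what I do today for plain files (the property name nfs.config.path and the path /nfs/shared/textmap.properties are just illustrative examples, not real names from our setup):

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.Properties;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class NfsConfigMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {

        private final Properties config = new Properties();

        public void configure(JobConf job) {
            // The NFS partition is mounted at the same path on every node,
            // so ordinary local-file I/O works inside the task JVM; no
            // DistributedCache round trip through HDFS is needed.
            String path = job.get("nfs.config.path",
                                  "/nfs/shared/textmap.properties");
            try {
                FileInputStream in = new FileInputStream(path);
                try {
                    config.load(in);
                } finally {
                    in.close();
                }
            } catch (IOException e) {
                throw new RuntimeException("Cannot read NFS config: " + path, e);
            }
        }

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, Text> out, Reporter reporter)
                throws IOException {
            // Trivial body: tag each record with a value from the NFS config.
            out.collect(new Text(config.getProperty("tag", "untagged")), value);
        }
    }

The only assumption is that the NFS mount point is identical on every node, which is the case for us.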
Specifically, I need a way to alter the child JVM's classpath via JobConf, without having the framework copy anything into or out of HDFS, because all my files are already accessible from every node. I can see how to do that by adding a couple of lines to TaskRunner's run() method, e.g.:

    classPath.append(sep);
    classPath.append(conf.get("mapred.additional.classpath"));

or something similar. Is there already such a feature, or should I just go ahead and implement it?

Thanks,
Mikhail Bautin
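P.S. For concreteness, here is a self-contained sketch of the behavior I am proposing. Note that mapred.additional.classpath is not an existing Hadoop property, just the key I would introduce, and the jar paths are examples:

    import org.apache.hadoop.mapred.JobConf;

    /**
     * Sketch only: "mapred.additional.classpath" is the key I would add,
     * not an existing Hadoop setting, and the jar paths are examples.
     */
    public class AdditionalClasspath {

        // Mirrors the two lines proposed above, with a null check so
        // jobs that do not set the key are unaffected.
        static void append(StringBuffer classPath, String sep, JobConf conf) {
            String extra = conf.get("mapred.additional.classpath");
            if (extra != null && extra.length() > 0) {
                classPath.append(sep);
                classPath.append(extra);
            }
        }

        public static void main(String[] args) {
            // Job-submission side: point the key at jars every node can
            // already see on the NFS mount.
            JobConf conf = new JobConf();
            conf.set("mapred.additional.classpath",
                     "/nfs/shared/lib/analytics.jar:/nfs/shared/lib/util.jar");

            StringBuffer classPath =
                new StringBuffer(System.getProperty("java.class.path"));
            append(classPath, System.getProperty("path.separator"), conf);
            System.out.println(classPath);
        }
    }

In the real patch the two append calls would live in TaskRunner.run(), where classPath and sep are already defined.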