On Tue, 31 May 2011 15:09:28 -0400, John Armstrong <john.armstr...@ccri.com> wrote:
> On Tue, 31 May 2011 12:02:28 -0700, Alejandro Abdelnur <t...@cloudera.com> wrote:
>> What is exactly that does not work?

In the hopes that more information can help, I've dug into the local filesystems on each of my four nodes and retrieved the job.xml and the locations of the files to show that everything shows up where it should.

In this example I have one regular file (hdfs://node1:hdfsport/hdfs/path/to/file1.foo) added with DistributedCache.addCacheFile(). I also have a JAR (hdfs://node1:hdfsport/hdfs/path/to/needed.jar) added with DistributedCache.addFileToClassPath(). The needed JAR is also part of the classpath Oozie provides to my Java task.

As you can see, both files (with correct file sizes and timestamps) are listed as cache files in job.xml, and the JAR is listed as a classpath file. Both files show up on each node; the JAR shows up twice on node 1, since that's where Oozie ran the Java task and thus where Oozie placed the JAR through its own use of the distributed cache.

And yet, when MapReduce actually tries to run the job my Java task launches, it immediately hits a ClassNotFoundException, claiming it can't find the class my.class.package.Needed, which is contained in needed.jar.

JOB.XML

...
<property>
  <!--Loaded from Unknown-->
  <name>mapred.job.classpath.files</name>
  <value>hdfs://node1:hdfsport/hdfs/path/to/needed.jar</value>
</property>
...
<property>
  <!--Loaded from Unknown-->
  <name>mapred.cache.files</name>
  <value>hdfs://node1:hdfsport/hdfs/path/to/file1.foo,hdfs://node1:hdfsport/hdfs/path/to/needed.jar</value>
</property>
...
<property>
  <!--Loaded from Unknown-->
  <name>mapred.cache.files.filesizes</name>
  <value>61175,2257057</value>
</property>
...
<property>
  <!--Loaded from Unknown-->
  <name>mapred.cache.files.timestamps</name>
  <value>1306949104866,1306949371660</value>
</property>
...

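The consistency claimed for the job.xml values above (the classpath JAR also appears among the cache files, and there is one size and one timestamp per cache file) can be checked mechanically. The following is only a sketch using the values quoted in this post; the class name JobXmlCacheCheck and its methods are hypothetical and not part of Hadoop. It assumes both properties are comma-separated, as the mapred.cache.files value shows; the separator Hadoop actually uses for mapred.job.classpath.files may differ between versions, so treat that as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone helper; it does not use any Hadoop API.
class JobXmlCacheCheck {

    // Split a comma-separated property value; null or empty yields an empty list.
    static List<String> split(String value) {
        List<String> out = new ArrayList<>();
        if (value == null) {
            return out;
        }
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                out.add(trimmed);
            }
        }
        return out;
    }

    // Classpath entries that do not also appear in the cache-file list.
    static List<String> missingFromCache(String classpathFiles, String cacheFiles) {
        List<String> cache = split(cacheFiles);
        List<String> missing = new ArrayList<>();
        for (String entry : split(classpathFiles)) {
            if (!cache.contains(entry)) {
                missing.add(entry);
            }
        }
        return missing;
    }

    // True when there is exactly one size and one timestamp per cache file.
    static boolean countsMatch(String cacheFiles, String sizes, String timestamps) {
        int n = split(cacheFiles).size();
        return split(sizes).size() == n && split(timestamps).size() == n;
    }

    public static void main(String[] args) {
        // Values copied from the job.xml excerpt in this post.
        String classpath = "hdfs://node1:hdfsport/hdfs/path/to/needed.jar";
        String cache = "hdfs://node1:hdfsport/hdfs/path/to/file1.foo,"
                + "hdfs://node1:hdfsport/hdfs/path/to/needed.jar";
        System.out.println("missing from cache: " + missingFromCache(classpath, cache));
        System.out.println("counts match: "
                + countsMatch(cache, "61175,2257057", "1306949104866,1306949371660"));
    }
}
```

With the values from this job.xml, no classpath entry is reported missing and the counts match, which agrees with the observation that the configuration itself looks consistent.
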
NODE 1 LOCAL FILESYSTEM

/data/4/mapred/local/taskTracker/distcache/5181540010607464671_-132008737_1279047490/node1/hdfs/path/to/file1.foo
/data/1/mapred/local/taskTracker/distcache/6423795395825083633_-1942178119_1279314284/node1/hdfs/path/to/needed.jar
/data/3/mapred/local/taskTracker/distcache/2424191142954514770_1281905983_1269665052/node1/hdfs/path/to/needed.jar

NODE 2 LOCAL FILESYSTEM

/data/1/mapred/local/taskTracker/distcache/-1458632814086969626_-132008737_1279047490/node1/hdfs/path/to/file1.foo
/data/2/mapred/local/taskTracker/distcache/4434671176913378591_-1942178119_1279314284/node1/hdfs/path/to/needed.jar

NODE 3 LOCAL FILESYSTEM

/data/1/mapred/local/taskTracker/distcache/-6763452370915390695_-132008737_1279047490/node1/hdfs/path/to/file1.foo
/data/2/mapred/local/taskTracker/distcache/6838381597046551111_-1942178119_1279314284/node1/hdfs/path/to/needed.jar

NODE 4 LOCAL FILESYSTEM

/data/1/mapred/local/taskTracker/distcache/-1759547009148985681_-132008737_1279047490/node1/hdfs/path/to/file1.foo
/data/2/mapred/local/taskTracker/distcache/1998811135309473771_-1942178119_1279314284/node1/hdfs/path/to/needed.jar

SAMPLE MAPPER ATTEMPT LOG

2011-06-01 14:21:41,442 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2011-06-01 14:21:41,557 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /data/2/mapred/local/taskTracker/hdfs/jobcache/job_201106011430_0002/jars/job.jar <- /data/2/mapred/local/taskTracker/hdfs/jobcache/job_201106011430_0002/attempt_201106011430_0002_m_000009_0/work/./job.jar
2011-06-01 14:21:41,560 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /data/2/mapred/local/taskTracker/hdfs/jobcache/job_201106011430_0002/jars/.job.jar.crc <- /data/2/mapred/local/taskTracker/hdfs/jobcache/job_201106011430_0002/attempt_201106011430_0002_m_000009_0/work/./.job.jar.crc
2011-06-01 14:21:41,563 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2011-06-01 14:21:41,660 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: java.lang.ClassNotFoundException: my.class.package.Needed
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:973)
    at org.apache.hadoop.mapreduce.JobContext.getOutputFormatClass(JobContext.java:236)
    at org.apache.hadoop.mapred.Task.initialize(Task.java:484)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:298)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
    at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: java.lang.ClassNotFoundException: my.class.package.Needed
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:920)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:971)
    ... 8 more
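
The trace shows the lookup going through sun.misc.Launcher$AppClassLoader, i.e. the child JVM's application classpath. One way to narrow this down might be to log, from inside the task, whether needed.jar appears on java.class.path and whether the class can be resolved by a given loader. The sketch below is an assumption about how such a probe could look; ClasspathProbe and its methods are hypothetical names, and it uses only the standard JDK, not any Hadoop API.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper; JDK only.
class ClasspathProbe {

    // Entries of a classpath string whose file name equals jarName.
    static List<String> entriesNamed(String classpath, String jarName) {
        List<String> hits = new ArrayList<>();
        if (classpath == null) {
            return hits;
        }
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            if (new File(entry).getName().equals(jarName)) {
                hits.add(entry);
            }
        }
        return hits;
    }

    // True if the loader can resolve className (the class is not initialized).
    static boolean canLoad(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String cp = System.getProperty("java.class.path");
        ClassLoader loader = ClasspathProbe.class.getClassLoader();
        // Class and JAR names are the ones used in this post.
        System.out.println("needed.jar entries: " + entriesNamed(cp, "needed.jar"));
        System.out.println("can load Needed: " + canLoad("my.class.package.Needed", loader));
    }
}
```

If the first line prints an empty list inside the failing task, the JAR was localized on the node but never added to that JVM's classpath, which would be consistent with the ClassNotFoundException above.
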