Preston Koprivica created MAPREDUCE-6344:
--------------------------------------------

             Summary: Inconsistent classpath/classloading from DistributedCache archives
                 Key: MAPREDUCE-6344
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6344
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 2.5.2, 2.5.1, 2.7.0, 2.6.0, 2.5.0
            Reporter: Preston Koprivica


We recently upgraded to MRv2 on YARN and have been noticing very inconsistent 
classloading between the job submission client and the tasks as they start up. 

I've tracked the issue to this method:

https://github.com/apache/hadoop/blob/release-2.5.0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java#L264

It appears that the classpath is simply wildcarded.  According to the Java SE 
7 and 8 docs, the order in which the JAR files matched by a wildcard are 
enumerated is not specified and may differ from moment to moment [1][2].  This 
is a problem for applications that rely on strict ordering, which the MRv1 
DistributedCache used to honor.
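
To illustrate why the ordering matters, here is a minimal sketch of 
first-match-wins class lookup.  It is only a simulation: the jar names, the 
class name and the resolve() helper are hypothetical and are not part of 
Hadoop or the JDK.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FirstMatchWins {
    /**
     * Returns the first jar in classpathOrder that contains className,
     * or null if none does. This mimics how a classpath is searched:
     * entries are consulted in order and the first hit is used.
     */
    static String resolve(String className,
                          List<String> classpathOrder,
                          Map<String, Set<String>> jarContents) {
        for (String jar : classpathOrder) {
            Set<String> classes = jarContents.get(jar);
            if (classes != null && classes.contains(className)) {
                return jar;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical jars that both ship the same class.
        Map<String, Set<String>> jarContents = new HashMap<>();
        jarContents.put("patched.jar",
                new HashSet<>(Arrays.asList("com.example.Codec")));
        jarContents.put("stock.jar",
                new HashSet<>(Arrays.asList("com.example.Codec")));

        // Same jars, two possible enumeration orders, two different winners.
        System.out.println(resolve("com.example.Codec",
                Arrays.asList("patched.jar", "stock.jar"), jarContents));
        System.out.println(resolve("com.example.Codec",
                Arrays.asList("stock.jar", "patched.jar"), jarContents));
    }
}
```

If a wildcard expands to the second order instead of the first, the task 
silently loads the class from a different jar than the client did.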

I'm unable to track down all the things that are linked or placed into the 
$PWD of the container, but assuming we can't account for all these things, a 
simple solution could be to explicitly enumerate the files in DistributedCache 
- similar to the "non-JAR" case [3] - and then add the "*" for passivity.  
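
A rough sketch of that idea, assuming the cache file names are already 
available in the order the job submitted them.  The class, the build() method 
and its parameters are hypothetical; this is not the actual MRApps code, only 
an illustration of "explicit entries first, wildcard last".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OrderedCacheClasspath {
    /**
     * Builds a classpath string in which every cache file is listed
     * explicitly, in the given order, followed by a trailing wildcard
     * entry so anything else in the container directory is still found.
     */
    static String build(String containerDir,
                        List<String> cacheFileNames,
                        String separator) {
        List<String> entries = new ArrayList<>();
        for (String name : cacheFileNames) {
            // Explicit entries keep the order chosen at submission time.
            entries.add(containerDir + "/" + name);
        }
        // The wildcard stays last, so it can only supply classes that
        // none of the explicit entries provide.
        entries.add(containerDir + "/*");
        return String.join(separator, entries);
    }

    public static void main(String[] args) {
        // Hypothetical jar names, used only for illustration.
        System.out.println(build("$PWD",
                Arrays.asList("override.jar", "lib-a.jar"), ":"));
    }
}
```

Jars named explicitly are also matched by the trailing wildcard, but since 
the first entry that contains a class wins, the duplicates should not change 
which copy is loaded.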

[1] http://docs.oracle.com/javase/7/docs/technotes/tools/windows/classpath.htm
[2] http://docs.oracle.com/javase/8/docs/technotes/tools/windows/classpath.html#A1100762
[3] https://github.com/apache/hadoop/blob/release-2.5.0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java#L270




