[ https://issues.apache.org/jira/browse/MAPREDUCE-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050054#comment-14050054 ]

Sangjin Lee commented on MAPREDUCE-5957:
----------------------------------------

The gist of this issue concerns the use of Configuration.getClass() and of the
thread context classloader (TCCL). Currently, MRApps.setJobClassLoader() sets
both the configuration classloader and the TCCL at the same time, so once
setJobClassLoader() is called, the job classloader is available in both
contexts.
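
To make the coupling concrete, here is a minimal Java sketch of what "sets both
at the same time" amounts to. MiniConf is a hypothetical stand-in for
Configuration (the real class does expose setClassLoader()/getClassLoader()),
and the helper body is an illustration, not the actual
MRApps.setJobClassLoader() source.

```java
import java.util.Objects;

final class SetBothDemo {
    // Hypothetical stand-in for org.apache.hadoop.conf.Configuration that models
    // only the per-configuration classloader.
    static final class MiniConf {
        private ClassLoader classLoader;

        MiniConf(ClassLoader initial) { this.classLoader = initial; }

        void setClassLoader(ClassLoader cl) { this.classLoader = cl; }

        ClassLoader getClassLoader() { return classLoader; }
    }

    // Illustration of the behavior described above: one call installs the job
    // classloader as the configuration classloader AND as the TCCL.
    // Not the real MRApps.setJobClassLoader() implementation.
    static void setJobClassLoader(MiniConf conf, ClassLoader jobLoader) {
        Objects.requireNonNull(jobLoader, "jobLoader");
        conf.setClassLoader(jobLoader);                          // configuration classloader
        Thread.currentThread().setContextClassLoader(jobLoader); // thread context classloader
    }

    public static void main(String[] args) {
        Thread thread = Thread.currentThread();
        ClassLoader original = thread.getContextClassLoader();
        // Stand-in for the job classloader: it simply delegates to this class's loader.
        ClassLoader jobLoader = new ClassLoader(SetBothDemo.class.getClassLoader()) {};
        MiniConf conf = new MiniConf(original);
        try {
            setJobClassLoader(conf, jobLoader);
            System.out.println(conf.getClassLoader() == jobLoader);          // true
            System.out.println(thread.getContextClassLoader() == jobLoader); // true
        } finally {
            thread.setContextClassLoader(original); // restore for the rest of the program
        }
    }
}
```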

MAPREDUCE-5751 was caused because the job classloader was made available *too 
early as the TCCL*. This issue is caused because the job classloader is made 
available *too late as the configuration classloader*.
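
The "too late" failure can be reproduced in miniature. In this sketch (all
names hypothetical), SystemOnlyLoader plays the AM's default loader and cannot
see the user class, while a plain delegating loader plays the job classloader.
A lookup that runs before the job loader is installed fails with
ClassNotFoundException, much like Configuration.getClassByName() in the quoted
stack trace; the same lookup succeeds once the loader is installed.

```java
final class TooLateDemo {
    // Stands in for a job (user) class such as com.foo.test.TestOutputFormat.
    static final class UserOutputFormat {}

    // Hypothetical loader that cannot see job classes: it rejects one class name
    // and otherwise delegates to its parent as usual.
    static final class SystemOnlyLoader extends ClassLoader {
        private final String hidden;

        SystemOnlyLoader(ClassLoader parent, String hidden) {
            super(parent);
            this.hidden = hidden;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (name.equals(hidden)) {
                throw new ClassNotFoundException(name);
            }
            return super.loadClass(name, resolve);
        }
    }

    // Hypothetical Configuration stand-in: loads classes by name through
    // whatever classloader is installed at the moment of the call.
    static final class MiniConf {
        private ClassLoader classLoader;

        MiniConf(ClassLoader initial) { this.classLoader = initial; }

        void setClassLoader(ClassLoader cl) { this.classLoader = cl; }

        Class<?> getClassByName(String name) throws ClassNotFoundException {
            return Class.forName(name, false, classLoader);
        }
    }

    // True if the lookup succeeds with the classloader installed right now.
    static boolean lookup(MiniConf conf, String name) {
        try {
            conf.getClassByName(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader appLoader = TooLateDemo.class.getClassLoader();
        String userClass = UserOutputFormat.class.getName();
        MiniConf conf = new MiniConf(new SystemOnlyLoader(appLoader, userClass));
        ClassLoader jobLoader = new ClassLoader(appLoader) {}; // delegates to appLoader

        boolean tooEarly = lookup(conf, userClass); // runs before the job loader is installed
        conf.setClassLoader(jobLoader);              // job loader installed afterwards
        boolean afterInstall = lookup(conf, userClass);
        System.out.println(tooEarly + " " + afterInstall); // false true
    }
}
```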

If my understanding is correct, normal classloading (one class initializing
another through ordinary use, or even via Class.forName) is unaffected by this.
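
A small check of that understanding, assuming standard JDK semantics: the
one-argument Class.forName(String) resolves through the caller's defining
loader, so swapping the TCCL does not affect it, whereas code that explicitly
asks for the TCCL does see the swap. UserOutputFormat and SystemOnlyLoader are
hypothetical demo names.

```java
import java.util.Arrays;

final class ForNameVsTcclDemo {
    // Stands in for a job (user) class.
    static final class UserOutputFormat {}

    // Hypothetical loader that refuses one class name and delegates otherwise.
    static final class SystemOnlyLoader extends ClassLoader {
        private final String hidden;

        SystemOnlyLoader(ClassLoader parent, String hidden) {
            super(parent);
            this.hidden = hidden;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (name.equals(hidden)) {
                throw new ClassNotFoundException(name);
            }
            return super.loadClass(name, resolve);
        }
    }

    // A class lookup that may fail with ClassNotFoundException.
    interface Lookup {
        Class<?> load() throws ClassNotFoundException;
    }

    static boolean found(Lookup lookup) {
        try {
            lookup.load();
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Returns {found via Class.forName(name), found via the TCCL} while the TCCL
    // is a loader that cannot see the user class.
    static boolean[] compare() {
        Thread thread = Thread.currentThread();
        ClassLoader original = thread.getContextClassLoader();
        String user = UserOutputFormat.class.getName();
        thread.setContextClassLoader(
                new SystemOnlyLoader(ForNameVsTcclDemo.class.getClassLoader(), user));
        try {
            // One-argument Class.forName uses the caller's defining loader, not the TCCL.
            boolean viaForName = found(() -> Class.forName(user));
            // Code that explicitly asks for the TCCL does see the swapped loader.
            boolean viaTccl = found(() ->
                    Class.forName(user, false, Thread.currentThread().getContextClassLoader()));
            return new boolean[] {viaForName, viaTccl};
        } finally {
            thread.setContextClassLoader(original);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(compare())); // [true, false]
    }
}
```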

I see two possible approaches:
(1) Separate the timing of setting the job classloader as the configuration
classloader and as the TCCL.
I think that while setting the TCCL should be delayed as much as possible (i.e.
the current timing), the job classloader can be installed as the configuration
classloader much earlier. If the configuration loads a user class, that is
precisely what we need; if it loads a system class, the job classloader will
delegate anyway. I don't see any harm in setting the configuration classloader
early.

(2) Set and unset the job classloader around the code that loads classes from
the configuration.
Identify the code points in MRAppMaster where Configuration.getClass() is
needed, and set and unset the job classloader around them. Although this would
also solve the problem, the downside is that someone must determine, at each
such call site, that the job classloader is needed and set/unset it there. This
is potentially brittle.
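
Approach (2) would amount to a scoped swap with try/finally, roughly as below.
This is a sketch under assumptions: withJobClassLoader() and MiniConf are
hypothetical, only the configuration classloader is swapped (not the TCCL), and
Supplier is used because Configuration.getClass() reports lookup failures as
unchecked RuntimeExceptions. The main() also illustrates the brittleness: a
call site nobody wrapped still sees the old loader.

```java
import java.util.function.Supplier;

final class ScopedLoaderDemo {
    // Hypothetical Configuration stand-in holding the configuration classloader.
    static final class MiniConf {
        private ClassLoader classLoader;

        MiniConf(ClassLoader initial) { this.classLoader = initial; }

        void setClassLoader(ClassLoader cl) { this.classLoader = cl; }

        ClassLoader getClassLoader() { return classLoader; }
    }

    // Sketch of approach (2): install the job classloader on the configuration
    // only for the duration of `body`, then restore the previous loader.
    static <T> T withJobClassLoader(MiniConf conf, ClassLoader jobLoader, Supplier<T> body) {
        ClassLoader previous = conf.getClassLoader();
        conf.setClassLoader(jobLoader);
        try {
            return body.get();
        } finally {
            conf.setClassLoader(previous); // restored even if body throws
        }
    }

    public static void main(String[] args) {
        ClassLoader appLoader = ScopedLoaderDemo.class.getClassLoader();
        ClassLoader jobLoader = new ClassLoader(appLoader) {};
        MiniConf conf = new MiniConf(appLoader);

        // Inside the wrapped call site the job loader is visible...
        ClassLoader seenInside = withJobClassLoader(conf, jobLoader, conf::getClassLoader);
        // ...but any call site that nobody remembered to wrap still sees the old one.
        ClassLoader seenOutside = conf.getClassLoader();
        System.out.println((seenInside == jobLoader) + " " + (seenOutside == appLoader)); // true true
    }
}
```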

I think (1) is a more robust solution to this problem. Do you see an issue with 
taking that approach?
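
For comparison, approach (1) could look roughly like this in an AM-style
lifecycle. AppMasterSketch, MiniConf, and SystemOnlyLoader are hypothetical, and
the job loader here is a plain parent-delegating ClassLoader rather than
Hadoop's ApplicationClassLoader. init() installs the job loader as the
configuration classloader only, so the output format lookup succeeds while the
TCCL is still untouched; start() installs the TCCL at the later, current timing.

```java
final class EarlyConfLoaderDemo {
    // Stands in for a job (user) class such as com.foo.test.TestOutputFormat.
    static final class UserOutputFormat {}

    // Hypothetical loader that cannot see the user class (rejects one name).
    static final class SystemOnlyLoader extends ClassLoader {
        private final String hidden;

        SystemOnlyLoader(ClassLoader parent, String hidden) {
            super(parent);
            this.hidden = hidden;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (name.equals(hidden)) {
                throw new ClassNotFoundException(name);
            }
            return super.loadClass(name, resolve);
        }
    }

    // Hypothetical Configuration stand-in; getClassOrFail wraps the checked
    // exception in a RuntimeException, as the quoted stack trace shows for
    // Configuration.getClass().
    static final class MiniConf {
        private ClassLoader classLoader;

        MiniConf(ClassLoader initial) { this.classLoader = initial; }

        void setClassLoader(ClassLoader cl) { this.classLoader = cl; }

        Class<?> getClassOrFail(String name) {
            try {
                return Class.forName(name, false, classLoader);
            } catch (ClassNotFoundException e) {
                throw new RuntimeException(e);
            }
        }
    }

    // Hypothetical AM lifecycle under approach (1).
    static final class AppMasterSketch {
        private final MiniConf conf;
        private final ClassLoader jobLoader;

        AppMasterSketch(MiniConf conf, ClassLoader jobLoader) {
            this.conf = conf;
            this.jobLoader = jobLoader;
        }

        // Early: configuration classloader only, so config-driven lookups
        // (e.g. the output format) can see user classes.
        Class<?> init(String outputFormatClass) {
            conf.setClassLoader(jobLoader);
            return conf.getClassOrFail(outputFormatClass);
        }

        // Late (the current timing): only now install it as the TCCL.
        void start() {
            Thread.currentThread().setContextClassLoader(jobLoader);
        }
    }

    public static void main(String[] args) {
        ClassLoader appLoader = EarlyConfLoaderDemo.class.getClassLoader();
        String user = UserOutputFormat.class.getName();
        Thread thread = Thread.currentThread();
        ClassLoader originalTccl = thread.getContextClassLoader();
        MiniConf conf = new MiniConf(new SystemOnlyLoader(appLoader, user));
        AppMasterSketch am = new AppMasterSketch(conf, new ClassLoader(appLoader) {});
        try {
            System.out.println(am.init(user).getSimpleName());                           // UserOutputFormat
            System.out.println(thread.getContextClassLoader() == originalTccl);          // true: TCCL untouched
            System.out.println(conf.getClassOrFail("java.lang.String") == String.class); // true: delegated
            am.start();
        } finally {
            thread.setContextClassLoader(originalTccl);
        }
    }
}
```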

I don't think the task (YarnChild) is affected by this.

> AM throws ClassNotFoundException with job classloader enabled if custom output format/committer is used
> -------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5957
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5957
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>
> With the job classloader enabled, the MR AM throws ClassNotFoundException if 
> a custom output format class is specified.
> {noformat}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.foo.test.TestOutputFormat not found
>       at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:473)
>       at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:374)
>       at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>       at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1459)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>       at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1456)
>       at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1389)
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.foo.test.TestOutputFormat not found
>       at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
>       at org.apache.hadoop.mapreduce.task.JobContextImpl.getOutputFormatClass(JobContextImpl.java:222)
>       at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:469)
>       ... 8 more
> Caused by: java.lang.ClassNotFoundException: Class com.foo.test.TestOutputFormat not found
>       at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
>       at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
>       ... 10 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
