GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/834
[SPARK-1870][branch-0.9] Jars added by sc.addJar are not in the default classLoader in executor for YARN

The summary is copied from Sandy's comment on the mailing list.

The relevant difference between YARN and standalone is that, on YARN, the app jar is loaded by the system classloader instead of Spark's custom URL classloader.

On YARN, the system classloader knows about [the classes in the spark jars, the classes in the primary app jar]. The custom classloader knows about [the classes in secondary app jars] and has the system classloader as its parent.

A few relevant facts (mostly redundant with what Sean pointed out):

* Every class has a classloader that loaded it.
* When an object of class B is instantiated inside of class A, the classloader used for loading B is the classloader that was used for loading A.
* When a classloader fails to load a class, it lets its parent classloader try. If its parent succeeds, its parent becomes the "classloader that loaded it".

So suppose class B is in a secondary app jar and class A is in the primary app jar:

1. The custom classloader will try to load class A.
2. It will fail, because it only knows about the secondary jars.
3. It will delegate to its parent, the system classloader.
4. The system classloader will succeed, because it knows about the primary app jar.
5. A's classloader will be the system classloader.
6. A tries to instantiate an instance of class B.
7. B will be loaded with A's classloader, which is the system classloader.
8. Loading B will fail, because A's classloader, which is the system classloader, doesn't know about the secondary app jars.

In Spark standalone, A and B are both loaded by the custom classloader, so this issue doesn't come up.

In this PR, we no longer use customClassLoader. Instead, we add the URL to the current classloader. Since addURL is a protected method in URLClassLoader, we call it through reflection.
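The visibility problem and the addURL workaround described above can be illustrated with a small standalone Java sketch. It is not the code in this PR. It rests on several assumptions: a temporary directory with a marker file stands in for a secondary jar, resource lookup stands in for class loading (both follow the same parent/child visibility rules), and the hypothetical `AppLoader` class stands in for the loader that loaded the primary app jar. `AppLoader` redeclares `addURL` so that the reflective call targets a class in the example, because recent JDKs do not open `java.net.URLClassLoader` to reflection by default and their system classloader is no longer a `URLClassLoader`. Note also that a stock `URLClassLoader` consults its parent first; the outcome for A and B is the same either way, since a parent never sees what only its child knows.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

final class ClassLoaderVisibility {
    // Hypothetical marker standing in for "a class that lives only in a secondary jar".
    static final String MARKER = "secondary-app-jar-marker.txt";

    // Hypothetical stand-in for the loader that loaded the primary app jar.
    // It redeclares addURL (still protected) so the reflective lookup below
    // targets this example class rather than a JDK-internal member.
    static final class AppLoader extends URLClassLoader {
        AppLoader(ClassLoader parent) {
            super(new URL[0], parent);
        }

        @Override
        protected void addURL(URL url) {
            super.addURL(url);
        }
    }

    // Creates a temp directory containing MARKER and returns its URL;
    // the directory plays the role of a secondary jar.
    static URL secondaryJarStandIn() {
        try {
            Path dir = Files.createTempDirectory("secondary-jar");
            Files.createFile(dir.resolve(MARKER));
            return dir.toUri().toURL();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True if the loader (or one of its parents) can locate the resource.
    static boolean canSee(ClassLoader loader, String resource) {
        return loader.getResource(resource) != null;
    }

    // Calls the protected addURL method through reflection.
    // Assumes the loader's own class declares addURL, as AppLoader does.
    static void addUrlReflectively(URLClassLoader loader, URL url) {
        try {
            Method addUrl = loader.getClass().getDeclaredMethod("addURL", URL.class);
            addUrl.setAccessible(true);
            addUrl.invoke(loader, url);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not call addURL on " + loader, e);
        }
    }

    public static void main(String[] args) throws IOException {
        URL secondary = secondaryJarStandIn();
        try (AppLoader parent = new AppLoader(ClassLoader.getSystemClassLoader());
             URLClassLoader custom = new URLClassLoader(new URL[] {secondary}, parent)) {
            // The child knows the secondary location; its parent does not.
            System.out.println("custom sees secondary: " + canSee(custom, MARKER));
            System.out.println("parent sees secondary: " + canSee(parent, MARKER));
            // The workaround: add the URL to the parent itself.
            addUrlReflectively(parent, secondary);
            System.out.println("parent after addURL:   " + canSee(parent, MARKER));
        }
    }
}
```

Run as a single file, this should report that the custom loader sees the marker, the parent does not, and the parent does once the URL has been added to it. That mirrors the point of the change: anything loaded by the parent can only reach the secondary jars if the parent itself knows their URLs.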
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dbtsai/spark branch-0.9-dbtsai-classloader

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/834.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #834

----
commit 474ef2c936b8f659521a519c103bc7fdb116353b
Author: DB Tsai <dbt...@alpinenow.com>
Date:   2014-05-20T04:34:58Z

    Fixed the classLoader issue in 0.9 branch.

----