Having a user add define a custom class inside of an added jar and
instantiate it directly inside of an executor is definitely supported
in Spark and has been for a really long time (several years). This is
something we do all the time in Spark.

DB - I'd hold off on a re-architecting of this until we identify
exactly what is causing the bug you are running into.

In a nutshell, when the bytecode "new Foo()" is run on the executor,
it will ask the driver for the class over HTTP using a custom
classloader. Something in that pipeline is breaking here, possibly
related to the YARN deployment stuff.


On Mon, May 19, 2014 at 12:29 AM, Sean Owen <so...@cloudera.com> wrote:
> I don't think a customer classloader is necessary.
>
> Well, it occurs to me that this is no new problem. Hadoop, Tomcat, etc
> all run custom user code that creates new user objects without
> reflection. I should go see how that's done. Maybe it's totally valid
> to set the thread's context classloader for just this purpose, and I
> am not thinking clearly.
>
> On Mon, May 19, 2014 at 8:26 AM, Andrew Ash <and...@andrewash.com> wrote:
>> Sounds like the problem is that classloaders always look in their parents
>> before themselves, and Spark users want executors to pick up classes from
>> their custom code before the ones in Spark plus its dependencies.
>>
>> Would a custom classloader that delegates to the parent after first
>> checking itself fix this up?
>>
>>
>> On Mon, May 19, 2014 at 12:17 AM, DB Tsai <dbt...@stanford.edu> wrote:
>>
>>> Hi Sean,
>>>
>>> It's true that the issue here is classloader, and due to the classloader
>>> delegation model, users have to use reflection in the executors to pick up
>>> the classloader in order to use those classes added by sc.addJars APIs.
>>> However, it's very inconvenience for users, and not documented in spark.
>>>
>>> I'm working on a patch to solve it by calling the protected method addURL
>>> in URLClassLoader to update the current default classloader, so no
>>> customClassLoader anymore. I wonder if this is an good way to go.
>>>
>>>   private def addURL(url: URL, loader: URLClassLoader){
>>>     try {
>>>       val method: Method =
>>> classOf[URLClassLoader].getDeclaredMethod("addURL", classOf[URL])
>>>       method.setAccessible(true)
>>>       method.invoke(loader, url)
>>>     }
>>>     catch {
>>>       case t: Throwable => {
>>>         throw new IOException("Error, could not add URL to system
>>> classloader")
>>>       }
>>>     }
>>>   }
>>>
>>>
>>>
>>> Sincerely,
>>>
>>> DB Tsai
>>> -------------------------------------------------------
>>> My Blog: https://www.dbtsai.com
>>> LinkedIn: https://www.linkedin.com/in/dbtsai
>>>
>>>
>>> On Sun, May 18, 2014 at 11:57 PM, Sean Owen <so...@cloudera.com> wrote:
>>>
>>> > I might be stating the obvious for everyone, but the issue here is not
>>> > reflection or the source of the JAR, but the ClassLoader. The basic
>>> > rules are this.
>>> >
>>> > "new Foo" will use the ClassLoader that defines Foo. This is usually
>>> > the ClassLoader that loaded whatever it is that first referenced Foo
>>> > and caused it to be loaded -- usually the ClassLoader holding your
>>> > other app classes.
>>> >
>>> > ClassLoaders can have a parent-child relationship. ClassLoaders always
>>> > look in their parent before themselves.
>>> >
>>> > (Careful then -- in contexts like Hadoop or Tomcat where your app is
>>> > loaded in a child ClassLoader, and you reference a class that Hadoop
>>> > or Tomcat also has (like a lib class) you will get the container's
>>> > version!)
>>> >
>>> > When you load an external JAR it has a separate ClassLoader which does
>>> > not necessarily bear any relation to the one containing your app
>>> > classes, so yeah it is not generally going to make "new Foo" work.
>>> >
>>> > Reflection lets you pick the ClassLoader, yes.
>>> >
>>> > I would not call setContextClassLoader.
>>> >
>>> > On Mon, May 19, 2014 at 12:00 AM, Sandy Ryza <sandy.r...@cloudera.com>
>>> > wrote:
>>> > > I spoke with DB offline about this a little while ago and he confirmed
>>> > that
>>> > > he was able to access the jar from the driver.
>>> > >
>>> > > The issue appears to be a general Java issue: you can't directly
>>> > > instantiate a class from a dynamically loaded jar.
>>> > >
>>> > > I reproduced it locally outside of Spark with:
>>> > > ---
>>> > >     URLClassLoader urlClassLoader = new URLClassLoader(new URL[] { new
>>> > > File("myotherjar.jar").toURI().toURL() }, null);
>>> > >     Thread.currentThread().setContextClassLoader(urlClassLoader);
>>> > >     MyClassFromMyOtherJar obj = new MyClassFromMyOtherJar();
>>> > > ---
>>> > >
>>> > > I was able to load the class with reflection.
>>> >
>>>

Reply via email to