Having a user define a custom class inside an added jar and instantiate it directly inside an executor is definitely supported in Spark, and has been for a long time (several years). This is something we do all the time in Spark.
DB - I'd hold off on re-architecting this until we identify exactly what is causing the bug you are running into. In a nutshell, when the bytecode "new Foo()" is run on the executor, it will ask the driver for the class over HTTP using a custom classloader. Something in that pipeline is breaking here, possibly related to the YARN deployment stuff.

On Mon, May 19, 2014 at 12:29 AM, Sean Owen <[email protected]> wrote:
> I don't think a custom classloader is necessary.
>
> Well, it occurs to me that this is no new problem. Hadoop, Tomcat, etc.
> all run custom user code that creates new user objects without
> reflection. I should go see how that's done. Maybe it's totally valid
> to set the thread's context classloader for just this purpose, and I
> am not thinking clearly.
>
> On Mon, May 19, 2014 at 8:26 AM, Andrew Ash <[email protected]> wrote:
>> Sounds like the problem is that classloaders always look in their parents
>> before themselves, and Spark users want executors to pick up classes from
>> their custom code before the ones in Spark plus its dependencies.
>>
>> Would a custom classloader that delegates to the parent only after first
>> checking itself fix this up?
>>
>> On Mon, May 19, 2014 at 12:17 AM, DB Tsai <[email protected]> wrote:
>>
>>> Hi Sean,
>>>
>>> It's true that the issue here is the classloader, and due to the
>>> classloader delegation model, users have to use reflection in the
>>> executors to pick up the classloader in order to use the classes added
>>> by the sc.addJars API. However, it's very inconvenient for users, and
>>> not documented in Spark.
>>>
>>> I'm working on a patch to solve it by calling the protected method
>>> addURL in URLClassLoader to update the current default classloader, so
>>> no custom classloader is needed anymore. I wonder if this is a good way
>>> to go.
>>>
>>> private def addURL(url: URL, loader: URLClassLoader) {
>>>   try {
>>>     val method: Method =
>>>       classOf[URLClassLoader].getDeclaredMethod("addURL", classOf[URL])
>>>     method.setAccessible(true)
>>>     method.invoke(loader, url)
>>>   } catch {
>>>     case t: Throwable =>
>>>       throw new IOException("Error, could not add URL to system classloader")
>>>   }
>>> }
>>>
>>> Sincerely,
>>>
>>> DB Tsai
>>> -------------------------------------------------------
>>> My Blog: https://www.dbtsai.com
>>> LinkedIn: https://www.linkedin.com/in/dbtsai
>>>
>>> On Sun, May 18, 2014 at 11:57 PM, Sean Owen <[email protected]> wrote:
>>>
>>> > I might be stating the obvious for everyone, but the issue here is not
>>> > reflection or the source of the JAR, but the ClassLoader. The basic
>>> > rules are these:
>>> >
>>> > "new Foo" will use the ClassLoader that defines Foo. This is usually
>>> > the ClassLoader that loaded whatever it is that first referenced Foo
>>> > and caused it to be loaded -- usually the ClassLoader holding your
>>> > other app classes.
>>> >
>>> > ClassLoaders can have a parent-child relationship. ClassLoaders always
>>> > look in their parent before themselves.
>>> >
>>> > (Careful then -- in contexts like Hadoop or Tomcat where your app is
>>> > loaded in a child ClassLoader, and you reference a class that Hadoop
>>> > or Tomcat also has (like a lib class), you will get the container's
>>> > version!)
>>> >
>>> > When you load an external JAR it has a separate ClassLoader which does
>>> > not necessarily bear any relation to the one containing your app
>>> > classes, so yeah, it is not generally going to make "new Foo" work.
>>> >
>>> > Reflection lets you pick the ClassLoader, yes.
>>> >
>>> > I would not call setContextClassLoader.
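[Editor's note: DB's snippet above uses setAccessible to reach the protected addURL method on an existing URLClassLoader. As a hedged sketch of an alternative: when you control the creation of the loader, a small subclass can call addURL directly, with no reflection. This also sidesteps the InaccessibleObjectException that the setAccessible route hits on Java 9+ unless java.base is opened. The class name and "extra.jar" path below are illustrative, not from the thread.]

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

// A URLClassLoader whose URL list can be extended after construction.
// addURL is protected in URLClassLoader, so a subclass may call it directly.
public class MutableURLClassLoader extends URLClassLoader {
    public MutableURLClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    public void addJar(URL url) {
        addURL(url); // no setAccessible needed from a subclass
    }

    public static void main(String[] args) throws Exception {
        MutableURLClassLoader loader = new MutableURLClassLoader(new URL[0], null);
        // The file need not exist for the URL to be registered.
        loader.addJar(new File("extra.jar").toURI().toURL());
        System.out.println(loader.getURLs().length); // prints 1
    }
}
```

The limitation, and the reason DB's patch resorts to reflection, is that this only helps when the loader is created by code you control; mutating an already-running default classloader requires either the reflective hack or replacing the loader wholesale.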
>>> >
>>> > On Mon, May 19, 2014 at 12:00 AM, Sandy Ryza <[email protected]> wrote:
>>> > > I spoke with DB offline about this a little while ago and he
>>> > > confirmed that he was able to access the jar from the driver.
>>> > >
>>> > > The issue appears to be a general Java issue: you can't directly
>>> > > instantiate a class from a dynamically loaded jar.
>>> > >
>>> > > I reproduced it locally outside of Spark with:
>>> > > ---
>>> > > URLClassLoader urlClassLoader = new URLClassLoader(new URL[] {
>>> > >     new File("myotherjar.jar").toURI().toURL() }, null);
>>> > > Thread.currentThread().setContextClassLoader(urlClassLoader);
>>> > > MyClassFromMyOtherJar obj = new MyClassFromMyOtherJar();
>>> > > ---
>>> > >
>>> > > I was able to load the class with reflection.

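[Editor's note: Andrew's question earlier in the thread - whether a loader that checks itself before delegating to the parent would fix the shadowing problem - can be sketched as below. This is a minimal illustration of child-first delegation, not Spark's actual implementation; a production version should still delegate JDK (java.*) classes to the parent unconditionally.]

```java
import java.net.URL;
import java.net.URLClassLoader;

// A "child-first" URLClassLoader: it inverts the default parent-first model
// by searching its own URLs before falling back to the parent.
public class ChildFirstClassLoader extends URLClassLoader {
    public ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    c = findClass(name);                // look in our own URLs first
                } catch (ClassNotFoundException e) {
                    c = super.loadClass(name, resolve); // then fall back to the parent chain
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) throws Exception {
        ChildFirstClassLoader cl = new ChildFirstClassLoader(
            new URL[0], ChildFirstClassLoader.class.getClassLoader());
        // JDK classes are not in our (empty) URL list, so this falls through to the parent.
        System.out.println(cl.loadClass("java.lang.String") == String.class); // prints true
    }
}
```

With a non-empty URL list, a class present in both the user jar and the parent would be served from the user jar - which is the behavior Spark users in this thread want for sc.addJars.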