Hi everyone, I'm running into a strange class-loading issue when running a Spark job on Spark 1.0.2.
I'm running a process where some Java code is compiled dynamically into a jar and added to the Spark context via addJar(). The jar is also added to the class loader of the thread that created the Spark context.

When I run a job that only references the dynamically compiled class on the workers, and converts those objects to some other type (say integers) before collecting the result at the driver, the job completes successfully. We're using Kryo serialization. When I run a similar workflow but bring objects containing fields of the dynamically compiled class's type back to the driver (say via collect()), the job fails with the following exception:

Caused by: java.lang.RuntimeException: java.security.PrivilegedActionException: org.apache.spark.SparkException: Job aborted due to stage failure: Exception while getting task result: com.esotericsoftware.kryo.KryoException: Unable to find class: TBEjava1

where "TBEjava1" is the name of the dynamically compiled class.

Here is what I can deduce from my debugging:

1. The class is loaded on the thread that launches the Spark context and calls reduce(). To check, I put a breakpoint right before my reduce() call and ran Class.forName("TBEjava1", false, Thread.currentThread().getContextClassLoader()); it returned a valid Class object without throwing ClassNotFoundException.

2. The worker threads can also refer to the class. I put breakpoints in the worker methods (using local[N] mode for the context for now) and they complete the map and reduce functions successfully.

3. The Kryo serializer calls readClass(), which in turn calls Class.forName() inside Kryo using the class loader held by that Kryo instance. The class-not-found error is thrown there, although the stack trace doesn't make that obvious.

I'm wondering if we might be running into https://issues.apache.org/jira/browse/SPARK-3046 or something similar. From what I understand of the Spark code, the thread pool created for the task result getter does not inherit the context class loader of the thread that created the Spark context, which would explain why the task result getter threads can't find classes even though they are available via addJar().

Any suggestions for a workaround? Feel free to correct me on any incorrect observations I've made as well. I've appended a couple of rough sketches below my signature: one reconstructing the setup described above, and one illustrating the class-resolution behavior from observation 3.

Thanks,
-Matt Cheah
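
P.S. To make the setup concrete, here is a stripped-down sketch of roughly what our driver-side code does. The class names, jar path, and app name are placeholders rather than our real code, and the sketch assumes TBEjava1 has a public no-arg constructor; the real job compiles the class in-process before this point.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;

import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;
import java.util.List;

public class DynamicClassRepro {
    public static void main(String[] args) throws Exception {
        // Assume the dynamically compiled class TBEjava1 has already been
        // written into this jar by our in-process compiler.
        File dynamicJar = new File("/tmp/dynamic-classes.jar");

        // Make the class visible to the thread that owns the SparkContext.
        URLClassLoader loader = new URLClassLoader(
                new URL[]{dynamicJar.toURI().toURL()},
                Thread.currentThread().getContextClassLoader());
        Thread.currentThread().setContextClassLoader(loader);

        SparkConf conf = new SparkConf()
                .setAppName("dynamic-class-repro")
                .setMaster("local[4]")
                .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Ship the jar to the executors as well.
        sc.addJar(dynamicJar.getAbsolutePath());

        JavaRDD<Integer> input = sc.parallelize(Arrays.asList(1, 2, 3, 4));

        // Case 1 (works): TBEjava1 is only touched on the workers; the values
        // that come back to the driver are plain integers.
        Integer sum = input.map(new Function<Integer, Integer>() {
            public Integer call(Integer i) throws Exception {
                // Per observation 2, the worker threads resolve the class fine.
                Object o = Class.forName("TBEjava1", true,
                        Thread.currentThread().getContextClassLoader()).newInstance();
                return o.hashCode();
            }
        }).reduce(new Function2<Integer, Integer, Integer>() {
            public Integer call(Integer a, Integer b) { return a + b; }
        });

        // Case 2 (fails): the collected elements are instances of TBEjava1, so
        // the driver-side task result getter has to deserialize TBEjava1 and
        // dies with KryoException: Unable to find class: TBEjava1.
        List<Object> objects = input.map(new Function<Integer, Object>() {
            public Object call(Integer i) throws Exception {
                return Class.forName("TBEjava1", true,
                        Thread.currentThread().getContextClassLoader()).newInstance();
            }
        }).collect();

        sc.stop();
    }
}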
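
P.P.S. My reading of observation 3, reproduced outside of Spark: Kryo resolves class names through the class loader set on the Kryo instance, so a Kryo instance created on a thread that never saw the dynamic jar cannot find TBEjava1. This standalone sketch uses a placeholder jar path and again assumes TBEjava1 has a public no-arg constructor and simple fields.

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

public class KryoClassLoaderDemo {
    public static void main(String[] args) throws Exception {
        // A class loader that can see the dynamically compiled jar.
        URLClassLoader withJar = new URLClassLoader(
                new URL[]{new File("/tmp/dynamic-classes.jar").toURI().toURL()});

        // Serialize an instance of TBEjava1 with a Kryo that can see the jar
        // (this stands in for the worker side of the job).
        Kryo writer = new Kryo();
        writer.setClassLoader(withJar);
        Object value = Class.forName("TBEjava1", true, withJar).newInstance();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Output output = new Output(bytes);
        writer.writeClassAndObject(output, value);
        output.close();

        // Deserialize with a Kryo whose class loader does NOT include the jar
        // (this stands in for the task result getter thread on the driver).
        // readClassAndObject -> readClass -> Class.forName fails with
        // "Unable to find class: TBEjava1".
        Kryo reader = new Kryo();  // default class loader, no dynamic jar
        Input input = new Input(new ByteArrayInputStream(bytes.toByteArray()));
        Object roundTripped = reader.readClassAndObject(input);  // throws KryoException
    }
}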