Hi all,

I've been struggling with a particularly puzzling issue after upgrading to
Spark 1.5.1 from Spark 1.4.1.

When I use the MySQL JDBC connector and an exception (e.g.
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException) is thrown on the
executor, I get a ClassNotFoundException on the driver, which results in
this error (logs are abbreviated):

15/10/16 17:20:59 INFO SparkContext: Starting job: collect at repro.scala:73
...
15/10/16 17:20:59 INFO Executor: Running task 0.0 in stage 3.0 (TID 3)
15/10/16 17:20:59 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException
at repro.Repro$$anonfun$main$3.apply$mcZI$sp(repro.scala:69)
...
15/10/16 17:20:59 WARN ThrowableSerializationWrapper: Task exception could
not be deserialized
java.lang.ClassNotFoundException:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
...
15/10/16 17:20:59 ERROR TaskResultGetter: Could not deserialize
TaskEndReason: ClassNotFound with classloader
org.apache.spark.util.MutableURLClassLoader@7f08a6b1
15/10/16 17:20:59 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 3,
localhost): UnknownReason
15/10/16 17:20:59 ERROR TaskSetManager: Task 0 in stage 3.0 failed 1 times;
aborting job
15/10/16 17:20:59 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks
have all completed, from pool
15/10/16 17:20:59 INFO TaskSchedulerImpl: Cancelling stage 3
15/10/16 17:20:59 INFO DAGScheduler: ResultStage 3 (collect at
repro.scala:73) failed in 0.012 s
15/10/16 17:20:59 INFO DAGScheduler: Job 3 failed: collect at
repro.scala:73, took 0.018694 s

 In Spark 1.4.1, I get the following (logs are abbreviated):
15/10/16 17:42:41 INFO SparkContext: Starting job: collect at repro.scala:53
...
15/10/16 17:42:41 INFO Executor: Running task 0.0 in stage 2.0 (TID 2)
15/10/16 17:42:41 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException
at repro.Repro$$anonfun$main$2.apply$mcZI$sp(repro.scala:49)
...
15/10/16 17:42:41 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2,
localhost): com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException
at repro.Repro$$anonfun$main$2.apply$mcZI$sp(repro.scala:49)
...

15/10/16 17:42:41 ERROR TaskSetManager: Task 0 in stage 2.0 failed 1 times;
aborting job
15/10/16 17:42:41 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks
have all completed, from pool
15/10/16 17:42:41 INFO TaskSchedulerImpl: Cancelling stage 2
15/10/16 17:42:41 INFO DAGScheduler: ResultStage 2 (collect at
repro.scala:53) failed in 0.016 s
15/10/16 17:42:41 INFO DAGScheduler: Job 2 failed: collect at
repro.scala:53, took 0.024584 s


I have seriously screwed up somewhere or this is a change in behavior that
I have not been able to find in the documentation. For those that are
interested, a full repro and logs follow.

Hurshal

---

I am running this on Spark 1.5.1+Hadoop 2.6. I have tried this in various
combinations of
 * local/standalone mode
 * putting mysql on the classpath with --jars/building a fat jar with mysql
in it/manually running sc.addJar on the mysql jar
 * --deploy-mode client/--deploy-mode cluster
but nothing seems to change.



Here is an example invocation, and the accompanying source code:

$ ./bin/spark-submit --master local --deploy-mode client --class
repro.Repro /home/nix/repro/target/scala-2.10/repro-assembly-0.0.1.jar
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
15/10/16 17:40:53 INFO SparkContext: Running Spark version 1.5.1
15/10/16 17:40:53 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
15/10/16 17:40:53 WARN Utils: Your hostname, choochootrain resolves to a
loopback address: 127.0.1.1; using 10.0.1.97 instead (on interface wlan0)
15/10/16 17:40:53 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
another address
15/10/16 17:40:53 INFO SecurityManager: Changing view acls to: root
15/10/16 17:40:53 INFO SecurityManager: Changing modify acls to: root
15/10/16 17:40:53 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: Set(root); users
with modify permissions: Set(root)
15/10/16 17:40:54 INFO Slf4jLogger: Slf4jLogger started
15/10/16 17:40:54 INFO Remoting: Starting remoting
15/10/16 17:40:54 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://sparkDriver@10.0.1.97:48116]
15/10/16 17:40:54 INFO Utils: Successfully started service 'sparkDriver' on
port 48116.
15/10/16 17:40:54 INFO SparkEnv: Registering MapOutputTracker
15/10/16 17:40:54 INFO SparkEnv: Registering BlockManagerMaster
15/10/16 17:40:54 INFO DiskBlockManager: Created local directory at
/tmp/blockmgr-7e7cf2b0-397e-4c44-97e9-508f5c6ec5ab
15/10/16 17:40:54 INFO MemoryStore: MemoryStore started with capacity 530.3
MB
15/10/16 17:40:54 INFO HttpFileServer: HTTP File server directory is
/tmp/spark-ae2e1831-eb79-4431-bb49-81afbce963d7/httpd-51f4d72e-c080-4d58-8500-8a9174564478
15/10/16 17:40:54 INFO HttpServer: Starting HTTP Server
15/10/16 17:40:54 INFO Utils: Successfully started service 'HTTP file
server' on port 57598.
15/10/16 17:40:54 INFO SparkEnv: Registering OutputCommitCoordinator
15/10/16 17:40:54 INFO Utils: Successfully started service 'SparkUI' on
port 4040.
15/10/16 17:40:54 INFO SparkUI: Started SparkUI at http://10.0.1.97:4040
15/10/16 17:40:54 WARN MetricsSystem: Using default name DAGScheduler for
source because spark.app.id is not set.
15/10/16 17:40:54 INFO Executor: Starting executor ID driver on host
localhost
15/10/16 17:40:54 INFO Utils: Successfully started service
'org.apache.spark.network.netty.NettyBlockTransferService' on port 53260.
15/10/16 17:40:54 INFO NettyBlockTransferService: Server created on 53260
15/10/16 17:40:54 INFO BlockManagerMaster: Trying to register BlockManager
15/10/16 17:40:54 INFO BlockManagerMasterEndpoint: Registering block
manager localhost:53260 with 530.3 MB RAM, BlockManagerId(driver,
localhost, 53260)
15/10/16 17:40:54 INFO BlockManagerMaster: Registered BlockManager




##### Does the executor have MySQLSyntaxErrorException?
15/10/16 17:40:55 INFO SparkContext: Starting job: collect at repro.scala:24
15/10/16 17:40:55 INFO DAGScheduler: Got job 0 (collect at repro.scala:24)
with 1 output partitions
15/10/16 17:40:55 INFO DAGScheduler: Final stage: ResultStage 0(collect at
repro.scala:24)
15/10/16 17:40:55 INFO DAGScheduler: Parents of final stage: List()
15/10/16 17:40:55 INFO DAGScheduler: Missing parents: List()
15/10/16 17:40:55 INFO DAGScheduler: Submitting ResultStage 0
(MapPartitionsRDD[1] at map at repro.scala:16), which has no missing parents
15/10/16 17:40:55 INFO MemoryStore: ensureFreeSpace(1976) called with
curMem=0, maxMem=556038881
15/10/16 17:40:55 INFO MemoryStore: Block broadcast_0 stored as values in
memory (estimated size 1976.0 B, free 530.3 MB)
15/10/16 17:40:55 INFO MemoryStore: ensureFreeSpace(1243) called with
curMem=1976, maxMem=556038881
15/10/16 17:40:55 INFO MemoryStore: Block broadcast_0_piece0 stored as
bytes in memory (estimated size 1243.0 B, free 530.3 MB)
15/10/16 17:40:55 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory
on localhost:53260 (size: 1243.0 B, free: 530.3 MB)
15/10/16 17:40:55 INFO SparkContext: Created broadcast 0 from broadcast at
DAGScheduler.scala:861
15/10/16 17:40:55 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 0 (MapPartitionsRDD[1] at map at repro.scala:16)
15/10/16 17:40:55 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/10/16 17:40:55 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID
0, localhost, PROCESS_LOCAL, 2160 bytes)
15/10/16 17:40:55 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/10/16 17:40:55 INFO Executor: Fetching
http://10.0.1.97:57598/jars/mysql-connector-java-5.1.34.jar with timestamp
1445042454812
15/10/16 17:40:55 INFO Utils: Fetching
http://10.0.1.97:57598/jars/mysql-connector-java-5.1.34.jar to
/tmp/spark-ae2e1831-eb79-4431-bb49-81afbce963d7/userFiles-af3a471b-c102-4d5a-87b4-8f538bef09ec/fetchFileTemp2850274256530295645.tmp
15/10/16 17:40:55 INFO Executor: Adding
file:/tmp/spark-ae2e1831-eb79-4431-bb49-81afbce963d7/userFiles-af3a471b-c102-4d5a-87b4-8f538bef09ec/mysql-connector-java-5.1.34.jar
to class loader
15/10/16 17:40:55 INFO Executor: Fetching
http://10.0.1.97:57598/jars/repro-assembly-0.0.1.jar with timestamp
1445042454573
15/10/16 17:40:55 INFO Utils: Fetching
http://10.0.1.97:57598/jars/repro-assembly-0.0.1.jar to
/tmp/spark-ae2e1831-eb79-4431-bb49-81afbce963d7/userFiles-af3a471b-c102-4d5a-87b4-8f538bef09ec/fetchFileTemp4393235523023490143.tmp
15/10/16 17:40:55 INFO Executor: Adding
file:/tmp/spark-ae2e1831-eb79-4431-bb49-81afbce963d7/userFiles-af3a471b-c102-4d5a-87b4-8f538bef09ec/repro-assembly-0.0.1.jar
to class loader
15/10/16 17:40:55 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0).
899 bytes result sent to driver
15/10/16 17:40:55 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID
0) in 108 ms on localhost (1/1)
15/10/16 17:40:55 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks
have all completed, from pool
15/10/16 17:40:55 INFO DAGScheduler: ResultStage 0 (collect at
repro.scala:24) finished in 0.116 s
15/10/16 17:40:55 INFO DAGScheduler: Job 0 finished: collect at
repro.scala:24, took 0.228888 s
##### Yes.




##### Can the executor catch MySQLSyntaxErrorException?
15/10/16 17:40:55 INFO SparkContext: Starting job: collect at repro.scala:40
15/10/16 17:40:55 INFO DAGScheduler: Got job 1 (collect at repro.scala:40)
with 1 output partitions
15/10/16 17:40:55 INFO DAGScheduler: Final stage: ResultStage 1(collect at
repro.scala:40)
15/10/16 17:40:55 INFO DAGScheduler: Parents of final stage: List()
15/10/16 17:40:55 INFO DAGScheduler: Missing parents: List()
15/10/16 17:40:55 INFO DAGScheduler: Submitting ResultStage 1
(MapPartitionsRDD[3] at map at repro.scala:32), which has no missing parents
15/10/16 17:40:55 INFO MemoryStore: ensureFreeSpace(1984) called with
curMem=3219, maxMem=556038881
15/10/16 17:40:55 INFO MemoryStore: Block broadcast_1 stored as values in
memory (estimated size 1984.0 B, free 530.3 MB)
15/10/16 17:40:55 INFO MemoryStore: ensureFreeSpace(1242) called with
curMem=5203, maxMem=556038881
15/10/16 17:40:55 INFO MemoryStore: Block broadcast_1_piece0 stored as
bytes in memory (estimated size 1242.0 B, free 530.3 MB)
15/10/16 17:40:55 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory
on localhost:53260 (size: 1242.0 B, free: 530.3 MB)
15/10/16 17:40:55 INFO SparkContext: Created broadcast 1 from broadcast at
DAGScheduler.scala:861
15/10/16 17:40:55 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 1 (MapPartitionsRDD[3] at map at repro.scala:32)
15/10/16 17:40:55 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
15/10/16 17:40:55 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID
1, localhost, PROCESS_LOCAL, 2160 bytes)
15/10/16 17:40:55 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
15/10/16 17:40:55 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1).
899 bytes result sent to driver
15/10/16 17:40:55 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID
1) in 32 ms on localhost (1/1)
15/10/16 17:40:55 INFO DAGScheduler: ResultStage 1 (collect at
repro.scala:40) finished in 0.032 s
15/10/16 17:40:55 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks
have all completed, from pool
15/10/16 17:40:55 INFO DAGScheduler: Job 1 finished: collect at
repro.scala:40, took 0.039600 s
##### Yes.




##### Can the driver recover from MySQLSyntaxErrorException?
15/10/16 17:40:55 INFO SparkContext: Starting job: collect at repro.scala:53
15/10/16 17:40:55 INFO DAGScheduler: Got job 2 (collect at repro.scala:53)
with 1 output partitions
15/10/16 17:40:55 INFO DAGScheduler: Final stage: ResultStage 2(collect at
repro.scala:53)
15/10/16 17:40:55 INFO DAGScheduler: Parents of final stage: List()
15/10/16 17:40:55 INFO DAGScheduler: Missing parents: List()
15/10/16 17:40:55 INFO DAGScheduler: Submitting ResultStage 2
(MapPartitionsRDD[5] at map at repro.scala:48), which has no missing parents
15/10/16 17:40:55 INFO MemoryStore: ensureFreeSpace(1984) called with
curMem=6445, maxMem=556038881
15/10/16 17:40:55 INFO MemoryStore: Block broadcast_2 stored as values in
memory (estimated size 1984.0 B, free 530.3 MB)
15/10/16 17:40:55 INFO MemoryStore: ensureFreeSpace(1244) called with
curMem=8429, maxMem=556038881
15/10/16 17:40:55 INFO MemoryStore: Block broadcast_2_piece0 stored as
bytes in memory (estimated size 1244.0 B, free 530.3 MB)
15/10/16 17:40:55 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory
on localhost:53260 (size: 1244.0 B, free: 530.3 MB)
15/10/16 17:40:55 INFO SparkContext: Created broadcast 2 from broadcast at
DAGScheduler.scala:861
15/10/16 17:40:55 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 2 (MapPartitionsRDD[5] at map at repro.scala:48)
15/10/16 17:40:55 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
15/10/16 17:40:55 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID
2, localhost, PROCESS_LOCAL, 2160 bytes)
15/10/16 17:40:55 INFO Executor: Running task 0.0 in stage 2.0 (TID 2)
15/10/16 17:40:55 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException
at repro.Repro$$anonfun$main$2.apply$mcZI$sp(repro.scala:49)
at repro.Repro$$anonfun$main$2.apply(repro.scala:48)
at repro.Repro$$anonfun$main$2.apply(repro.scala:48)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to
<http://scala.collection.abstractiterator.to/>(Iterator.scala:1157)
at
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at
org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:905)
at
org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:905)
at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1848)
at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1848)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/10/16 17:40:55 WARN ThrowableSerializationWrapper: Task exception could
not be deserialized
java.lang.ClassNotFoundException:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at
org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at
org.apache.spark.ThrowableSerializationWrapper.readObject(TaskEndReason.scala:167)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1897)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
at
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
at
org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply$mcV$sp(TaskResultGetter.scala:108)
at
org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply(TaskResultGetter.scala:105)
at
org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply(TaskResultGetter.scala:105)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
at
org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:105)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/10/16 17:40:55 ERROR TaskResultGetter: Could not deserialize
TaskEndReason: ClassNotFound with classloader
org.apache.spark.util.MutableURLClassLoader@7f08a6b1
15/10/16 17:40:55 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2,
localhost): UnknownReason
15/10/16 17:40:55 ERROR TaskSetManager: Task 0 in stage 2.0 failed 1 times;
aborting job
15/10/16 17:40:55 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks
have all completed, from pool
15/10/16 17:40:55 INFO TaskSchedulerImpl: Cancelling stage 2
15/10/16 17:40:55 INFO DAGScheduler: ResultStage 2 (collect at
repro.scala:53) failed in 0.022 s
15/10/16 17:40:55 INFO DAGScheduler: Job 2 failed: collect at
repro.scala:53, took 0.030159 s
##### NO.




15/10/16 17:40:55 INFO SparkContext: Invoking stop() from shutdown hook
15/10/16 17:40:55 INFO SparkUI: Stopped Spark web UI at
http://10.0.1.97:4040
15/10/16 17:40:55 INFO DAGScheduler: Stopping DAGScheduler
15/10/16 17:40:55 INFO MapOutputTrackerMasterEndpoint:
MapOutputTrackerMasterEndpoint stopped!
15/10/16 17:40:55 INFO MemoryStore: MemoryStore cleared
15/10/16 17:40:55 INFO BlockManager: BlockManager stopped
15/10/16 17:40:55 INFO BlockManagerMaster: BlockManagerMaster stopped
15/10/16 17:40:55 INFO
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
OutputCommitCoordinator stopped!
15/10/16 17:40:55 INFO SparkContext: Successfully stopped SparkContext
15/10/16 17:40:55 INFO ShutdownHookManager: Shutdown hook called
15/10/16 17:40:55 INFO ShutdownHookManager: Deleting directory
/tmp/spark-ae2e1831-eb79-4431-bb49-81afbce963d7



The source code:
package repro

import java.sql.DriverManager
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.SparkException

object Repro {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("MySQL ClassNotFound Repro")
    val sc = new SparkContext(conf)

    println("\n\n\n")
    println("##### Does the executor have MySQLSyntaxErrorException?")
    var rdd = sc.parallelize(List(1)).map { x =>
      try {

Class.forName("com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException")
        true
      } catch {
        case e: ClassNotFoundException => false
      }
    }
    if (rdd.collect().toList == List(true)) {
      println("##### Yes.")
    } else {
      println("##### NO.")
    }

    println("\n\n\n")
    println("##### Can the executor catch MySQLSyntaxErrorException?")
    rdd = sc.parallelize(List(1)).map { x =>
      try {
        throw new
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException()
      } catch {
        case e: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException
=> false
      }
      true
    }
    if (rdd.collect().toList == List(true)) {
      println("##### Yes.")
    } else {
      println("##### NO.")
    }

    println("\n\n\n")
    println("##### Can the driver recover from MySQLSyntaxErrorException?")
    rdd = sc.parallelize(List(1)).map { x =>
      throw new com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException()
      true
    }
    try {
      rdd.collect()
    } catch {
      case e: SparkException => {
        if (e.getMessage.contains("UnknownReason")) {
          println("##### NO.")
        } else {
          println("##### Yes.")
        }
      }
    }
    println("\n\n\n")

  }
}


And the build.sbt:
lazy val root = (project in file(".")).
  settings(
    organization := "com.repro",
    version := "0.0.1",
    scalaVersion := "2.10.5",
    name := "repro",
    libraryDependencies ++= Seq(
        "org.apache.spark" %% "spark-core" % "1.5.1" % "provided",
        "org.apache.spark" %% "spark-sql" % "1.5.1"  % "provided",
        "org.apache.spark" %% "spark-streaming" % "1.5.1" % "provided",
        "mysql" % "mysql-connector-java" % "5.1.34"
    )
)




And here are the results when run against Spark 1.4.1 (build.sbt has been
updated accordingly)

$ ./bin/spark-submit --master local --deploy-mode client --class
repro.Repro /home/nix/repro/target/scala-2.10/repro-assembly-0.0.1.jar
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
15/10/16 17:42:40 INFO SparkContext: Running Spark version 1.4.1
15/10/16 17:42:40 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
15/10/16 17:42:40 WARN Utils: Your hostname, choochootrain resolves to a
loopback address: 127.0.1.1; using 10.0.1.97 instead (on interface wlan0)
15/10/16 17:42:40 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
another address
15/10/16 17:42:40 INFO SecurityManager: Changing view acls to: root
15/10/16 17:42:40 INFO SecurityManager: Changing modify acls to: root
15/10/16 17:42:40 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: Set(root); users
with modify permissions: Set(root)
15/10/16 17:42:40 INFO Slf4jLogger: Slf4jLogger started
15/10/16 17:42:40 INFO Remoting: Starting remoting
15/10/16 17:42:40 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://sparkDriver@10.0.1.97:48333]
15/10/16 17:42:41 INFO Utils: Successfully started service 'sparkDriver' on
port 48333.
15/10/16 17:42:41 INFO SparkEnv: Registering MapOutputTracker
15/10/16 17:42:41 INFO SparkEnv: Registering BlockManagerMaster
15/10/16 17:42:41 INFO DiskBlockManager: Created local directory at
/tmp/spark-e2946d3d-6630-456f-b6a7-bc1583c1362a/blockmgr-9f9e70fc-62f7-4308-8b7d-c1b4127241f3
15/10/16 17:42:41 INFO MemoryStore: MemoryStore started with capacity 265.4
MB
15/10/16 17:42:41 INFO HttpFileServer: HTTP File server directory is
/tmp/spark-e2946d3d-6630-456f-b6a7-bc1583c1362a/httpd-fa89fece-1832-4a31-bdb5-8c9b7761b4e6
15/10/16 17:42:41 INFO HttpServer: Starting HTTP Server
15/10/16 17:42:41 INFO Utils: Successfully started service 'HTTP file
server' on port 49551.
15/10/16 17:42:41 INFO SparkEnv: Registering OutputCommitCoordinator
15/10/16 17:42:41 INFO Utils: Successfully started service 'SparkUI' on
port 4040.
15/10/16 17:42:41 INFO SparkUI: Started SparkUI at http://10.0.1.97:4040
15/10/16 17:42:41 INFO SparkContext: Added JAR
file:/home/nix/repro/target/scala-2.10/repro-assembly-0.0.1.jar at
http://10.0.1.97:49551/jars/repro-assembly-0.0.1.jar with timestamp
1445042561260
15/10/16 17:42:41 INFO Executor: Starting executor ID driver on host
localhost
15/10/16 17:42:41 INFO Utils: Successfully started service
'org.apache.spark.network.netty.NettyBlockTransferService' on port 42805.
15/10/16 17:42:41 INFO NettyBlockTransferService: Server created on 42805
15/10/16 17:42:41 INFO BlockManagerMaster: Trying to register BlockManager
15/10/16 17:42:41 INFO BlockManagerMasterEndpoint: Registering block
manager localhost:42805 with 265.4 MB RAM, BlockManagerId(driver,
localhost, 42805)
15/10/16 17:42:41 INFO BlockManagerMaster: Registered BlockManager




##### Does the executor have MySQLSyntaxErrorException?
15/10/16 17:42:41 INFO SparkContext: Starting job: collect at repro.scala:24
15/10/16 17:42:41 INFO DAGScheduler: Got job 0 (collect at repro.scala:24)
with 1 output partitions (allowLocal=false)
15/10/16 17:42:41 INFO DAGScheduler: Final stage: ResultStage 0(collect at
repro.scala:24)
15/10/16 17:42:41 INFO DAGScheduler: Parents of final stage: List()
15/10/16 17:42:41 INFO DAGScheduler: Missing parents: List()
15/10/16 17:42:41 INFO DAGScheduler: Submitting ResultStage 0
(MapPartitionsRDD[1] at map at repro.scala:16), which has no missing parents
15/10/16 17:42:41 INFO MemoryStore: ensureFreeSpace(1976) called with
curMem=0, maxMem=278302556
15/10/16 17:42:41 INFO MemoryStore: Block broadcast_0 stored as values in
memory (estimated size 1976.0 B, free 265.4 MB)
15/10/16 17:42:41 INFO MemoryStore: ensureFreeSpace(1243) called with
curMem=1976, maxMem=278302556
15/10/16 17:42:41 INFO MemoryStore: Block broadcast_0_piece0 stored as
bytes in memory (estimated size 1243.0 B, free 265.4 MB)
15/10/16 17:42:41 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory
on localhost:42805 (size: 1243.0 B, free: 265.4 MB)
15/10/16 17:42:41 INFO SparkContext: Created broadcast 0 from broadcast at
DAGScheduler.scala:874
15/10/16 17:42:41 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 0 (MapPartitionsRDD[1] at map at repro.scala:16)
15/10/16 17:42:41 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/10/16 17:42:41 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID
0, localhost, PROCESS_LOCAL, 1444 bytes)
15/10/16 17:42:41 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/10/16 17:42:41 INFO Executor: Fetching
http://10.0.1.97:49551/jars/repro-assembly-0.0.1.jar with timestamp
1445042561260
15/10/16 17:42:41 INFO Utils: Fetching
http://10.0.1.97:49551/jars/repro-assembly-0.0.1.jar to
/tmp/spark-e2946d3d-6630-456f-b6a7-bc1583c1362a/userFiles-9d259a88-ed7a-40c3-a4bc-8e8fa489e8c9/fetchFileTemp4944656807834781951.tmp
15/10/16 17:42:41 INFO Executor: Adding
file:/tmp/spark-e2946d3d-6630-456f-b6a7-bc1583c1362a/userFiles-9d259a88-ed7a-40c3-a4bc-8e8fa489e8c9/repro-assembly-0.0.1.jar
to class loader
15/10/16 17:42:41 INFO Executor: Fetching
http://10.0.1.97:49551/jars/mysql-connector-java-5.1.34.jar with timestamp
1445042561466
15/10/16 17:42:41 INFO Utils: Fetching
http://10.0.1.97:49551/jars/mysql-connector-java-5.1.34.jar to
/tmp/spark-e2946d3d-6630-456f-b6a7-bc1583c1362a/userFiles-9d259a88-ed7a-40c3-a4bc-8e8fa489e8c9/fetchFileTemp7552804998517503805.tmp
15/10/16 17:42:41 INFO Executor: Adding
file:/tmp/spark-e2946d3d-6630-456f-b6a7-bc1583c1362a/userFiles-9d259a88-ed7a-40c3-a4bc-8e8fa489e8c9/mysql-connector-java-5.1.34.jar
to class loader
15/10/16 17:42:41 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0).
604 bytes result sent to driver
15/10/16 17:42:41 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID
0) in 116 ms on localhost (1/1)
15/10/16 17:42:41 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks
have all completed, from pool
15/10/16 17:42:41 INFO DAGScheduler: ResultStage 0 (collect at
repro.scala:24) finished in 0.123 s
15/10/16 17:42:41 INFO DAGScheduler: Job 0 finished: collect at
repro.scala:24, took 0.231604 s
##### Yes.




##### Can the executor catch MySQLSyntaxErrorException?
15/10/16 17:42:41 INFO SparkContext: Starting job: collect at repro.scala:40
15/10/16 17:42:41 INFO DAGScheduler: Got job 1 (collect at repro.scala:40)
with 1 output partitions (allowLocal=false)
15/10/16 17:42:41 INFO DAGScheduler: Final stage: ResultStage 1(collect at
repro.scala:40)
15/10/16 17:42:41 INFO DAGScheduler: Parents of final stage: List()
15/10/16 17:42:41 INFO DAGScheduler: Missing parents: List()
15/10/16 17:42:41 INFO DAGScheduler: Submitting ResultStage 1
(MapPartitionsRDD[3] at map at repro.scala:32), which has no missing parents
15/10/16 17:42:41 INFO MemoryStore: ensureFreeSpace(1984) called with
curMem=3219, maxMem=278302556
15/10/16 17:42:41 INFO MemoryStore: Block broadcast_1 stored as values in
memory (estimated size 1984.0 B, free 265.4 MB)
15/10/16 17:42:41 INFO MemoryStore: ensureFreeSpace(1242) called with
curMem=5203, maxMem=278302556
15/10/16 17:42:41 INFO MemoryStore: Block broadcast_1_piece0 stored as
bytes in memory (estimated size 1242.0 B, free 265.4 MB)
15/10/16 17:42:41 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory
on localhost:42805 (size: 1242.0 B, free: 265.4 MB)
15/10/16 17:42:41 INFO SparkContext: Created broadcast 1 from broadcast at
DAGScheduler.scala:874
15/10/16 17:42:41 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 1 (MapPartitionsRDD[3] at map at repro.scala:32)
15/10/16 17:42:41 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
15/10/16 17:42:41 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID
1, localhost, PROCESS_LOCAL, 1444 bytes)
15/10/16 17:42:41 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
15/10/16 17:42:41 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1).
604 bytes result sent to driver
15/10/16 17:42:41 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID
1) in 20 ms on localhost (1/1)
15/10/16 17:42:41 INFO DAGScheduler: ResultStage 1 (collect at
repro.scala:40) finished in 0.020 s
15/10/16 17:42:41 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks
have all completed, from pool
15/10/16 17:42:41 INFO DAGScheduler: Job 1 finished: collect at
repro.scala:40, took 0.029509 s
##### Yes.




##### Can the driver recover from MySQLSyntaxErrorException?
15/10/16 17:42:41 INFO SparkContext: Starting job: collect at repro.scala:53
15/10/16 17:42:41 INFO DAGScheduler: Got job 2 (collect at repro.scala:53)
with 1 output partitions (allowLocal=false)
15/10/16 17:42:41 INFO DAGScheduler: Final stage: ResultStage 2(collect at
repro.scala:53)
15/10/16 17:42:41 INFO DAGScheduler: Parents of final stage: List()
15/10/16 17:42:41 INFO DAGScheduler: Missing parents: List()
15/10/16 17:42:41 INFO DAGScheduler: Submitting ResultStage 2
(MapPartitionsRDD[5] at map at repro.scala:48), which has no missing parents
15/10/16 17:42:41 INFO MemoryStore: ensureFreeSpace(1984) called with
curMem=6445, maxMem=278302556
15/10/16 17:42:41 INFO MemoryStore: Block broadcast_2 stored as values in
memory (estimated size 1984.0 B, free 265.4 MB)
15/10/16 17:42:41 INFO MemoryStore: ensureFreeSpace(1244) called with
curMem=8429, maxMem=278302556
15/10/16 17:42:41 INFO MemoryStore: Block broadcast_2_piece0 stored as
bytes in memory (estimated size 1244.0 B, free 265.4 MB)
15/10/16 17:42:41 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory
on localhost:42805 (size: 1244.0 B, free: 265.4 MB)
15/10/16 17:42:41 INFO SparkContext: Created broadcast 2 from broadcast at
DAGScheduler.scala:874
15/10/16 17:42:41 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 2 (MapPartitionsRDD[5] at map at repro.scala:48)
15/10/16 17:42:41 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
15/10/16 17:42:41 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID
2, localhost, PROCESS_LOCAL, 1444 bytes)
15/10/16 17:42:41 INFO Executor: Running task 0.0 in stage 2.0 (TID 2)
15/10/16 17:42:41 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException
at repro.Repro$$anonfun$main$2.apply$mcZI$sp(repro.scala:49)
at repro.Repro$$anonfun$main$2.apply(repro.scala:48)
at repro.Repro$$anonfun$main$2.apply(repro.scala:48)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to
<http://scala.collection.abstractiterator.to/>(Iterator.scala:1157)
at
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at
org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:885)
at
org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:885)
at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1767)
at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1767)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/10/16 17:42:41 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2,
localhost): com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException
at repro.Repro$$anonfun$main$2.apply$mcZI$sp(repro.scala:49)
at repro.Repro$$anonfun$main$2.apply(repro.scala:48)
at repro.Repro$$anonfun$main$2.apply(repro.scala:48)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to
<http://scala.collection.abstractiterator.to/>(Iterator.scala:1157)
at
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at
org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:885)
at
org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:885)
at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1767)
at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1767)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

15/10/16 17:42:41 ERROR TaskSetManager: Task 0 in stage 2.0 failed 1 times;
aborting job
15/10/16 17:42:41 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks
have all completed, from pool
15/10/16 17:42:41 INFO TaskSchedulerImpl: Cancelling stage 2
15/10/16 17:42:41 INFO DAGScheduler: ResultStage 2 (collect at
repro.scala:53) failed in 0.016 s
15/10/16 17:42:41 INFO DAGScheduler: Job 2 failed: collect at
repro.scala:53, took 0.024584 s
##### Yes.




15/10/16 17:42:41 INFO SparkContext: Invoking stop() from shutdown hook
15/10/16 17:42:42 INFO SparkUI: Stopped Spark web UI at
http://10.0.1.97:4040
15/10/16 17:42:42 INFO DAGScheduler: Stopping DAGScheduler
15/10/16 17:42:42 INFO MapOutputTrackerMasterEndpoint:
MapOutputTrackerMasterEndpoint stopped!
15/10/16 17:42:42 INFO Utils: path =
/tmp/spark-e2946d3d-6630-456f-b6a7-bc1583c1362a/blockmgr-9f9e70fc-62f7-4308-8b7d-c1b4127241f3,
already present as root for deletion.
15/10/16 17:42:42 INFO MemoryStore: MemoryStore cleared
15/10/16 17:42:42 INFO BlockManager: BlockManager stopped
15/10/16 17:42:42 INFO BlockManagerMaster: BlockManagerMaster stopped
15/10/16 17:42:42 INFO
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
OutputCommitCoordinator stopped!
15/10/16 17:42:42 INFO SparkContext: Successfully stopped SparkContext
15/10/16 17:42:42 INFO Utils: Shutdown hook called
15/10/16 17:42:42 INFO Utils: Deleting directory
/tmp/spark-e2946d3d-6630-456f-b6a7-bc1583c1362a

Reply via email to