Release of Spark: 1.5.0. Command line invocation:
ACME_INGEST_HOME=/mnt/acme/acme-ingest \
ACME_INGEST_VERSION=0.0.1-SNAPSHOT \
ACME_BATCH_DURATION_MILLIS=5000 \
SPARK_MASTER_URL=spark://data1:7077 \
JAVA_OPTIONS="-Dspark.streaming.kafka.maxRatePerPartition=1000" \
JAVA_OPTIONS="$JAVA_OPTIONS -Dspark.executor.memory=2g" \
$SPARK_HOME/bin/spark-submit \
  --driver-class-path $ACME_INGEST_HOME \
  --driver-java-options "$JAVA_OPTIONS" \
  --class "com.acme.consumer.kafka.spark.KafkaSparkStreamingDriver" \
  --master $SPARK_MASTER_URL \
  --conf "spark.executor.extraClassPath=$ACME_INGEST_HOME/conf:$ACME_INGEST_HOME/lib/hbase-protocol-0.98.9-hadoop2.jar" \
  $ACME_INGEST_HOME/lib/acme-ingest-kafka-spark-$ACME_INGEST_VERSION.jar \
  -brokerlist $METADATA_BROKER_LIST \
  -topic acme.topic1 \
  -autooffsetreset largest \
  -batchdurationmillis $ACME_BATCH_DURATION_MILLIS \
  -appname Acme.App1 \
  -checkpointdir file://$SPARK_HOME/acme/checkpoint-acme-app1

Note that SolrException is definitely in our consumer jar, acme-ingest-kafka-spark-$ACME_INGEST_VERSION.jar, which gets deployed to $ACME_INGEST_HOME.

As for the extraClassPath on the executors, we additionally have hbase-protocol-0.98.9-hadoop2.jar there: our Spark jobs use Apache Phoenix to communicate with HBase, and the only way we could get Phoenix to talk to HBase successfully was to add that JAR explicitly to the executor classpath, even though the contents of the hbase-protocol jar get rolled up into the consumer jar at build time.

I'm starting to wonder whether there's some class-loading pattern here where certain classes never get loaded out of the consumer jar and therefore have to have their own JARs added to the executor extraClassPath. Or is this a serialization problem with SolrException, as Divya Ravichandran suggested? (See the diagnostic sketch after the quoted trace below.)

On Tue, Sep 29, 2015 at 6:16 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Mind providing a bit more information:
>
> release of Spark
> command line for running Spark job
>
> Cheers
>
> On Tue, Sep 29, 2015 at 1:37 PM, Dmitry Goldenberg <
> dgoldenberg...@gmail.com> wrote:
>
>> We're seeing this occasionally. Granted, this was caused by a wrinkle in
>> the Solr schema, but it bubbled all the way up in Spark and caused job
>> failures.
>>
>> I just checked, and the SolrException class is actually in the consumer
>> job jar we use. Is there any reason why Spark cannot find the
>> SolrException class?
>>
>> 15/09/29 15:41:58 WARN ThrowableSerializationWrapper: Task exception could not be deserialized
>> java.lang.ClassNotFoundException: org.apache.solr.common.SolrException
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> at java.lang.Class.forName0(Native Method)
>> at java.lang.Class.forName(Class.java:348)
>> at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
>> at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
>> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>> at org.apache.spark.ThrowableSerializationWrapper.readObject(TaskEndReason.scala:163)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:497)
>> at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1900)
>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>> at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
>> at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
>> at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply$mcV$sp(TaskResultGetter.scala:108)
>> at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply(TaskResultGetter.scala:105)
>> at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply(TaskResultGetter.scala:105)
>> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
>> at org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:105)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
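P.S. To narrow down whether this is a classpath issue or a serialization issue, here is a minimal diagnostic sketch. The ClasspathProbe class is hypothetical (not part of acme-ingest); it just reports which classloader and JAR, if any, supply a given class in the current JVM:

// ClasspathProbe.java -- hypothetical diagnostic helper, not part of acme-ingest.
// Reports which classloader and JAR (if any) supply a given class in this JVM.
import java.net.URL;
import java.security.CodeSource;

public final class ClasspathProbe {

    public static String locate(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            // getCodeSource() may be null for bootstrap/JDK classes.
            CodeSource src = clazz.getProtectionDomain().getCodeSource();
            URL location = (src == null) ? null : src.getLocation();
            return className + " loaded by " + clazz.getClassLoader()
                    + " from " + location;
        } catch (ClassNotFoundException e) {
            return className + " NOT FOUND on this JVM's classpath";
        }
    }

    public static void main(String[] args) {
        System.out.println(locate("org.apache.solr.common.SolrException"));
    }
}

Calling locate("org.apache.solr.common.SolrException") once in the driver before the StreamingContext starts, and once inside a function that runs on the executors (e.g. in a foreachPartition), would show where the class resolves on each side. Note that the deserialization in the trace above happens on the driver (TaskResultGetter), so if the class resolves on the executors but comes back NOT FOUND on the driver, the driver classpath would be the suspect; one thing worth checking is that --driver-class-path $ACME_INGEST_HOME puts only the directory itself on the classpath, not the JARs under $ACME_INGEST_HOME/lib.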