Release of Spark: 1.5.0.

Command-line invocation:

ACME_INGEST_HOME=/mnt/acme/acme-ingest
ACME_INGEST_VERSION=0.0.1-SNAPSHOT
ACME_BATCH_DURATION_MILLIS=5000
SPARK_MASTER_URL=spark://data1:7077
JAVA_OPTIONS="-Dspark.streaming.kafka.maxRatePerPartition=1000"
JAVA_OPTIONS="$JAVA_OPTIONS -Dspark.executor.memory=2g"

$SPARK_HOME/bin/spark-submit \
        --driver-class-path $ACME_INGEST_HOME \
        --driver-java-options "$JAVA_OPTIONS" \
        --class "com.acme.consumer.kafka.spark.KafkaSparkStreamingDriver" \
        --master $SPARK_MASTER_URL \
        --conf "spark.executor.extraClassPath=$ACME_INGEST_HOME/conf:$ACME_INGEST_HOME/lib/hbase-protocol-0.98.9-hadoop2.jar" \
        $ACME_INGEST_HOME/lib/acme-ingest-kafka-spark-$ACME_INGEST_VERSION.jar \
        -brokerlist $METADATA_BROKER_LIST \
        -topic acme.topic1 \
        -autooffsetreset largest \
        -batchdurationmillis $ACME_BATCH_DURATION_MILLIS \
        -appname Acme.App1 \
        -checkpointdir file://$SPARK_HOME/acme/checkpoint-acme-app1

Note that SolrException is definitely in our consumer jar,
acme-ingest-kafka-spark-$ACME_INGEST_VERSION.jar, which gets deployed to
$ACME_INGEST_HOME.
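
For reference, a quick way to double-check that is something along these
lines (just a sketch, assuming the JDK's jar tool is on the PATH):

jar tf $ACME_INGEST_HOME/lib/acme-ingest-kafka-spark-$ACME_INGEST_VERSION.jar \
        | grep 'org/apache/solr/common/SolrException'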

For the extraClassPath on the executors, we additionally include
hbase-protocol-0.98.9-hadoop2.jar because we use Apache Phoenix from the
Spark jobs to communicate with HBase.  The only way to get Phoenix to
communicate with HBase successfully was to add that jar explicitly to the
executor classpath, even though the contents of the hbase-protocol jar get
rolled up into the consumer jar at build time.
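
For comparison, a jar can also be shipped to the executors with
spark-submit's --jars flag instead of (or in addition to) extraClassPath;
--jars copies the jar to the workers and puts it on both the driver and
executor classpaths. A rough sketch against our invocation:

$SPARK_HOME/bin/spark-submit \
        --jars $ACME_INGEST_HOME/lib/hbase-protocol-0.98.9-hadoop2.jar \
        --master $SPARK_MASTER_URL \
        ... (rest of the options as above)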

I'm starting to wonder whether there's some class-loading pattern here:
do certain classes not get loaded out of the consumer jar, and therefore
need their respective jars added to the executor extraClassPath?
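
If it helps narrow that down, one diagnostic would be to turn on the JVM's
class-loading log on the driver and executors and grep for SolrException to
see which jar, if any, it actually gets loaded from. A rough sketch (the
executor stdout path assumes the standalone worker's default work directory):

$SPARK_HOME/bin/spark-submit \
        --driver-java-options "$JAVA_OPTIONS -verbose:class" \
        --conf "spark.executor.extraJavaOptions=-verbose:class" \
        ... (rest of the options as above)

# on a worker node, after running the job:
grep SolrException $SPARK_HOME/work/app-*/*/stdout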

Or is this a serialization problem for SolrException as Divya Ravichandran
suggested?

On Tue, Sep 29, 2015 at 6:16 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Mind providing a bit more information:
>
> release of Spark
> command line for running Spark job
>
> Cheers
>
> On Tue, Sep 29, 2015 at 1:37 PM, Dmitry Goldenberg <
> dgoldenberg...@gmail.com> wrote:
>
>> We're seeing this occasionally. Granted, this was caused by a wrinkle in
>> the Solr schema but this bubbled up all the way in Spark and caused job
>> failures.
>>
>> I just checked and SolrException class is actually in the consumer job
>> jar we use.  Is there any reason why Spark cannot find the SolrException
>> class?
>>
>> 15/09/29 15:41:58 WARN ThrowableSerializationWrapper: Task exception could not be deserialized
>> java.lang.ClassNotFoundException: org.apache.solr.common.SolrException
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> at java.lang.Class.forName0(Native Method)
>> at java.lang.Class.forName(Class.java:348)
>> at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
>> at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
>> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>> at org.apache.spark.ThrowableSerializationWrapper.readObject(TaskEndReason.scala:163)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:497)
>> at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1900)
>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>> at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
>> at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
>> at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply$mcV$sp(TaskResultGetter.scala:108)
>> at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply(TaskResultGetter.scala:105)
>> at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply(TaskResultGetter.scala:105)
>> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
>> at org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:105)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>>
>
>
