I believe I've had trouble with --conf spark.driver.userClassPathFirst=true
--conf spark.executor.userClassPathFirst=true before, so these might not
work...

I was thinking of trying to add the solrj jar to
spark.executor.extraClassPath...
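
If I go down that route, a minimal sketch of what I'd try (the solrj jar
name/version and its location under $ACME_INGEST_HOME/lib are placeholders,
not our actual artifact):

# Hypothetical: append the solrj jar to the executor classpath entries we
# already pass; adjust the jar file name to whatever the build produces.
--conf "spark.executor.extraClassPath=$ACME_INGEST_HOME/conf:$ACME_INGEST_HOME/lib/hbase-protocol-0.98.9-hadoop2.jar:$ACME_INGEST_HOME/lib/solr-solrj-5.3.0.jar"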

On Wed, Sep 30, 2015 at 12:01 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> bq. have tried these settings with the hbase protocol jar, to no avail
>
> In that case, HBaseZeroCopyByteString is contained in hbase-protocol.jar.
> In HBaseZeroCopyByteString , you can see:
>
> package com.google.protobuf;  // This is a lie.
>
> If protobuf jar is loaded ahead of hbase-protocol.jar, things start to get
> interesting ...
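>
> One way to sanity-check the ordering is to see which jars on the executor
> classpath actually bundle com.google.protobuf classes (a rough sketch; the
> lib directory below is illustrative, point it at your real classpath
> entries):
>
> # List every jar that ships classes in the com.google.protobuf package;
> # if more than one shows up, loading order starts to matter.
> for j in $ACME_INGEST_HOME/lib/*.jar; do
>   unzip -l "$j" | grep -q "com/google/protobuf/" && echo "$j"
> done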
>
> On Tue, Sep 29, 2015 at 6:12 PM, Dmitry Goldenberg <
> dgoldenberg...@gmail.com> wrote:
>
>> Ted, I think I have tried these settings with the hbase protocol jar, to
>> no avail.
>>
>> I'm going to see if I can try and use these with this SolrException issue
>> though it now may be harder to reproduce it. Thanks for the suggestion.
>>
>> On Tue, Sep 29, 2015 at 8:03 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> Have you tried the following ?
>>> --conf spark.driver.userClassPathFirst=true --conf spark.executor.
>>> userClassPathFirst=true
>>>
>>> On Tue, Sep 29, 2015 at 4:38 PM, Dmitry Goldenberg <
>>> dgoldenberg...@gmail.com> wrote:
>>>
>>>> Release of Spark: 1.5.0.
>>>>
>>>> Command line invocation:
>>>>
>>>> ACME_INGEST_HOME=/mnt/acme/acme-ingest
>>>> ACME_INGEST_VERSION=0.0.1-SNAPSHOT
>>>> ACME_BATCH_DURATION_MILLIS=5000
>>>> SPARK_MASTER_URL=spark://data1:7077
>>>> JAVA_OPTIONS="-Dspark.streaming.kafka.maxRatePerPartition=1000"
>>>> JAVA_OPTIONS="$JAVA_OPTIONS -Dspark.executor.memory=2g"
>>>>
>>>> $SPARK_HOME/bin/spark-submit \
>>>>         --driver-class-path $ACME_INGEST_HOME \
>>>>         --driver-java-options "$JAVA_OPTIONS" \
>>>>         --class "com.acme.consumer.kafka.spark.KafkaSparkStreamingDriver" \
>>>>         --master $SPARK_MASTER_URL \
>>>>         --conf "spark.executor.extraClassPath=$ACME_INGEST_HOME/conf:$ACME_INGEST_HOME/lib/hbase-protocol-0.98.9-hadoop2.jar" \
>>>>         $ACME_INGEST_HOME/lib/acme-ingest-kafka-spark-$ACME_INGEST_VERSION.jar \
>>>>         -brokerlist $METADATA_BROKER_LIST \
>>>>         -topic acme.topic1 \
>>>>         -autooffsetreset largest \
>>>>         -batchdurationmillis $ACME_BATCH_DURATION_MILLIS \
>>>>         -appname Acme.App1 \
>>>>         -checkpointdir file://$SPARK_HOME/acme/checkpoint-acme-app1
>>>>
>>>> Note that SolrException is definitely in our consumer jar,
>>>> acme-ingest-kafka-spark-$ACME_INGEST_VERSION.jar, which gets deployed to
>>>> $ACME_INGEST_HOME.
>>>>
>>>> For the extraClassPath on the executors, we've additionally included
>>>> hbase-protocol-0.98.9-hadoop2.jar: we're using Apache Phoenix from the
>>>> Spark jobs to communicate with HBase. The only way to get Phoenix to
>>>> communicate with HBase successfully was to add that jar explicitly to
>>>> the executor classpath, even though the contents of the hbase-protocol
>>>> jar get rolled up into the consumer jar at build time.
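>>>>
>>>> A quick way to confirm that roll-up, for what it's worth (the jar path
>>>> assumes our usual layout; substitute the real artifact):
>>>>
>>>> # Check that HBaseZeroCopyByteString (which hbase-protocol declares in
>>>> # the com.google.protobuf package) is inside the consumer uber jar.
>>>> unzip -l $ACME_INGEST_HOME/lib/acme-ingest-kafka-spark-$ACME_INGEST_VERSION.jar \
>>>>   | grep "com/google/protobuf/HBaseZeroCopyByteString"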
>>>>
>>>> I'm starting to wonder whether there's some class loading pattern here
>>>> where certain classes don't get loaded from the consumer jar and
>>>> therefore need their respective jars added explicitly to the executor
>>>> extraClassPath.
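>>>>
>>>> One way I could try to confirm which jar each class actually comes from
>>>> is JVM class-loading tracing (a debugging sketch; the assumption is that
>>>> -verbose:class output lands in the executor stdout logs under
>>>> $SPARK_HOME/work/ on the workers, as it normally does in standalone mode):
>>>>
>>>> # Add to the spark-submit invocation to log the source of every loaded class.
>>>> --conf "spark.executor.extraJavaOptions=-verbose:class" \
>>>>
>>>> # Then grep the executor stdout for SolrException or
>>>> # HBaseZeroCopyByteString to see which jar each one was loaded from.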
>>>>
>>>> Or is this a serialization problem for SolrException as Divya
>>>> Ravichandran suggested?
>>>>
>>>>
>>>> On Tue, Sep 29, 2015 at 6:16 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> Mind providing a bit more information:
>>>>>
>>>>> release of Spark
>>>>> command line for running Spark job
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Tue, Sep 29, 2015 at 1:37 PM, Dmitry Goldenberg <
>>>>> dgoldenberg...@gmail.com> wrote:
>>>>>
>>>>>> We're seeing this occasionally. Granted, this was caused by a wrinkle
>>>>>> in the Solr schema, but it bubbled up all the way into Spark and caused
>>>>>> job failures.
>>>>>>
>>>>>> I just checked, and the SolrException class is actually in the consumer
>>>>>> job jar we use. Is there any reason why Spark cannot find the
>>>>>> SolrException class?
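>>>>>>
>>>>>> For reference, one quick way to reproduce that check (the jar name is
>>>>>> spelled out here as a placeholder for our actual build artifact):
>>>>>>
>>>>>> # Confirm the class file is packaged in the consumer job jar.
>>>>>> unzip -l acme-ingest-kafka-spark-0.0.1-SNAPSHOT.jar \
>>>>>>   | grep "org/apache/solr/common/SolrException"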
>>>>>>
>>>>>> 15/09/29 15:41:58 WARN ThrowableSerializationWrapper: Task exception could not be deserialized
>>>>>> java.lang.ClassNotFoundException: org.apache.solr.common.SolrException
>>>>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>>> at java.lang.Class.forName0(Native Method)
>>>>>> at java.lang.Class.forName(Class.java:348)
>>>>>> at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
>>>>>> at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
>>>>>> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>>>>>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
>>>>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>>>>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>>>>>> at org.apache.spark.ThrowableSerializationWrapper.readObject(TaskEndReason.scala:163)
>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>> at java.lang.reflect.Method.invoke(Method.java:497)
>>>>>> at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>>>>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1900)
>>>>>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>>>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>>>>> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>>>>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>>>>>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>>>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>>>>> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>>>>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>>>>>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>>>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>>>>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>>>>>> at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
>>>>>> at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
>>>>>> at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply$mcV$sp(TaskResultGetter.scala:108)
>>>>>> at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply(TaskResultGetter.scala:105)
>>>>>> at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply(TaskResultGetter.scala:105)
>>>>>> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
>>>>>> at org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:105)
>>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
