I believe I've had trouble with --conf spark.driver.userClassPathFirst=true --conf spark.executor.userClassPathFirst=true before, so these might not work...
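Before retrying those flags, it may be worth confirming which jar the class actually resolves from at runtime. Here's a rough diagnostic sketch (plain Java; the class and method names are just for illustration), callable both driver-side right after context creation and executor-side, e.g. inside a mapPartitions:

    import java.security.CodeSource;

    public final class WhereIsClass {
        // Print which jar and classloader a class resolves from in the
        // current JVM; run it on the driver and inside an executor task
        // and compare the two outputs.
        public static void report(String className) {
            try {
                Class<?> clazz = Class.forName(className);
                CodeSource src = clazz.getProtectionDomain().getCodeSource();
                System.out.println(className + " loaded from "
                        + (src != null ? src.getLocation() : "<bootstrap>")
                        + " via " + clazz.getClassLoader());
            } catch (ClassNotFoundException e) {
                System.out.println(className + " is NOT visible here: " + e);
            }
        }

        public static void main(String[] args) {
            report("org.apache.solr.common.SolrException");
        }
    }

If both sides report the consumer jar, classpath ordering probably isn't the issue; if one side can't see the class at all, that narrows things down considerably.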
I was thinking of trying to add the solrj jar to spark.executor.extraClassPath...

On Wed, Sep 30, 2015 at 12:01 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> bq. have tried these settings with the hbase protocol jar, to no avail
>
> In that case: HBaseZeroCopyByteString is contained in hbase-protocol.jar,
> and in HBaseZeroCopyByteString you can see:
>
>     package com.google.protobuf;  // This is a lie.
>
> If the protobuf jar is loaded ahead of hbase-protocol.jar, things start
> to get interesting ...
>
> On Tue, Sep 29, 2015 at 6:12 PM, Dmitry Goldenberg <dgoldenberg...@gmail.com> wrote:
>
>> Ted, I think I have tried these settings with the hbase protocol jar,
>> to no avail.
>>
>> I'm going to see if I can try them with this SolrException issue,
>> though it may now be harder to reproduce. Thanks for the suggestion.
>>
>> On Tue, Sep 29, 2015 at 8:03 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> Have you tried the following?
>>>
>>>     --conf spark.driver.userClassPathFirst=true --conf spark.executor.userClassPathFirst=true
>>>
>>> On Tue, Sep 29, 2015 at 4:38 PM, Dmitry Goldenberg <dgoldenberg...@gmail.com> wrote:
>>>
>>>> Release of Spark: 1.5.0.
>>>>
>>>> Command line invocation:
>>>>
>>>>     ACME_INGEST_HOME=/mnt/acme/acme-ingest
>>>>     ACME_INGEST_VERSION=0.0.1-SNAPSHOT
>>>>     ACME_BATCH_DURATION_MILLIS=5000
>>>>     SPARK_MASTER_URL=spark://data1:7077
>>>>     JAVA_OPTIONS="-Dspark.streaming.kafka.maxRatePerPartition=1000"
>>>>     JAVA_OPTIONS="$JAVA_OPTIONS -Dspark.executor.memory=2g"
>>>>
>>>>     $SPARK_HOME/bin/spark-submit \
>>>>       --driver-class-path $ACME_INGEST_HOME \
>>>>       --driver-java-options "$JAVA_OPTIONS" \
>>>>       --class "com.acme.consumer.kafka.spark.KafkaSparkStreamingDriver" \
>>>>       --master $SPARK_MASTER_URL \
>>>>       --conf "spark.executor.extraClassPath=$ACME_INGEST_HOME/conf:$ACME_INGEST_HOME/lib/hbase-protocol-0.98.9-hadoop2.jar" \
>>>>       $ACME_INGEST_HOME/lib/acme-ingest-kafka-spark-$ACME_INGEST_VERSION.jar \
>>>>       -brokerlist $METADATA_BROKER_LIST \
>>>>       -topic acme.topic1 \
>>>>       -autooffsetreset largest \
>>>>       -batchdurationmillis $ACME_BATCH_DURATION_MILLIS \
>>>>       -appname Acme.App1 \
>>>>       -checkpointdir file://$SPARK_HOME/acme/checkpoint-acme-app1
>>>>
>>>> Note that SolrException is definitely in our consumer jar,
>>>> acme-ingest-kafka-spark-$ACME_INGEST_VERSION.jar, which gets deployed
>>>> to $ACME_INGEST_HOME.
>>>>
>>>> For the extraClassPath on the executors, we've additionally got
>>>> hbase-protocol-0.98.9-hadoop2.jar: we're using Apache Phoenix from the
>>>> Spark jobs to communicate with HBase. The only way to get Phoenix to
>>>> communicate successfully with HBase was to add that JAR explicitly to
>>>> the executor classpath, even though the contents of the hbase-protocol
>>>> jar get rolled up into the consumer jar at build time.
>>>>
>>>> I'm starting to wonder whether there's some class loading pattern here
>>>> where some classes don't get loaded out of the consumer jar and
>>>> therefore have to have their respective jars added to the executor
>>>> extraClassPath?
>>>>
>>>> Or is this a serialization problem for SolrException, as Divya
>>>> Ravichandran suggested?
>>>>
>>>> On Tue, Sep 29, 2015 at 6:16 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> Mind providing a bit more information:
>>>>>
>>>>> release of Spark
>>>>> command line for running the Spark job
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Tue, Sep 29, 2015 at 1:37 PM, Dmitry Goldenberg <dgoldenberg...@gmail.com> wrote:
>>>>>
>>>>>> We're seeing this occasionally. Granted, it was caused by a wrinkle
>>>>>> in the Solr schema, but it bubbled all the way up into Spark and
>>>>>> caused job failures.
>>>>>>
>>>>>> I just checked, and the SolrException class is actually in the
>>>>>> consumer job jar we use. Is there any reason why Spark cannot find
>>>>>> the SolrException class?
>>>>>>
>>>>>> 15/09/29 15:41:58 WARN ThrowableSerializationWrapper: Task exception could not be deserialized
>>>>>> java.lang.ClassNotFoundException: org.apache.solr.common.SolrException
>>>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>>>     at java.lang.Class.forName0(Native Method)
>>>>>>     at java.lang.Class.forName(Class.java:348)
>>>>>>     at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
>>>>>>     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
>>>>>>     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>>>>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
>>>>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>>>>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>>>>>>     at org.apache.spark.ThrowableSerializationWrapper.readObject(TaskEndReason.scala:163)
>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>     at java.lang.reflect.Method.invoke(Method.java:497)
>>>>>>     at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>>>>>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1900)
>>>>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>>>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>>>>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>>>>>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>>>>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>>>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>>>>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>>>>>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>>>>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>>>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>>>>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>>>>>>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
>>>>>>     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
>>>>>>     at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply$mcV$sp(TaskResultGetter.scala:108)
>>>>>>     at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply(TaskResultGetter.scala:105)
>>>>>>     at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply(TaskResultGetter.scala:105)
>>>>>>     at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
>>>>>>     at org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:105)
>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>>>>     at java.lang.Thread.run(Thread.java:745)
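P.S. On Divya's serialization angle: SolrException extends RuntimeException, so it should be Serializable, but a quick standalone round trip would settle it. A minimal sketch (plain Java; the ErrorCode and message are just placeholders to construct an instance):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import org.apache.solr.common.SolrException;

    public class SolrExceptionRoundTrip {
        public static void main(String[] args) throws Exception {
            SolrException original = new SolrException(
                    SolrException.ErrorCode.BAD_REQUEST, "schema wrinkle");

            // Write the exception out the way Java serialization would.
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }

            // Read it back: a failure here points at serialization itself,
            // a success points back at classpath visibility.
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                SolrException copy = (SolrException) in.readObject();
                System.out.println("Round trip OK: " + copy.getMessage());
            }
        }
    }

If the round trip passes, serialization per se looks fine, and since the trace above fails inside TaskResultGetter, which runs on the driver, my guess would be that it's the driver's classpath, not the executors', that can't see SolrException.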