[ https://issues.apache.org/jira/browse/SPARK-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516241#comment-14516241 ]

Dmitry Goldenberg commented on SPARK-7154:
------------------------------------------

I've added this info to the 'mirror' ticket I filed in the Phoenix JIRA, 
[PHOENIX-1926|https://issues.apache.org/jira/browse/PHOENIX-1926]:

It looks like I got this to work with the following spark-submit invocation:
{code}
./bin/spark-submit \
        --driver-class-path $HBASE_HOME/lib/hbase-protocol-0.98.9-hadoop2.jar \
        --driver-java-options "-Dspark.driver.extraClassPath=$HBASE_HOME/lib/hbase-protocol-0.98.9-hadoop2.jar" \
        --class "com.myco.Driver" \
        --master local[*] \
        /mnt/data/myco.jar
{code}
The crucial part of getting this to work was the first parameter, 
--driver-class-path. Things work with just that flag and without 
spark.driver.extraClassPath; they do not work with spark.driver.extraClassPath 
but no --driver-class-path, and of course they don't work with both missing 
from the invocation. (Presumably this is because --driver-class-path is 
applied when the driver JVM is launched, whereas a 
-Dspark.driver.extraClassPath passed via --driver-java-options only becomes 
visible after the driver JVM's classpath is already fixed.)
I also have the classes from Phoenix's hbase and hadoop dependency jars rolled 
into my driver jar. I tested with and without them in the job jar; in either 
case the error goes away as long as the protocol jar is on the driver class 
path. A quick way to verify what the driver is actually loading is sketched 
below.
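For reference, here is a minimal diagnostic sketch one can run at the top of 
the driver's main (the class name ClasspathCheck is hypothetical; the protobuf 
class names are taken from the stack trace quoted below). It prints whether 
the protocol jar made it onto the driver JVM's classpath and which jar each 
contested class is actually loaded from:
{code}
import java.security.CodeSource;

// Minimal diagnostic sketch: run early in the driver to verify the fix.
public class ClasspathCheck {
    public static void main(String[] args) throws Exception {
        // --driver-class-path entries land on the driver JVM's java.class.path.
        String cp = System.getProperty("java.class.path");
        System.out.println("hbase-protocol jar on driver classpath: "
                + cp.contains("hbase-protocol"));

        // Print which jar each contested class is actually loaded from.
        for (String name : new String[] {
                "com.google.protobuf.ByteString",
                "com.google.protobuf.HBaseZeroCopyByteString" }) {
            Class<?> cls = Class.forName(name);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            System.out.println(name + " -> "
                    + (src == null ? "unknown source" : src.getLocation()));
        }
    }
}
{code}
If the protobuf classes resolve to a jar carrying an incompatible copy of them 
(e.g. the Spark assembly rather than the stock protobuf 2.5.0 / hbase-protocol 
jars), that is consistent with the IllegalAccessError quoted below.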

Since I've only tried this in local mode, it's not yet clear to me whether 
spark.driver.extraClassPath and/or rolling Phoenix's hbase and hadoop 
dependency jars into the job jar would be required on a real cluster. But the 
good news is that there's clearly a 'mojo' for getting this to work, so one no 
longer gets the IllegalAccessError.

> Spark distro appears to be pulling in incorrect protobuf classes
> ----------------------------------------------------------------
>
>                 Key: SPARK-7154
>                 URL: https://issues.apache.org/jira/browse/SPARK-7154
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 1.3.0
>            Reporter: Dmitry Goldenberg
>         Attachments: in-google-protobuf-2.5.0.zip, 
> in-spark-1.3.1-local-build.zip, spark-1.3.1-local-build.txt
>
>
> If you download Spark from the downloads page, 
> https://spark.apache.org/downloads.html
> (for example, I chose
> http://www.apache.org/dyn/closer.cgi/spark/spark-1.3.1/spark-1.3.1-bin-hadoop2.4.tgz),
> you may see incompatibilities with other libraries due to incorrect 
> protobuf classes.
> I'm seeing such a case in my Spark Streaming job, which attempts to use 
> Apache Phoenix to update records in HBase. The job is built with a protobuf 
> 2.5.0 dependency. However, at runtime Spark's classes take precedence in 
> class loading, and that causes exceptions such as the following:
> {code}
> java.util.concurrent.ExecutionException: java.lang.IllegalAccessError: com/google/protobuf/HBaseZeroCopyByteString
>         at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>         at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1620)
>         at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1577)
>         at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1007)
>         at org.apache.phoenix.query.ConnectionQueryServicesImpl.getTable(ConnectionQueryServicesImpl.java:1257)
>         at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:350)
>         at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:311)
>         at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:307)
>         at org.apache.phoenix.compile.FromCompiler$BaseColumnResolver.createTableRef(FromCompiler.java:333)
>         at org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.<init>(FromCompiler.java:237)
>         at org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.<init>(FromCompiler.java:231)
>         at org.apache.phoenix.compile.FromCompiler.getResolverForMutation(FromCompiler.java:207)
>         at org.apache.phoenix.compile.UpsertCompiler.compile(UpsertCompiler.java:248)
>         at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:503)
>         at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:494)
>         at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:295)
>         at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:288)
>         at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>         at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:287)
>         at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:219)
>         at org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:174)
>         at org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:179)
>         at com.kona.core.upload.persistence.hdfshbase.HUploadWorkqueueHelper.updateUploadWorkqueueEntry(HUploadWorkqueueHelper.java:139)
>         at com.kona.core.upload.persistence.hdfshbase.HdfsHbaseUploadPersistenceProvider.updateUploadWorkqueueEntry(HdfsHbaseUploadPersistenceProvider.java:144)
>         at com.kona.pipeline.sparkplug.error.UploadEntryErrorHandlerImpl.onError(UploadEntryErrorHandlerImpl.java:62)
>         at com.kona.pipeline.sparkplug.pipeline.KonaPipelineImpl.processError(KonaPipelineImpl.java:305)
>         at com.kona.pipeline.sparkplug.pipeline.KonaPipelineImpl.processPipelineDocument(KonaPipelineImpl.java:208)
>         at com.kona.pipeline.sparkplug.runner.KonaPipelineRunnerImpl.notifyItemReceived(KonaPipelineRunnerImpl.java:79)
>         at com.kona.pipeline.streaming.spark.ProcessPartitionFunction.call(ProcessPartitionFunction.java:83)
>         at com.kona.pipeline.streaming.spark.ProcessPartitionFunction.call(ProcessPartitionFunction.java:25)
>         at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:198)
>         at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:198)
>         at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:806)
>         at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:806)
>         at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1497)
>         at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1497)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>         at org.apache.spark.scheduler.Task.run(Task.scala:64)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalAccessError: com/google/protobuf/HBaseZeroCopyByteString
>         at org.apache.phoenix.query.ConnectionQueryServicesImpl$7.call(ConnectionQueryServicesImpl.java:1265)
>         at org.apache.phoenix.query.ConnectionQueryServicesImpl$7.call(ConnectionQueryServicesImpl.java:1258)
>         at org.apache.hadoop.hbase.client.HTable$17.call(HTable.java:1608)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> {code}
> If you compare (e.g. with the cmp command) the protobuf classes inside the 
> Spark assembly jar against the classes in the stock protobuf 2.5.0 jar, the 
> following do not match:
> {code}
> BoundedByteString$1.class
> BoundedByteString$BoundedByteIterator.class
> BoundedByteString.class
> ByteString$1.class
> ByteString$ByteIterator.class
> ByteString$CodedBuilder.class
> ByteString$Output.class
> ByteString.class
> CodedInputStream.class
> CodedOutputStream$OutOfSpaceException.class
> CodedOutputStream.class
> LiteralByteString$1.class
> LiteralByteString$LiteralByteIterator.class
> LiteralByteString.class
> {code}
> All of these are dependency classes of HBaseZeroCopyByteString, and their 
> being incompatible explains the java.lang.IllegalAccessError.
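> The same comparison can also be spot-checked in code. A minimal sketch (the 
> jar file names here are hypothetical; point them at your local Spark 
> assembly and stock protobuf jars):
> {code}
> import java.io.ByteArrayOutputStream;
> import java.io.IOException;
> import java.io.InputStream;
> import java.util.Arrays;
> import java.util.jar.JarFile;
> import java.util.zip.ZipEntry;
> 
> // Compares one class entry across two jars byte for byte, like cmp.
> public class JarEntryCmp {
>     public static void main(String[] args) throws IOException {
>         // Hypothetical paths; adjust to the jars on your machine.
>         try (JarFile assembly = new JarFile("spark-assembly-1.3.1-hadoop2.4.0.jar");
>              JarFile stock = new JarFile("protobuf-java-2.5.0.jar")) {
>             String entry = "com/google/protobuf/ByteString.class";
>             boolean same = Arrays.equals(read(assembly, entry), read(stock, entry));
>             System.out.println(entry + (same ? " matches" : " DIFFERS"));
>         }
>     }
> 
>     // Reads a single entry fully into memory; assumes it exists in the jar.
>     private static byte[] read(JarFile jar, String name) throws IOException {
>         ZipEntry e = jar.getEntry(name);
>         ByteArrayOutputStream out = new ByteArrayOutputStream();
>         try (InputStream in = jar.getInputStream(e)) {
>             byte[] buf = new byte[8192];
>             for (int n; (n = in.read(buf)) != -1; ) {
>                 out.write(buf, 0, n);
>             }
>         }
>         return out.toByteArray();
>     }
> }
> {code}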
> What's not yet clear to me is how these classes can be wrong when the Spark 
> pom specifies 2.5.0:
> {code}
>     <profile>
>       <id>hadoop-2.4</id>
>       <properties>
>         <hadoop.version>2.4.0</hadoop.version>
>         <protobuf.version>2.5.0</protobuf.version>
>         <jets3t.version>0.9.3</jets3t.version>
>         <hbase.version>0.98.7-hadoop2</hbase.version>
>         <commons.math3.version>3.1.1</commons.math3.version>
>         <avro.mapred.classifier>hadoop2</avro.mapred.classifier>
>         <codehaus.jackson.version>1.9.13</codehaus.jackson.version>
>       </properties>
>     </profile>
> {code}
> This looks correct and in theory should override the 
> <protobuf.version>2.4.1</protobuf.version> specified higher up in the parent 
> pom (https://github.com/apache/spark/blob/master/pom.xml).


