[ https://issues.apache.org/jira/browse/SPARK-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitry Goldenberg updated SPARK-7154:
-------------------------------------
    Priority: Critical  (was: Major)

> Spark distro appears to be pulling in incorrect protobuf classes
> ----------------------------------------------------------------
>
>                 Key: SPARK-7154
>                 URL: https://issues.apache.org/jira/browse/SPARK-7154
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 1.3.0
>            Reporter: Dmitry Goldenberg
>            Priority: Critical
>         Attachments: in-google-protobuf-2.5.0.zip, in-spark-1.3.1-local-build.zip, spark-1.3.1-local-build.txt
>
> If you download Spark from the site https://spark.apache.org/downloads.html (for example, I chose http://www.apache.org/dyn/closer.cgi/spark/spark-1.3.1/spark-1.3.1-bin-hadoop2.4.tgz), you may see incompatibility with other libraries due to incorrect protobuf classes.
> I'm seeing such a case in my Spark Streaming job, which attempts to use Apache Phoenix to update records in HBase. The job is built with a protobuf 2.5.0 dependency.
> However, at runtime Spark's classes take precedence in class loading, and that is causing exceptions such as the following:
> java.util.concurrent.ExecutionException: java.lang.IllegalAccessError: com/google/protobuf/HBaseZeroCopyByteString
>     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>     at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1620)
>     at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1577)
>     at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1007)
>     at org.apache.phoenix.query.ConnectionQueryServicesImpl.getTable(ConnectionQueryServicesImpl.java:1257)
>     at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:350)
>     at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:311)
>     at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:307)
>     at org.apache.phoenix.compile.FromCompiler$BaseColumnResolver.createTableRef(FromCompiler.java:333)
>     at org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.<init>(FromCompiler.java:237)
>     at org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.<init>(FromCompiler.java:231)
>     at org.apache.phoenix.compile.FromCompiler.getResolverForMutation(FromCompiler.java:207)
>     at org.apache.phoenix.compile.UpsertCompiler.compile(UpsertCompiler.java:248)
>     at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:503)
>     at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:494)
>     at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:295)
>     at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:288)
>     at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>     at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:287)
>     at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:219)
>     at org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:174)
>     at org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:179)
>     at com.kona.core.upload.persistence.hdfshbase.HUploadWorkqueueHelper.updateUploadWorkqueueEntry(HUploadWorkqueueHelper.java:139)
>     at com.kona.core.upload.persistence.hdfshbase.HdfsHbaseUploadPersistenceProvider.updateUploadWorkqueueEntry(HdfsHbaseUploadPersistenceProvider.java:144)
>     at com.kona.pipeline.sparkplug.error.UploadEntryErrorHandlerImpl.onError(UploadEntryErrorHandlerImpl.java:62)
>     at com.kona.pipeline.sparkplug.pipeline.KonaPipelineImpl.processError(KonaPipelineImpl.java:305)
>     at com.kona.pipeline.sparkplug.pipeline.KonaPipelineImpl.processPipelineDocument(KonaPipelineImpl.java:208)
>     at com.kona.pipeline.sparkplug.runner.KonaPipelineRunnerImpl.notifyItemReceived(KonaPipelineRunnerImpl.java:79)
>     at com.kona.pipeline.streaming.spark.ProcessPartitionFunction.call(ProcessPartitionFunction.java:83)
>     at com.kona.pipeline.streaming.spark.ProcessPartitionFunction.call(ProcessPartitionFunction.java:25)
>     at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:198)
>     at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:198)
>     at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:806)
>     at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:806)
>     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1497)
>     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1497)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>     at org.apache.spark.scheduler.Task.run(Task.scala:64)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalAccessError: com/google/protobuf/HBaseZeroCopyByteString
>     at org.apache.phoenix.query.ConnectionQueryServicesImpl$7.call(ConnectionQueryServicesImpl.java:1265)
>     at org.apache.phoenix.query.ConnectionQueryServicesImpl$7.call(ConnectionQueryServicesImpl.java:1258)
>     at org.apache.hadoop.hbase.client.HTable$17.call(HTable.java:1608)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> If you look at the protobuf classes inside the Spark assembly jar, they do not match the classes in the stock protobuf 2.5.0 jar (compare them with the cmp command):
> BoundedByteString$1.class
> BoundedByteString$BoundedByteIterator.class
> BoundedByteString.class
> ByteString$1.class
> ByteString$ByteIterator.class
> ByteString$CodedBuilder.class
> ByteString$Output.class
> ByteString.class
> CodedInputStream.class
> CodedOutputStream$OutOfSpaceException.class
> CodedOutputStream.class
> LiteralByteString$1.class
> LiteralByteString$LiteralByteIterator.class
> LiteralByteString.class
> All of these are dependency classes for HBaseZeroCopyByteString, and they are incompatible, which explains the java.lang.IllegalAccessError.
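The per-class cmp comparison described above can also be scripted. The sketch below is not part of the report; it is a minimal illustration that relies on jar files being plain zip archives, so Python's zipfile module can read them. The jar paths in the trailing comment are hypothetical stand-ins for wherever the Spark assembly jar and the stock protobuf-java-2.5.0.jar live locally.

```python
# Sketch: list classes under com/google/protobuf/ whose bytes differ
# between two jars. Jar files are zip archives, so zipfile can open them.
import zipfile

def diff_jar_entries(jar_a, jar_b, prefix="com/google/protobuf/"):
    """Return entry names under `prefix` present in both jars whose contents differ."""
    with zipfile.ZipFile(jar_a) as a, zipfile.ZipFile(jar_b) as b:
        names_a = {n for n in a.namelist() if n.startswith(prefix)}
        names_b = {n for n in b.namelist() if n.startswith(prefix)}
        return sorted(
            n for n in names_a & names_b
            if a.read(n) != b.read(n)  # byte-for-byte, like `cmp`
        )

# Example (hypothetical local paths; adjust to your downloads):
#   diff_jar_entries("spark-assembly-1.3.1-hadoop2.4.0.jar",
#                    "protobuf-java-2.5.0.jar")
```

Any names this prints (e.g. ByteString.class) are classes the Spark assembly repackaged with different bytecode than the stock 2.5.0 release, which is exactly the mismatch the cmp comparison above shows.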
> What's not yet clear to me is how they can be wrong if the Spark pom specifies 2.5.0:
> <profile>
>   <id>hadoop-2.4</id>
>   <properties>
>     <hadoop.version>2.4.0</hadoop.version>
>     <protobuf.version>2.5.0</protobuf.version>
>     <jets3t.version>0.9.3</jets3t.version>
>     <hbase.version>0.98.7-hadoop2</hbase.version>
>     <commons.math3.version>3.1.1</commons.math3.version>
>     <avro.mapred.classifier>hadoop2</avro.mapred.classifier>
>     <codehaus.jackson.version>1.9.13</codehaus.jackson.version>
>   </properties>
> </profile>
> This looks correct and in theory should override the <protobuf.version>2.4.1</protobuf.version> specified higher up in the parent pom (https://github.com/apache/spark/blob/master/pom.xml).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
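The profile-override reasoning in the report can be made explicit: Maven resolves properties so that an active profile's <properties> override same-named properties from the base (or parent) pom, so the hadoop-2.4 profile's 2.5.0 should win over the parent's 2.4.1. The toy model below illustrates only that precedence rule; it is not Maven's actual implementation, and the property values are taken from the poms quoted above.

```python
# Toy model of Maven property precedence: properties from an active
# profile override same-named properties from the base/parent pom.
# This mimics the precedence rule only, not Maven itself.
def resolve_properties(parent_props, profile_props):
    merged = dict(parent_props)   # start from the parent pom defaults
    merged.update(profile_props)  # the active profile wins on conflicts
    return merged

parent = {"protobuf.version": "2.4.1"}              # from the parent pom
hadoop24 = {"protobuf.version": "2.5.0",            # hadoop-2.4 profile
            "hadoop.version": "2.4.0"}

props = resolve_properties(parent, hadoop24)
print(props["protobuf.version"])  # prints: 2.5.0
```

If the profile value does win (running `mvn help:effective-pom -Phadoop-2.4` on the Spark source shows the value Maven actually resolves), then the suspect is not version resolution but the packaging step, e.g. how the assembly shades or bundles the protobuf classes.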