Dmitry Goldenberg created SPARK-7154:
----------------------------------------
             Summary: Spark distro appears to be pulling in incorrect protobuf classes
                 Key: SPARK-7154
                 URL: https://issues.apache.org/jira/browse/SPARK-7154
             Project: Spark
          Issue Type: Bug
          Components: Build
    Affects Versions: 1.3.1
            Reporter: Dmitry Goldenberg

If you download Spark via the site https://spark.apache.org/downloads.html (for example, I chose http://www.apache.org/dyn/closer.cgi/spark/spark-1.3.1/spark-1.3.1-bin-hadoop2.4.tgz), you may see incompatibilities with other libraries due to incorrect protobuf classes.

I'm seeing such a case in my Spark Streaming job, which attempts to use Apache Phoenix to update records in HBase. The job is built with a protobuf 2.5.0 dependency. However, at runtime Spark's copies of the protobuf classes take precedence in class loading, and that causes exceptions such as the following:

java.util.concurrent.ExecutionException: java.lang.IllegalAccessError: com/google/protobuf/HBaseZeroCopyByteString
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1620)
    at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1577)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1007)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.getTable(ConnectionQueryServicesImpl.java:1257)
    at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:350)
    at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:311)
    at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:307)
    at org.apache.phoenix.compile.FromCompiler$BaseColumnResolver.createTableRef(FromCompiler.java:333)
    at org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.<init>(FromCompiler.java:237)
    at org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.<init>(FromCompiler.java:231)
    at org.apache.phoenix.compile.FromCompiler.getResolverForMutation(FromCompiler.java:207)
    at org.apache.phoenix.compile.UpsertCompiler.compile(UpsertCompiler.java:248)
    at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:503)
    at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:494)
    at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:295)
    at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:288)
    at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:287)
    at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:219)
    at org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:174)
    at org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:179)
    at com.kona.core.upload.persistence.hdfshbase.HUploadWorkqueueHelper.updateUploadWorkqueueEntry(HUploadWorkqueueHelper.java:139)
    at com.kona.core.upload.persistence.hdfshbase.HdfsHbaseUploadPersistenceProvider.updateUploadWorkqueueEntry(HdfsHbaseUploadPersistenceProvider.java:144)
    at com.kona.pipeline.sparkplug.error.UploadEntryErrorHandlerImpl.onError(UploadEntryErrorHandlerImpl.java:62)
    at com.kona.pipeline.sparkplug.pipeline.KonaPipelineImpl.processError(KonaPipelineImpl.java:305)
    at com.kona.pipeline.sparkplug.pipeline.KonaPipelineImpl.processPipelineDocument(KonaPipelineImpl.java:208)
    at com.kona.pipeline.sparkplug.runner.KonaPipelineRunnerImpl.notifyItemReceived(KonaPipelineRunnerImpl.java:79)
    at com.kona.pipeline.streaming.spark.ProcessPartitionFunction.call(ProcessPartitionFunction.java:83)
    at com.kona.pipeline.streaming.spark.ProcessPartitionFunction.call(ProcessPartitionFunction.java:25)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:198)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:198)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:806)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:806)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1497)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1497)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalAccessError: com/google/protobuf/HBaseZeroCopyByteString
    at org.apache.phoenix.query.ConnectionQueryServicesImpl$7.call(ConnectionQueryServicesImpl.java:1265)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl$7.call(ConnectionQueryServicesImpl.java:1258)
    at org.apache.hadoop.hbase.client.HTable$17.call(HTable.java:1608)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)

If you look at the protobuf classes inside the Spark assembly jar, they do not match the classes in the stock protobuf 2.5.0 jar (compare them with the cmp command; a programmatic version of the same comparison is sketched at the end of this description):

    BoundedByteString$1.class
    BoundedByteString$BoundedByteIterator.class
    BoundedByteString.class
    ByteString$1.class
    ByteString$ByteIterator.class
    ByteString$CodedBuilder.class
    ByteString$Output.class
    ByteString.class
    CodedInputStream.class
    CodedOutputStream$OutOfSpaceException.class
    CodedOutputStream.class
    LiteralByteString$1.class
    LiteralByteString$LiteralByteIterator.class
    LiteralByteString.class

All of these are dependency classes of HBaseZeroCopyByteString, and they're incompatible, which explains the java.lang.IllegalAccessError. (As I understand it, HBaseZeroCopyByteString sits in the com.google.protobuf package precisely so it can subclass the package-private LiteralByteString, so the JVM's access check fails if the runtime supplies different bytes for these classes.)

What's not yet clear to me is how these classes can be wrong when the Spark pom specifies 2.5.0:

    <profile>
      <id>hadoop-2.4</id>
      <properties>
        <hadoop.version>2.4.0</hadoop.version>
        <protobuf.version>2.5.0</protobuf.version>
        <jets3t.version>0.9.3</jets3t.version>
        <hbase.version>0.98.7-hadoop2</hbase.version>
        <commons.math3.version>3.1.1</commons.math3.version>
        <avro.mapred.classifier>hadoop2</avro.mapred.classifier>
        <codehaus.jackson.version>1.9.13</codehaus.jackson.version>
      </properties>
    </profile>

This looks correct, and in theory it should override the <protobuf.version>2.4.1</protobuf.version> specified higher up in the parent pom (https://github.com/apache/spark/blob/master/pom.xml).
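One way to check which value actually wins: running "mvn -Phadoop-2.4 help:evaluate -Dexpression=protobuf.version" from the Spark source tree should print 2.5.0 if the profile property takes effect (help:evaluate is part of the stock maven-help-plugin); if it prints 2.4.1 instead, the parent default is winning and the assembly would be built against the wrong protobuf.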
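For anyone who wants to reproduce the class comparison mentioned above without running cmp file by file, here is a rough sketch in plain Java. The two jar paths in main() are placeholders for wherever your local copies live; everything else uses only java.util.jar and java.security from the JDK:

    // JarClassDiff.java -- rough sketch: checksum the com/google/protobuf/*
    // entries in two jars and report entries that are missing or differ.
    import java.io.InputStream;
    import java.security.MessageDigest;
    import java.util.Enumeration;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.jar.JarEntry;
    import java.util.jar.JarFile;

    public class JarClassDiff {
        // Map of entry name -> MD5 hex digest for all protobuf classes in the jar.
        static Map<String, String> digest(String jarPath) throws Exception {
            Map<String, String> sums = new HashMap<String, String>();
            JarFile jar = new JarFile(jarPath);
            try {
                for (Enumeration<JarEntry> e = jar.entries(); e.hasMoreElements();) {
                    JarEntry entry = e.nextElement();
                    if (!entry.getName().startsWith("com/google/protobuf/")) continue;
                    MessageDigest md = MessageDigest.getInstance("MD5");
                    InputStream in = jar.getInputStream(entry);
                    byte[] buf = new byte[8192];
                    for (int n; (n = in.read(buf)) != -1;) md.update(buf, 0, n);
                    in.close();
                    StringBuilder hex = new StringBuilder();
                    for (byte b : md.digest()) hex.append(String.format("%02x", b));
                    sums.put(entry.getName(), hex.toString());
                }
            } finally {
                jar.close();
            }
            return sums;
        }

        public static void main(String[] args) throws Exception {
            // Placeholder paths: point these at your Spark assembly and the
            // stock protobuf-java-2.5.0.jar from Maven Central.
            Map<String, String> assembly = digest("spark-assembly-1.3.1-hadoop2.4.0.jar");
            Map<String, String> stock = digest("protobuf-java-2.5.0.jar");
            for (Map.Entry<String, String> e : stock.entrySet()) {
                String other = assembly.get(e.getKey());
                if (other == null) System.out.println("MISSING in assembly: " + e.getKey());
                else if (!other.equals(e.getValue())) System.out.println("DIFFERS: " + e.getKey());
            }
        }
    }

On my copies this is where the BoundedByteString/ByteString/LiteralByteString mismatches listed above show up.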
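And a minimal runtime diagnostic, assuming nothing beyond the class names already mentioned in this report, that can be run on the driver or called from inside a partition function to see which jar the active classloader actually resolves the protobuf classes from:

    // ProtobufOrigin.java -- minimal sketch: print the code source (jar) that
    // the current classloader loads each protobuf-related class from.
    import java.security.CodeSource;

    public class ProtobufOrigin {
        public static void main(String[] args) {
            String[] names = {
                "com.google.protobuf.ByteString",
                "com.google.protobuf.LiteralByteString",
                // Requires the hbase-protocol jar on the classpath; loading it
                // may itself throw IllegalAccessError, which is informative too.
                "com.google.protobuf.HBaseZeroCopyByteString"
            };
            for (String name : names) {
                try {
                    Class<?> cls = Class.forName(name);
                    CodeSource src = cls.getProtectionDomain().getCodeSource();
                    System.out.println(name + " <- "
                        + (src != null ? src.getLocation() : "unknown (bootstrap?)"));
                } catch (Throwable t) {
                    System.out.println(name + " <- failed to load: " + t);
                }
            }
        }
    }

If ByteString resolves to the Spark assembly jar rather than to protobuf-java-2.5.0.jar, that would confirm the class-loading precedence problem described above.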