[jira] [Commented] (SPARK-7154) Spark distro appears to be pulling in incorrect protobuf classes
[ https://issues.apache.org/jira/browse/SPARK-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516241#comment-14516241 ] Dmitry Goldenberg commented on SPARK-7154:
--
I've added this info to the 'mirror' ticket I filed in Phoenix's JIRA, [PHOENIX-1926|https://issues.apache.org/jira/browse/PHOENIX-1926]: It looks like I got this to work with the following spark-submit invocation:
{code}
./bin/spark-submit \
  --driver-class-path $HBASE_HOME/lib/hbase-protocol-0.98.9-hadoop2.jar \
  --driver-java-options -Dspark.driver.extraClassPath=$HBASE_HOME/lib/hbase-protocol-0.98.9-hadoop2.jar \
  --class com.myco.Driver \
  --master local[*] \
  /mnt/data/myco.jar
{code}
The crucial part to getting this to work was the first parameter, --driver-class-path. Things work with just that and without spark.driver.extraClassPath. They do not work with spark.driver.extraClassPath but no --driver-class-path, and of course they don't work with both missing from the invocation. I also have the classes from Phoenix's HBase and Hadoop dependency jars rolled into my driver jar. Tested with them in the job jar and without; the error goes away in either case as long as the protocol jar is on the driver class path. Since I've only tried this in local mode, it's not yet clear to me whether spark.driver.extraClassPath and/or rolling Phoenix's HBase and Hadoop dependency jars into the job jar would be required elsewhere. But the good news is that there's clearly a recipe for getting this to work, after which the IllegalAccessError no longer occurs.
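(Not part of the original comment: to see which jars on the driver or executor classpath bundle their own copy of the conflicting protobuf classes, a quick scan like the following can help. The directory layout and class name are assumptions; adjust them to the actual install. Zip archives store entry names verbatim in their headers, so a binary grep on the jar file is enough to flag candidates without unpacking anything.)

```shell
# Sketch: list jars that contain the conflicting protobuf class.
# $SPARK_HOME/$HBASE_HOME lib layouts are illustrative assumptions.
for d in "$SPARK_HOME/lib" "$HBASE_HOME/lib"; do
  for j in "$d"/*.jar; do
    [ -f "$j" ] || continue
    # Zip entry names are stored uncompressed, so grep can find them directly.
    if grep -q "com/google/protobuf/LiteralByteString.class" "$j"; then
      echo "$j"
    fi
  done
done
```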
Spark distro appears to be pulling in incorrect protobuf classes
Key: SPARK-7154
URL: https://issues.apache.org/jira/browse/SPARK-7154
Project: Spark
Issue Type: Bug
Components: Build
Affects Versions: 1.3.0
Reporter: Dmitry Goldenberg
Attachments: in-google-protobuf-2.5.0.zip, in-spark-1.3.1-local-build.zip, spark-1.3.1-local-build.txt

If you download Spark from https://spark.apache.org/downloads.html (for example, I chose http://www.apache.org/dyn/closer.cgi/spark/spark-1.3.1/spark-1.3.1-bin-hadoop2.4.tgz), you may see incompatibilities with other libraries due to incorrect protobuf classes. I'm seeing such a case in my Spark Streaming job, which attempts to use Apache Phoenix to update records in HBase. The job is built with a protobuf 2.5.0 dependency. However, at runtime Spark's classes take precedence in class loading, and that causes exceptions such as the following:
{code}
java.util.concurrent.ExecutionException: java.lang.IllegalAccessError: com/google/protobuf/HBaseZeroCopyByteString
 at java.util.concurrent.FutureTask.report(FutureTask.java:122)
 at java.util.concurrent.FutureTask.get(FutureTask.java:192)
 at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1620)
 at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1577)
 at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1007)
 at org.apache.phoenix.query.ConnectionQueryServicesImpl.getTable(ConnectionQueryServicesImpl.java:1257)
 at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:350)
 at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:311)
 at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:307)
 at org.apache.phoenix.compile.FromCompiler$BaseColumnResolver.createTableRef(FromCompiler.java:333)
 at org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.init(FromCompiler.java:237)
 at org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.init(FromCompiler.java:231)
 at org.apache.phoenix.compile.FromCompiler.getResolverForMutation(FromCompiler.java:207)
 at org.apache.phoenix.compile.UpsertCompiler.compile(UpsertCompiler.java:248)
 at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:503)
 at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:494)
 at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:295)
 at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:288)
 at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
 at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:287)
 at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:219)
{code}
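(Not from the ticket: one way to confirm which copy of a class actually wins at runtime is to ask the loaded class for its code source. A minimal diagnostic sketch; the class name passed in is illustrative, and bootstrap-loaded classes report a null code source.)

```java
// Diagnostic sketch: print which classpath entry a class was loaded from.
// Run inside the driver or an executor task to see whether Spark's assembly
// or the job jar is supplying com.google.protobuf classes.
public class WhichJar {
    public static void main(String[] args) {
        // Default class name is illustrative; pass any fully qualified name.
        String name = args.length > 0 ? args[0] : "com.google.protobuf.LiteralByteString";
        try {
            Class<?> c = Class.forName(name);
            java.security.CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes from the bootstrap classloader have no code source.
            System.out.println(name + " loaded from: "
                    + (src != null ? src.getLocation() : "bootstrap"));
        } catch (ClassNotFoundException e) {
            System.out.println(name + " not on the classpath");
        }
    }
}
```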
[jira] [Commented] (SPARK-7154) Spark distro appears to be pulling in incorrect protobuf classes
[ https://issues.apache.org/jira/browse/SPARK-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516242#comment-14516242 ] Dmitry Goldenberg commented on SPARK-7154:
--
Could this possibly be documented somewhere in the Spark doc set as part of resolving this ticket? Thanks.
[jira] [Commented] (SPARK-7154) Spark distro appears to be pulling in incorrect protobuf classes
[ https://issues.apache.org/jira/browse/SPARK-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514303#comment-14514303 ] Dmitry Goldenberg commented on SPARK-7154:
--
Thanks, Sean. What is the right way of setting spark.executor.userClassPathFirst? If I try to set it on SparkConf programmatically, it seems to be ignored:
{code}
SparkConf sparkConf = new SparkConf().setAppName(appName);
sparkConf.set("spark.executor.userClassPathFirst", "true");
{code}
Is --driver-java-options -Dspark.executor.userClassPathFirst=true the recommended approach? I'm now wrapping all of Phoenix's HBase and Hadoop dependency classes into the Spark job jar, but I want to make sure they take precedence. Spark's assembly jar has its own protobuf and Hadoop classes, so perhaps I'm clashing with those.
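(Not part of the original comment: a sketch of the usual ways a Spark property like this is set, assuming Spark 1.3.x, where the flag is marked experimental. Note that SparkConf.set takes two Strings, so in Java both the key and the value must be quoted.)

```shell
# Sketch: three equivalent ways to set the property.
# 1) Per submission, on the spark-submit command line:
#      ./bin/spark-submit --conf spark.executor.userClassPathFirst=true ...
# 2) Persistently, in conf/spark-defaults.conf:
#      spark.executor.userClassPathFirst   true
# 3) Programmatically, in Java (both arguments are Strings):
#      sparkConf.set("spark.executor.userClassPathFirst", "true");
```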
[jira] [Commented] (SPARK-7154) Spark distro appears to be pulling in incorrect protobuf classes
[ https://issues.apache.org/jira/browse/SPARK-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514314#comment-14514314 ] Dmitry Goldenberg commented on SPARK-7154:
--
Trying to use --driver-java-options -Dspark.executor.userClassPathFirst=true is having an effect, although not a positive one:
{code}
15/04/27 11:36:31 ERROR scheduler.JobScheduler: Error running job streaming job 143014899 ms.0
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.ClassCastException: cannot assign instance of scala.None$ to field org.apache.spark.scheduler.Task.metrics of type scala.Option in instance of org.apache.spark.scheduler.ResultTask
 at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2089)
 at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
 at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1999)
 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
 at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
 at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
 at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
 at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
 at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
 at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
 at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
 at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
 at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
 at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
 at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
 at scala.Option.foreach(Option.scala:236)
 at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
 at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
 at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
 at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
{code}
[jira] [Commented] (SPARK-7154) Spark distro appears to be pulling in incorrect protobuf classes
[ https://issues.apache.org/jira/browse/SPARK-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513303#comment-14513303 ] Dmitry Goldenberg commented on SPARK-7154:
--
Not sure if this may be an HBase issue... As per http://www.cloudera.com/content/cloudera/en/documentation/cloudera-search/v1-latest/Cloudera-Search-Release-Notes/csrn_known_issues_current.html I'm now running with HADOOP_CLASSPATH pointing at my hbase protocol jar:
{code}
export HADOOP_CLASSPATH=$HBASE_HOME/lib/hbase-protocol-0.98.9-hadoop2.jar
{code}
and am still getting errors, e.g.:
{code}
java.util.concurrent.ExecutionException: java.lang.IllegalAccessError: class com.google.protobuf.HBaseZeroCopyByteString cannot access its superclass com.google.protobuf.LiteralByteString
 at java.util.concurrent.FutureTask.report(FutureTask.java:122)
 at java.util.concurrent.FutureTask.get(FutureTask.java:192)
 at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1620)
 at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1577)
 at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1007)
 at org.apache.phoenix.query.ConnectionQueryServicesImpl.getTable(ConnectionQueryServicesImpl.java:1257)
 at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:350)
 at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:311)
 at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:307)
 at org.apache.phoenix.compile.FromCompiler$BaseColumnResolver.createTableRef(FromCompiler.java:333)
 at org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.init(FromCompiler.java:237)
 at org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.init(FromCompiler.java:231)
 at org.apache.phoenix.compile.FromCompiler.getResolverForMutation(FromCompiler.java:207)
 at org.apache.phoenix.compile.UpsertCompiler.compile(UpsertCompiler.java:248)
 at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:503)
 at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:494)
 at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:295)
 at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:288)
 at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
 at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:287)
 at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:219)
 at org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:174)
 at org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:179)
{code}
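(Not from the ticket: the "cannot access its superclass" error above is a run-time-package problem. Two classes in the same named package can only access each other's package-private members if the same classloader defines both; when Spark's assembly supplies one protobuf copy and the job jar another, the JVM treats them as unrelated. A minimal, self-contained sketch of the underlying mechanics, using this demo class's own bytes and illustrative names:)

```java
import java.net.URL;
import java.net.URLClassLoader;

// Demo: the same .class file loaded through two different classloaders yields
// two distinct Class objects. The JVM keys type identity (and package-private
// access) on (classloader, class name), not on the name alone, which is why
// mixing Spark's and HBase's protobuf copies fails at runtime.
public class LoaderDemo {
    public static void main(String[] args) throws Exception {
        // Where this class's own bytes live (a directory or a jar).
        URL here = LoaderDemo.class.getProtectionDomain().getCodeSource().getLocation();
        // Two independent loaders with a null (bootstrap) parent, so neither
        // delegates to the application classloader for our class.
        try (URLClassLoader a = new URLClassLoader(new URL[]{here}, null);
             URLClassLoader b = new URLClassLoader(new URL[]{here}, null)) {
            Class<?> ca = a.loadClass("LoaderDemo");
            Class<?> cb = b.loadClass("LoaderDemo");
            System.out.println("same name:  " + ca.getName().equals(cb.getName())); // true
            System.out.println("same class: " + (ca == cb));                        // false
        }
    }
}
```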
[jira] [Commented] (SPARK-7154) Spark distro appears to be pulling in incorrect protobuf classes
[ https://issues.apache.org/jira/browse/SPARK-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513218#comment-14513218 ] Dmitry Goldenberg commented on SPARK-7154:
--
This is really a blocker for me; I can't work around either issue. Any ideas would be appreciated!
[jira] [Commented] (SPARK-7154) Spark distro appears to be pulling in incorrect protobuf classes
[ https://issues.apache.org/jira/browse/SPARK-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513208#comment-14513208 ] Dmitry Goldenberg commented on SPARK-7154:
--
As a workaround, I just tried setting the following:
{code}
SparkConf sparkConf = new SparkConf().setAppName(appName);
sparkConf.set("spark.executor.userClassPathFirst", "true");
{code}
running in standalone mode, and started getting the following exceptions from Spark while trying to run the jobs with spark-submit:
{code}
15/04/26 15:19:00 ERROR scheduler.JobScheduler: Error running job streaming job 143007594 ms.0
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 9.0 failed 1 times, most recent failure: Lost task 0.0 in stage 9.0 (TID 9, localhost): java.lang.ClassCastException: cannot assign instance of scala.None$ to field org.apache.spark.scheduler.Task.metrics of type scala.Option in instance of org.apache.spark.scheduler.ResultTask
 at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2089)
 at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
 at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1999)
 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
 at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
 at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
 at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
 at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
 at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
 at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
 at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
 at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
 at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
 at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
 at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
 at scala.Option.foreach(Option.scala:236)
 at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
 at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
 at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
 at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
{code}
I need a solution for either side of the issue. Either Spark needs to pull in the right protobuf classes, or this new exception needs to be worked around; otherwise I'm basically dead in the water.
[jira] [Commented] (SPARK-7154) Spark distro appears to be pulling in incorrect protobuf classes
[ https://issues.apache.org/jira/browse/SPARK-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513370#comment-14513370 ] Sean Owen commented on SPARK-7154:
--
Despite its package, this class appears to be an HBase class. I think it's in this package to access something package-private. I do indeed think there is a classloader tangle and that you want your user classpath first. I agree that it looks like an HBase issue, so I don't think this is to do with Spark or protobuf per se.