[jira] [Commented] (SPARK-7154) Spark distro appears to be pulling in incorrect protobuf classes

2015-04-27 Thread Dmitry Goldenberg (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516241#comment-14516241 ]

Dmitry Goldenberg commented on SPARK-7154:
--

I've added this info to the 'mirror' ticket I filed in Phoenix's JIRA,
[PHOENIX-1926|https://issues.apache.org/jira/browse/PHOENIX-1926]:

It looks like I got this to work with the following spark-submit invocation:
{code}
./bin/spark-submit \
  --driver-class-path $HBASE_HOME/lib/hbase-protocol-0.98.9-hadoop2.jar \
  --driver-java-options "-Dspark.driver.extraClassPath=$HBASE_HOME/lib/hbase-protocol-0.98.9-hadoop2.jar" \
  --class com.myco.Driver \
  --master local[*] \
  /mnt/data/myco.jar
{code}
The crucial part of getting this to work was the first parameter, 
--driver-class-path. Things work with just that and without 
spark.driver.extraClassPath; they do not work with spark.driver.extraClassPath 
but no --driver-class-path, and of course they don't work with both missing 
from the invocation.
I also have the classes from Phoenix's HBase and Hadoop dependency jars rolled 
into my driver jar. I tested with them in the job jar and without; in either 
case the error goes away as long as the protocol jar is on the driver class path.

Since I've only tried this in local mode, it's not yet clear to me whether 
spark.driver.extraClassPath and/or rolling Phoenix's HBase and Hadoop 
dependency jars into the job jar would be required in other deployment modes. 
But the good news is that there's clearly a 'mojo' for getting this to work so 
the IllegalAccessError no longer occurs.
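
If the executors turn out to need the jar too in a non-local deployment, my 
untested guess is that the executor-side analogue would be 
spark.executor.extraClassPath, along these lines (the jar path and app name 
here are illustrative, not from my actual setup):
{code}
import org.apache.spark.SparkConf;

// Untested sketch: mirror --driver-class-path on the executors. The jar
// would have to exist at this path on every worker node.
SparkConf conf = new SparkConf()
    .setAppName("my-streaming-job")
    .set("spark.driver.extraClassPath",
        "/opt/hbase/lib/hbase-protocol-0.98.9-hadoop2.jar")
    .set("spark.executor.extraClassPath",
        "/opt/hbase/lib/hbase-protocol-0.98.9-hadoop2.jar");
{code}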

 Spark distro appears to be pulling in incorrect protobuf classes
 

 Key: SPARK-7154
 URL: https://issues.apache.org/jira/browse/SPARK-7154
 Project: Spark
 Issue Type: Bug
 Components: Build
 Affects Versions: 1.3.0
 Reporter: Dmitry Goldenberg
 Priority: Critical
 Attachments: in-google-protobuf-2.5.0.zip, in-spark-1.3.1-local-build.zip, spark-1.3.1-local-build.txt


 If you download Spark from the site https://spark.apache.org/downloads.html 
 (for example, I chose 
 http://www.apache.org/dyn/closer.cgi/spark/spark-1.3.1/spark-1.3.1-bin-hadoop2.4.tgz), 
 you may see incompatibilities with other libraries due to incorrect 
 protobuf classes.
 I'm seeing such a case in my Spark Streaming job, which attempts to use Apache 
 Phoenix to update records in HBase. The job is built with a protobuf 2.5.0 
 dependency. However, at runtime Spark's classes take precedence in class 
 loading, and that is causing exceptions such as the following:
 java.util.concurrent.ExecutionException: java.lang.IllegalAccessError: com/google/protobuf/HBaseZeroCopyByteString
 at java.util.concurrent.FutureTask.report(FutureTask.java:122)
 at java.util.concurrent.FutureTask.get(FutureTask.java:192)
 at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1620)
 at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1577)
 at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1007)
 at org.apache.phoenix.query.ConnectionQueryServicesImpl.getTable(ConnectionQueryServicesImpl.java:1257)
 at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:350)
 at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:311)
 at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:307)
 at org.apache.phoenix.compile.FromCompiler$BaseColumnResolver.createTableRef(FromCompiler.java:333)
 at org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.init(FromCompiler.java:237)
 at org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.init(FromCompiler.java:231)
 at org.apache.phoenix.compile.FromCompiler.getResolverForMutation(FromCompiler.java:207)
 at org.apache.phoenix.compile.UpsertCompiler.compile(UpsertCompiler.java:248)
 at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:503)
 at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:494)
 at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:295)
 at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:288)
 at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
 at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:287)
 at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:219)

[jira] [Commented] (SPARK-7154) Spark distro appears to be pulling in incorrect protobuf classes

2015-04-27 Thread Dmitry Goldenberg (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516242#comment-14516242 ]

Dmitry Goldenberg commented on SPARK-7154:
--

Could this possibly be documented in the Spark doc set somewhere as part of 
resolving this ticket? Thanks.


[jira] [Commented] (SPARK-7154) Spark distro appears to be pulling in incorrect protobuf classes

2015-04-27 Thread Dmitry Goldenberg (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514303#comment-14514303 ]

Dmitry Goldenberg commented on SPARK-7154:
--

Thanks, Sean. What is the right way of setting 
spark.executor.userClassPathFirst? If I try to set it on SparkConf 
programmatically, it seems to be ignored.

{code}
SparkConf sparkConf = new SparkConf().setAppName(appName);
sparkConf.set("spark.executor.userClassPathFirst", "true");
{code}

Is --driver-java-options -Dspark.executor.userClassPathFirst=true the 
recommended approach?
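
If the programmatic route is supposed to work, this is a minimal sketch of 
what I'd expect (my assumption, not verified: SparkConf.set takes String 
key/value pairs, and the conf is only read when the context is constructed, 
so the property has to be set before creating it):
{code}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Sketch under the assumptions above; appName is as in the snippet earlier.
SparkConf sparkConf = new SparkConf()
    .setAppName(appName)
    .set("spark.executor.userClassPathFirst", "true");
JavaSparkContext sc = new JavaSparkContext(sparkConf);
{code}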

I'm now wrapping all the HBase and Hadoop dependency classes of Phoenix into 
the Spark job jar but I want to make sure they take precedence.  Spark's 
assembly jar has its own protobuf and Hadoop classes so perhaps I'm clashing 
with those.
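
To see which jar is actually winning, a generic JVM check (standard Java, 
nothing Spark-specific; the probed class name is just an example) can report 
where a class was loaded from:
{code}
// Print the location a class was loaded from, e.g. to see whether protobuf
// came from Spark's assembly jar or from the job jar.
public final class WhichJar {
    public static void main(String[] args) throws Exception {
        Class<?> c = Class.forName("com.google.protobuf.ByteString");
        System.out.println(c.getProtectionDomain().getCodeSource().getLocation());
    }
}
{code}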


[jira] [Commented] (SPARK-7154) Spark distro appears to be pulling in incorrect protobuf classes

2015-04-27 Thread Dmitry Goldenberg (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514314#comment-14514314 ]

Dmitry Goldenberg commented on SPARK-7154:
--

Trying to use --driver-java-options -Dspark.executor.userClassPathFirst=true 
is having an effect, although not a positive one:

{code}
15/04/27 11:36:31 ERROR scheduler.JobScheduler: Error running job streaming job 
143014899 ms.0
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 
0, localhost): java.lang.ClassCastException: cannot assign instance of 
scala.None$ to field org.apache.spark.scheduler.Task.metrics of type 
scala.Option in instance of org.apache.spark.scheduler.ResultTask
at 
java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2089)
at 
java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1999)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at 
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
at 
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
{code}



[jira] [Commented] (SPARK-7154) Spark distro appears to be pulling in incorrect protobuf classes

2015-04-26 Thread Dmitry Goldenberg (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513303#comment-14513303 ]

Dmitry Goldenberg commented on SPARK-7154:
--

Not sure if this may be an HBase issue, as per
http://www.cloudera.com/content/cloudera/en/documentation/cloudera-search/v1-latest/Cloudera-Search-Release-Notes/csrn_known_issues_current.html

I'm now running with HADOOP_CLASSPATH pointing at my HBase protocol jar:
export HADOOP_CLASSPATH=$HBASE_HOME/lib/hbase-protocol-0.98.9-hadoop2.jar

and am still getting errors, e.g.:

{code}
java.util.concurrent.ExecutionException: java.lang.IllegalAccessError: class 
com.google.protobuf.HBaseZeroCopyByteString cannot access its superclass 
com.google.protobuf.LiteralByteString
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at 
org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1620)
at 
org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1577)
at 
org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1007)
at 
org.apache.phoenix.query.ConnectionQueryServicesImpl.getTable(ConnectionQueryServicesImpl.java:1257)
at 
org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:350)
at 
org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:311)
at 
org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:307)
at 
org.apache.phoenix.compile.FromCompiler$BaseColumnResolver.createTableRef(FromCompiler.java:333)
at 
org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.init(FromCompiler.java:237)
at 
org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.init(FromCompiler.java:231)
at 
org.apache.phoenix.compile.FromCompiler.getResolverForMutation(FromCompiler.java:207)
at 
org.apache.phoenix.compile.UpsertCompiler.compile(UpsertCompiler.java:248)
at 
org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:503)
at 
org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:494)
at 
org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:295)
at 
org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:288)
at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
at 
org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:287)
at 
org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:219)
at 
org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:174)
at 
org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:179)
{code}


[jira] [Commented] (SPARK-7154) Spark distro appears to be pulling in incorrect protobuf classes

2015-04-26 Thread Dmitry Goldenberg (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513218#comment-14513218 ]

Dmitry Goldenberg commented on SPARK-7154:
--

This is really a blocker for me; I can't work around either issue. Any ideas 
would be appreciated!


[jira] [Commented] (SPARK-7154) Spark distro appears to be pulling in incorrect protobuf classes

2015-04-26 Thread Dmitry Goldenberg (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513208#comment-14513208 ]

Dmitry Goldenberg commented on SPARK-7154:
--

As a workaround, running in standalone mode, I just tried setting the following:
{code}
SparkConf sparkConf = new SparkConf().setAppName(appName);
sparkConf.set("spark.executor.userClassPathFirst", "true");
{code}
and started getting the following exceptions from Spark while trying to run 
the jobs with spark-submit:

{code}
15/04/26 15:19:00 ERROR scheduler.JobScheduler: Error running job streaming job 
143007594 ms.0
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 9.0 failed 1 times, most recent  failure: Lost 
task 0.0 in stage 9.0 (TID 9, localhost): java.lang.ClassCastException: cannot 
assign instance of scala.None$ to field org.apache.spark.scheduler.Task.metrics 
of type scala.Option in instance of org.apache.spark.scheduler.ResultTask
at 
java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2089)
at 
java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1999)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at 
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
at 
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
{code}

I need a solution on one side of this issue or the other: either Spark needs 
to pull in the right protobuf classes, or this new exception needs to be 
worked around; otherwise I'm basically dead in the water.


[jira] [Commented] (SPARK-7154) Spark distro appears to be pulling in incorrect protobuf classes

2015-04-26 Thread Sean Owen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513370#comment-14513370 ]

Sean Owen commented on SPARK-7154:
--

Despite its package, this class appears to be an HBase class. I think it's in 
this package to access something package-private. I do indeed think there is a 
classloader tangle and that you want your user classpath first. I agree that it 
looks like an HBase issue, so I don't think this is to do with Spark or 
protobuf per se. 
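
For illustration, the pattern looks roughly like this (a sketch with an 
invented class name, not HBase's actual source; it assumes protobuf-java 
2.5.0 on the compile classpath):
{code}
package com.google.protobuf;

// Sketch only. In protobuf-java 2.5.0, LiteralByteString is package-private,
// so a subclass must be declared inside com.google.protobuf. Package-private
// access additionally requires that subclass and superclass be loaded by the
// same classloader; if Spark's assembly supplies LiteralByteString while the
// HBase jar supplies the subclass, the JVM throws
// java.lang.IllegalAccessError: ... cannot access its superclass.
public class ZeroCopySketch extends LiteralByteString {
    ZeroCopySketch(byte[] bytes) {
        super(bytes); // wrap the caller's array without copying
    }
}
{code}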
