Re: ClassCastException in driver program

2015-09-08 Thread Jeff Jones
Thanks for the response.
Turns out this post addressed the issue:
http://stackoverflow.com/questions/28186607/java-lang-classcastexception-using-lambda-expressions-in-spark-job-on-remote-ser
We have some UDFs defined, and the jar containing the class for these UDFs
wasn’t in the dependent jars list. Unfortunately, the actual error was masked
by the one I sent below.
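In case it helps anyone else, the fix amounts to making sure the UDF jar is shipped with the job. A minimal sketch (the app name, master URL, and jar path below are placeholders, not our real values):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: ship the jar containing the UDF classes to the executors so
// task deserialization can find them. Paths and URLs are placeholders.
val conf = new SparkConf()
  .setAppName("driver-with-udfs")
  .setMaster("spark://master:7077")
  .setJars(Seq("/path/to/udfs.jar"))   // jar with the UDF classes
val sc = new SparkContext(conf)
```

Equivalently, when launching through spark-submit, the jar can be passed via --jars.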

Jeff


Re: ClassCastException in driver program

2015-09-06 Thread Shixiong Zhu
It looks like there are circular references in the SQL plan that make the
immutable List serialization fail in Scala 2.11.

In 2.11, Scala's immutable List uses writeReplace()/readResolve(), which don't
play nicely with circular references. Here is an example that reproduces the
issue in 2.11.6:

  class Foo extends Serializable {
    var l: Seq[Any] = null
  }

  import java.io._

  val o = new ByteArrayOutputStream()
  val o1 = new ObjectOutputStream(o)
  val m = new Foo
  val n = List(1, m)
  m.l = n                  // cycle: n contains m, and m points back at n
  o1.writeObject(n)        // the List is written through its SerializationProxy
  o1.close()
  val i = new ByteArrayInputStream(o.toByteArray)
  val i1 = new ObjectInputStream(i)
  i1.readObject()          // fails: the proxy cannot resolve the back-reference
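For contrast, the same cycle routed through a collection that serializes with the default mechanism (no writeReplace proxy) should round-trip. This is a sketch to illustrate that the problem is specific to the proxy, not to cycles in general:

```scala
import java.io._
import scala.collection.mutable.ArrayBuffer

class Bar extends Serializable {
  var b: ArrayBuffer[Any] = null
}

val bout = new ByteArrayOutputStream()
val oout = new ObjectOutputStream(bout)
val bar = new Bar
val buf = ArrayBuffer[Any](1, bar)
bar.b = buf                // same circular shape as the List example
oout.writeObject(buf)
oout.close()
// ArrayBuffer has no serialization proxy, so Java serialization's
// reference handles can resolve the cycle during readObject.
val oin = new ObjectInputStream(new ByteArrayInputStream(bout.toByteArray))
oin.readObject()           // expected to succeed
```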

Could you provide the "explain" output? It would be helpful to find the
circular references.
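For reference, a sketch of how to capture it, with `df` standing in for whichever DataFrame the failing collect() is called on:

```scala
// Hypothetical: `df` is the DataFrame whose collect() fails.
df.explain(true)   // prints both the logical and physical plans to stdout
```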



Best Regards,
Shixiong Zhu

ClassCastException in driver program

2015-09-04 Thread Jeff Jones
We are using Scala 2.11 for a driver program that runs Spark SQL queries
in a standalone cluster. I’ve rebuilt Spark for Scala 2.11 using the
instructions at http://spark.apache.org/docs/latest/building-spark.html. I’ve
had to work through a few dependency conflicts, but all in all it seems to work
for some simple Spark examples. I integrated the Spark SQL code into my
application and I’m able to run using a local client, but when I switch over to
the standalone cluster I get the following error. Any help tracking this down
would be appreciated.

This exception occurs during a DataFrame.collect() call. I’ve tried using
-Dsun.io.serialization.extendedDebugInfo=true to get more information, but it
didn’t provide anything more.
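One possible reason the flag showed nothing: the failing deserialization runs on the executors, not the driver. A sketch of pushing it to the executor JVMs instead, assuming the standard spark.executor.extraJavaOptions config applies to this setup:

```scala
import org.apache.spark.SparkConf

// Sketch: propagate the serialization debug flag to the executor JVMs,
// where the task deserialization (and the ClassCastException) happens.
val conf = new SparkConf()
  .set("spark.executor.extraJavaOptions",
       "-Dsun.io.serialization.extendedDebugInfo=true")
```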


[error] o.a.s.s.TaskSetManager - Task 0 in stage 1.0 failed 4 times; aborting job

[error] c.a.i.c.Analyzer - Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, 10.248.0.242): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.sql.execution.Project.projectList of type scala.collection.Seq in instance of org.apache.spark.sql.execution.Project
at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(Unknown Source)
at java.io.ObjectStreamClass.setObjFieldValues(Unknown Source)
at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
at java.io.ObjectInputStream.readSerialData(Unknown Source)
at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
at java.io.ObjectInputStream.readSerialData(Unknown Source)
at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
at java.io.ObjectInputStream.readSerialData(Unknown Source)
at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
at java.io.ObjectInputStream.readSerialData(Unknown Source)
at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
at java.io.ObjectInputStream.readSerialData(Unknown Source)
at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.readObject(Unknown Source)
at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:477)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at java.io.ObjectStreamClass.invokeReadObject(Unknown Source)
at java.io.ObjectInputStream.readSerialData(Unknown Source)
at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
at java.io.ObjectInputStream.readSerialData(Unknown Source)
at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
at java.io.ObjectInputStream.readSerialData(Unknown Source)
at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.readObject(Unknown Source)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:69)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:95)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

Thanks,
Jeff

