Re: Exception in Spark 1.0.1: com.esotericsoftware.kryo.KryoException: Buffer underflow

2014-08-01 Thread Andrew Ash
After several days of debugging, we think the issue is that we have
conflicting versions of Guava.  Our application was running with Guava 14
and the Spark services (Master, Workers, Executors) had Guava 16.  We had
custom Kryo serializers for Guava's ImmutableLists, and commenting out
those register calls did the trick.
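
For concreteness, the registration looked roughly like the sketch below
(illustrative only: the class names and the serializer body are not our exact
code, and it assumes a KryoRegistrator wired up via spark.kryo.registrator):

    import com.esotericsoftware.kryo.{Kryo, Serializer}
    import com.esotericsoftware.kryo.io.{Input, Output}
    import com.google.common.collect.ImmutableList
    import org.apache.spark.serializer.KryoRegistrator
    import scala.collection.JavaConverters._

    // Illustrative serializer for Guava's ImmutableList: write the element
    // count, then each element, and rebuild the list on read.
    class ImmutableListSerializer extends Serializer[ImmutableList[AnyRef]] {
      override def write(kryo: Kryo, output: Output, list: ImmutableList[AnyRef]): Unit = {
        output.writeInt(list.size(), true)
        list.asScala.foreach(elem => kryo.writeClassAndObject(output, elem))
      }
      override def read(kryo: Kryo, input: Input, clazz: Class[ImmutableList[AnyRef]]): ImmutableList[AnyRef] = {
        val builder = ImmutableList.builder[AnyRef]()
        (0 until input.readInt(true)).foreach(_ => builder.add(kryo.readClassAndObject(input)))
        builder.build()
      }
    }

    class MyRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo): Unit = {
        // This is the kind of register call we ended up commenting out.
        kryo.register(classOf[ImmutableList[AnyRef]], new ImmutableListSerializer)
      }
    }

A serializer like this runs against whichever Guava happens to be on the
classpath of the JVM doing the (de)serialization, so a driver/executor Guava
mismatch can produce a byte stream the other side misreads.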

Have people had issues with Guava version mismatches in the past?
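
For anyone else debugging this kind of mismatch, here is roughly the sanity
check we found useful (an illustrative sketch, not code from our app; it
assumes Guava is loaded from a jar, so getCodeSource is non-null):

    import org.apache.spark.{SparkConf, SparkContext}

    object GuavaCheck {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("guava-check"))
        // Which jar provides Guava on the driver?
        println("Driver Guava: " + Class.forName("com.google.common.collect.ImmutableList")
          .getProtectionDomain.getCodeSource.getLocation)
        // And on the executors?
        sc.parallelize(1 to 100, 20).map { _ =>
          Class.forName("com.google.common.collect.ImmutableList")
            .getProtectionDomain.getCodeSource.getLocation.toString
        }.distinct().collect().foreach(loc => println("Executor Guava: " + loc))
        sc.stop()
      }
    }

If the driver and executor lines point at different jars (in our case the
application's Guava 14 vs. the Guava 16 on the Spark services' classpath),
you are in the situation we hit.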

I've found @srowen's Guava 14 -> 11 downgrade PR here
https://github.com/apache/spark/pull/1610 and some extended discussion about
Hive compatibility on https://issues.apache.org/jira/browse/SPARK-2420.


On Thu, Jul 31, 2014 at 10:47 AM, Andrew Ash and...@andrewash.com wrote:

 Hi everyone,

 I'm seeing the exception below coming out of Spark 1.0.1 when I call it
 from my application.  I can't share that application's source, but the
 quick gist is that it uses Spark's Java APIs to read Avro files from
 HDFS, do some processing, and write back to Avro files.  It does this by
 receiving a REST call, then spinning up a new JVM as the driver application
 that connects to Spark.  I'm using CDH4.4.0 and have enabled Kryo and also
 speculation.  The cluster runs in standalone mode on 6 nodes in AWS
 (not using Spark's EC2 scripts, though).

 The stack traces below are reliably reproducible on every run of the job.
 The issue seems to be that when a task result is deserialized on the
 driver, Kryo fails while reading the ClassManifest.

 I've tried swapping in Kryo 2.23.1 in place of 2.21 (2.22 had some
 backwards-compatibility issues) but hit the same error.

 Any ideas on what can be done here?

 Thanks!
 Andrew

Re: Exception in Spark 1.0.1: com.esotericsoftware.kryo.KryoException: Buffer underflow

2014-08-01 Thread Colin McCabe
On Fri, Aug 1, 2014 at 2:45 PM, Andrew Ash and...@andrewash.com wrote:
 After several days of debugging, we think the issue is that we have
 conflicting versions of Guava.  Our application was running with Guava 14
 and the Spark services (Master, Workers, Executors) had Guava 16.  We had
 custom Kryo serializers for Guava's ImmutableLists, and commenting out
 those register calls did the trick.

 Have people had issues with Guava version mismatches in the past?

There's some discussion about dealing with Guava version issues in
Spark over in SPARK-2420.

best,
Colin



 I've found @srowen's Guava 14 -> 11 downgrade PR here
 https://github.com/apache/spark/pull/1610 and some extended discussion about
 Hive compatibility on https://issues.apache.org/jira/browse/SPARK-2420.



Exception in Spark 1.0.1: com.esotericsoftware.kryo.KryoException: Buffer underflow

2014-07-31 Thread Andrew Ash
Hi everyone,

I'm seeing the exception below coming out of Spark 1.0.1 when I call it
from my application.  I can't share that application's source, but the
quick gist is that it uses Spark's Java APIs to read Avro files from
HDFS, do some processing, and write back to Avro files.  It does this by
receiving a REST call, then spinning up a new JVM as the driver application
that connects to Spark.  I'm using CDH4.4.0 and have enabled Kryo and also
speculation.  The cluster runs in standalone mode on 6 nodes in AWS
(not using Spark's EC2 scripts, though).
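
For reference, Kryo and speculation are turned on with settings along these
lines (a sketch in Scala for brevity, although the app itself uses the Java
API; the app name and registrator class here are just placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("avro-pipeline")  // placeholder name
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "com.example.MyKryoRegistrator")  // placeholder class
      .set("spark.speculation", "true")
    val sc = new SparkContext(conf)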

The stack traces below are reliably reproducible on every run of the job.
The issue seems to be that when a task result is deserialized on the
driver, Kryo fails while reading the ClassManifest.

I've tried swapping in Kryo 2.23.1 in place of 2.21 (2.22 had some
backwards-compatibility issues) but hit the same error.
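
(For anyone who wants to repeat that experiment: one way to pin the Kryo
version your application bundles is a dependency override, shown below with
sbt purely as an example. Keep in mind the Spark assembly on the cluster
still carries its own Kryo via chill, as the stack traces show.)

    // build.sbt (sketch): force the Kryo version pulled in transitively by chill
    dependencyOverrides ++= Set("com.esotericsoftware.kryo" % "kryo" % "2.23.1")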

Any ideas on what can be done here?

Thanks!
Andrew



In the driver (Kryo exception while deserializing a DirectTaskResult):

INFO   | jvm 1| 2014/07/30 20:52:52 | 20:52:52.667 [Result resolver
thread-0] ERROR o.a.spark.scheduler.TaskResultGetter - Exception while
getting task result
INFO   | jvm 1| 2014/07/30 20:52:52 |
com.esotericsoftware.kryo.KryoException: Buffer underflow.
INFO   | jvm 1| 2014/07/30 20:52:52 |   at
com.esotericsoftware.kryo.io.Input.require(Input.java:156)
~[kryo-2.21.jar:na]
INFO   | jvm 1| 2014/07/30 20:52:52 |   at
com.esotericsoftware.kryo.io.Input.readInt(Input.java:337)
~[kryo-2.21.jar:na]
INFO   | jvm 1| 2014/07/30 20:52:52 |   at
com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:762)
~[kryo-2.21.jar:na]
INFO   | jvm 1| 2014/07/30 20:52:52 |   at
com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:624) ~[kryo-2.21.jar:na]
INFO   | jvm 1| 2014/07/30 20:52:52 |   at
com.twitter.chill.ClassManifestSerializer.read(ClassManifestSerializer.scala:26)
~[chill_2.10-0.3.6.jar:0.3.6]
INFO   | jvm 1| 2014/07/30 20:52:52 |   at
com.twitter.chill.ClassManifestSerializer.read(ClassManifestSerializer.scala:19)
~[chill_2.10-0.3.6.jar:0.3.6]
INFO   | jvm 1| 2014/07/30 20:52:52 |   at
com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
~[kryo-2.21.jar:na]
INFO   | jvm 1| 2014/07/30 20:52:52 |   at
org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:147)
~[spark-core_2.10-1.0.1.jar:1.0.1]
INFO   | jvm 1| 2014/07/30 20:52:52 |   at
org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
~[spark-core_2.10-1.0.1.jar:1.0.1]
INFO   | jvm 1| 2014/07/30 20:52:52 |   at
org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:480)
~[spark-core_2.10-1.0.1.jar:1.0.1]
INFO   | jvm 1| 2014/07/30 20:52:52 |   at
org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:316)
~[spark-core_2.10-1.0.1.jar:1.0.1]
INFO   | jvm 1| 2014/07/30 20:52:52 |   at
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
[spark-core_2.10-1.0.1.jar:1.0.1]
INFO   | jvm 1| 2014/07/30 20:52:52 |   at
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
[spark-core_2.10-1.0.1.jar:1.0.1]
INFO   | jvm 1| 2014/07/30 20:52:52 |   at
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
[spark-core_2.10-1.0.1.jar:1.0.1]
INFO   | jvm 1| 2014/07/30 20:52:52 |   at
org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1160)
[spark-core_2.10-1.0.1.jar:1.0.1]
INFO   | jvm 1| 2014/07/30 20:52:52 |   at
org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
[spark-core_2.10-1.0.1.jar:1.0.1]
INFO   | jvm 1| 2014/07/30 20:52:52 |   at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[na:1.7.0_65]
INFO   | jvm 1| 2014/07/30 20:52:52 |   at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[na:1.7.0_65]
INFO   | jvm 1| 2014/07/30 20:52:52 |   at
java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]


In the DAGScheduler (job gets aborted):

org.apache.spark.SparkException: Job aborted due to stage failure:
Exception while getting task result:
com.esotericsoftware.kryo.KryoException: Buffer underflow.
at org.apache.spark.scheduler.DAGScheduler.org
$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at