Re: Exception in Spark 1.0.1: com.esotericsoftware.kryo.KryoException: Buffer underflow
After several days of debugging, we think the issue is that we had conflicting versions of Guava: our application was running with Guava 14, while the Spark services (Master, Workers, Executors) had Guava 16. We had custom Kryo serializers registered for Guava's ImmutableLists, and commenting out those register calls did the trick.

Have people had issues with Guava version mismatches in the past? I've found @srowen's Guava 14 -> 11 downgrade PR here https://github.com/apache/spark/pull/1610 and some extended discussion on https://issues.apache.org/jira/browse/SPARK-2420 for Hive compatibility.

On Thu, Jul 31, 2014 at 10:47 AM, Andrew Ash and...@andrewash.com wrote:
> Hi everyone,
>
> I'm seeing the below exception coming out of Spark 1.0.1 when I call it from my application. I can't share the source to that application, but the quick gist is that it uses Spark's Java APIs to read from Avro files in HDFS, do processing, and write back to Avro files. It does this by receiving a REST call, then spinning up a new JVM as the driver application that connects to Spark.
>
> I'm using CDH4.4.0 and have enabled Kryo and also speculation. The cluster is running in standalone mode on a 6-node cluster in AWS (not using Spark's EC2 scripts, though). The below stack traces are reliably reproducible on every run of the job. The issue seems to be that on deserialization of a task result on the driver, Kryo spits up while reading the ClassManifest. I've tried swapping in Kryo 2.23.1 rather than 2.21 (2.22 had some backcompat issues) but had the same error.
>
> Any ideas on what can be done here? Thanks!
>
> Andrew
>
> [stack traces snipped; see original message below]
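The failure mode described above is worth spelling out: a "Buffer underflow" during deserialization is the classic symptom of the writer and the reader disagreeing on the wire format, which is exactly what happens when two sides of the cluster serialize the same class with different library versions. This is a hypothetical, pure-JDK analogue (not Spark or Kryo code): a "v1" writer emits only the payload, while a "v2" reader expects an extra trailing field that was never written, so it runs off the end of the buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

public class FormatMismatch {
    // Writer built against "version 1" of the format: just the payload.
    static byte[] writeV1(String payload) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(payload);
        }
        return bytes.toByteArray();
    }

    // Reader built against "version 2": expects the payload, then a
    // trailing int that v1 never wrote.
    static String readV2(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String payload = in.readUTF();
            int checksum = in.readInt(); // underflow: v1 wrote no trailing int
            return payload + ":" + checksum;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = writeV1("hello");
        try {
            readV2(data);
        } catch (EOFException e) {
            // The JDK's equivalent of Kryo's "Buffer underflow."
            System.out.println("underflow: v2 reader ran off the end of v1 bytes");
        }
    }
}
```

The same shape of mismatch can arise without any version skew at all if a custom serializer's write and read paths disagree, which is why commenting out the ImmutableList register calls (falling back to Kryo's defaults on both sides) made the symptom disappear.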
Re: Exception in Spark 1.0.1: com.esotericsoftware.kryo.KryoException: Buffer underflow
On Fri, Aug 1, 2014 at 2:45 PM, Andrew Ash and...@andrewash.com wrote:
> After several days of debugging, we think the issue is that we have conflicting versions of Guava. Our application was running with Guava 14 and the Spark services (Master, Workers, Executors) had Guava 16. We had custom Kryo serializers for Guava's ImmutableLists, and commenting out those register calls did the trick.
>
> Have people had issues with Guava version mismatches in the past?

There's some discussion about dealing with Guava version issues in Spark in SPARK-2420.

best,
Colin

> I've found @srowen's Guava 14 -> 11 downgrade PR here https://github.com/apache/spark/pull/1610 and some extended discussion on https://issues.apache.org/jira/browse/SPARK-2420 for Hive compatibility.
>
> [rest of quoted thread snipped]
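Beyond downgrading, one common way to keep an application's Guava from colliding with the Guava on the Spark classpath (not a fix proposed in this thread, just a standard technique for dependency conflicts) is to relocate it inside the application jar with the Maven Shade plugin. A hypothetical sketch, where the relocation prefix `myapp.shaded` is an arbitrary example:

```xml
<!-- Sketch only: rewrites the app's own com.google.common references so
     they can never resolve to the Guava version Spark ships with. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>myapp.shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Note the caveat for this thread's scenario: relocation only helps if the custom Kryo serializers are registered against the relocated classes too, since the shaded and unshaded ImmutableList are distinct classes to the JVM.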
Exception in Spark 1.0.1: com.esotericsoftware.kryo.KryoException: Buffer underflow
Hi everyone,

I'm seeing the below exception coming out of Spark 1.0.1 when I call it from my application. I can't share the source to that application, but the quick gist is that it uses Spark's Java APIs to read from Avro files in HDFS, do processing, and write back to Avro files. It does this by receiving a REST call, then spinning up a new JVM as the driver application that connects to Spark.

I'm using CDH4.4.0 and have enabled Kryo and also speculation. The cluster is running in standalone mode on a 6-node cluster in AWS (not using Spark's EC2 scripts, though). The below stack traces are reliably reproducible on every run of the job. The issue seems to be that on deserialization of a task result on the driver, Kryo spits up while reading the ClassManifest. I've tried swapping in Kryo 2.23.1 rather than 2.21 (2.22 had some backcompat issues) but had the same error.

Any ideas on what can be done here? Thanks!

Andrew

In the driver (Kryo exception while deserializing a DirectTaskResult):

20:52:52.667 [Result resolver thread-0] ERROR o.a.spark.scheduler.TaskResultGetter - Exception while getting task result
com.esotericsoftware.kryo.KryoException: Buffer underflow.
    at com.esotericsoftware.kryo.io.Input.require(Input.java:156) ~[kryo-2.21.jar:na]
    at com.esotericsoftware.kryo.io.Input.readInt(Input.java:337) ~[kryo-2.21.jar:na]
    at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:762) ~[kryo-2.21.jar:na]
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:624) ~[kryo-2.21.jar:na]
    at com.twitter.chill.ClassManifestSerializer.read(ClassManifestSerializer.scala:26) ~[chill_2.10-0.3.6.jar:0.3.6]
    at com.twitter.chill.ClassManifestSerializer.read(ClassManifestSerializer.scala:19) ~[chill_2.10-0.3.6.jar:0.3.6]
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729) ~[kryo-2.21.jar:na]
    at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:147) ~[spark-core_2.10-1.0.1.jar:1.0.1]
    at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79) ~[spark-core_2.10-1.0.1.jar:1.0.1]
    at org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:480) ~[spark-core_2.10-1.0.1.jar:1.0.1]
    at org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:316) ~[spark-core_2.10-1.0.1.jar:1.0.1]
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68) [spark-core_2.10-1.0.1.jar:1.0.1]
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47) [spark-core_2.10-1.0.1.jar:1.0.1]
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47) [spark-core_2.10-1.0.1.jar:1.0.1]
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1160) [spark-core_2.10-1.0.1.jar:1.0.1]
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46) [spark-core_2.10-1.0.1.jar:1.0.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]

In the DAGScheduler (job gets aborted):

org.apache.spark.SparkException: Job aborted due to stage failure: Exception while getting task result: com.esotericsoftware.kryo.KryoException: Buffer underflow.
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at
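For readers reproducing the setup above: in Spark 1.0.x, "enabled Kryo and also speculation" typically corresponds to settings like the following, in spark-defaults.conf or via SparkConf. The registrator class name is a hypothetical placeholder; the property keys themselves are standard Spark configuration.

```
# spark-defaults.conf
spark.serializer        org.apache.spark.serializer.KryoSerializer
# Hypothetical registrator class; this is where custom serializers
# (e.g. for Guava ImmutableLists) get registered on every JVM.
spark.kryo.registrator  com.example.MyKryoRegistrator
spark.speculation       true
```

Because the registrator runs on driver and executors alike, any class it registers must deserialize identically under the library versions present on each side, which is what broke in this thread.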