After several days of debugging, we think the issue is conflicting
versions of Guava.  Our application was running with Guava 14 while the
Spark services (Master, Workers, Executors) had Guava 16.  We had custom
Kryo serializers registered for Guava's ImmutableLists, and commenting out
those register calls did the trick.
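
For reference, here is a minimal sketch of the kind of registration we
commented out.  MyKryoRegistrator and GuavaImmutableListSerializer are
hypothetical stand-ins for our own classes; only the Kryo 2.x and Spark
APIs (KryoRegistrator, kryo.register) are the real ones:

    // Hypothetical sketch of our registrator plus a hand-rolled
    // ImmutableList serializer; disabling the register call was the fix.
    import com.esotericsoftware.kryo.Kryo;
    import com.esotericsoftware.kryo.Serializer;
    import com.esotericsoftware.kryo.io.Input;
    import com.esotericsoftware.kryo.io.Output;
    import com.google.common.collect.ImmutableList;
    import org.apache.spark.serializer.KryoRegistrator;

    public class MyKryoRegistrator implements KryoRegistrator {
        @Override
        public void registerClasses(Kryo kryo) {
            // Commenting out this registration avoided the Guava 14-vs-16
            // clash between our driver and the Spark services:
            // kryo.register(ImmutableList.of().getClass(),
            //               new GuavaImmutableListSerializer());
        }
    }

    class GuavaImmutableListSerializer
            extends Serializer<ImmutableList<Object>> {
        @Override
        public void write(Kryo kryo, Output output,
                          ImmutableList<Object> list) {
            output.writeInt(list.size(), true);  // varint-encoded size
            for (Object element : list) {
                kryo.writeClassAndObject(output, element);
            }
        }

        @Override
        public ImmutableList<Object> read(Kryo kryo, Input input,
                Class<ImmutableList<Object>> type) {
            int size = input.readInt(true);
            ImmutableList.Builder<Object> builder = ImmutableList.builder();
            for (int i = 0; i < size; i++) {
                builder.add(kryo.readClassAndObject(input));
            }
            return builder.build();
        }
    }

We wire the registrator in with spark.kryo.registrator=MyKryoRegistrator
(alongside spark.serializer=org.apache.spark.serializer.KryoSerializer),
so the serializer runs in the driver and in every executor.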

Have people had issues with Guava version mismatches in the past?

I've found @srowen's Guava 14 -> 11 downgrade PR at
https://github.com/apache/spark/pull/1610 and some extended discussion of
Hive compatibility at https://issues.apache.org/jira/browse/SPARK-2420.
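
If anyone else is chasing a similar mismatch, a quick way to confirm which
Guava jar each JVM actually loaded (plain Java, nothing Spark-specific) is
to log the code source of a Guava class from both the driver and an
executor:

    // Prints the jar com.google.common.collect.ImmutableList was loaded
    // from; getCodeSource() can return null for bootstrap classes, but
    // not for a Guava jar on the application classpath.
    java.net.URL guavaJar = com.google.common.collect.ImmutableList.class
            .getProtectionDomain().getCodeSource().getLocation();
    System.out.println("Guava loaded from: " + guavaJar);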


On Thu, Jul 31, 2014 at 10:47 AM, Andrew Ash <and...@andrewash.com> wrote:

> Hi everyone,
>
> I'm seeing the below exception coming out of Spark 1.0.1 when I call it
> from my application.  I can't share the source of that application, but the
> quick gist is that it uses Spark's Java APIs to read from Avro files in
> HDFS, do processing, and write back to Avro files.  It does this by
> receiving a REST call, then spinning up a new JVM as the driver application
> that connects to Spark.  I'm using CDH4.4.0 and have enabled Kryo and also
> speculation.  The cluster runs in standalone mode on 6 nodes in AWS (not
> using Spark's EC2 scripts, though).
>
> The below stack traces are reliably reproducible on every run of the job.
>  The issue seems to be that on deserialization of a task result on the
> driver, Kryo spits up while reading the ClassManifest.
>
> I've tried swapping in Kryo 2.23.1 rather than 2.21 (2.22 had some
> backcompat issues) but had the same error.
>
> Any ideas on what can be done here?
>
> Thanks!
> Andrew
>
>
>
> In the driver (Kryo exception while deserializing a DirectTaskResult):
>
> INFO   | jvm 1    | 2014/07/30 20:52:52 | 20:52:52.667 [Result resolver
> thread-0] ERROR o.a.spark.scheduler.TaskResultGetter - Exception while
> getting task result
> INFO   | jvm 1    | 2014/07/30 20:52:52 |
> com.esotericsoftware.kryo.KryoException: Buffer underflow.
> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> com.esotericsoftware.kryo.io.Input.require(Input.java:156)
> ~[kryo-2.21.jar:na]
> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> com.esotericsoftware.kryo.io.Input.readInt(Input.java:337)
> ~[kryo-2.21.jar:na]
> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:762)
> ~[kryo-2.21.jar:na]
> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:624) ~[kryo-2.21.jar:na]
> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> com.twitter.chill.ClassManifestSerializer.read(ClassManifestSerializer.scala:26)
> ~[chill_2.10-0.3.6.jar:0.3.6]
> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> com.twitter.chill.ClassManifestSerializer.read(ClassManifestSerializer.scala:19)
> ~[chill_2.10-0.3.6.jar:0.3.6]
> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
> ~[kryo-2.21.jar:na]
> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:147)
> ~[spark-core_2.10-1.0.1.jar:1.0.1]
> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
> ~[spark-core_2.10-1.0.1.jar:1.0.1]
> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:480)
> ~[spark-core_2.10-1.0.1.jar:1.0.1]
> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:316)
> ~[spark-core_2.10-1.0.1.jar:1.0.1]
> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
> [spark-core_2.10-1.0.1.jar:1.0.1]
> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
> [spark-core_2.10-1.0.1.jar:1.0.1]
> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
> [spark-core_2.10-1.0.1.jar:1.0.1]
> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1160)
> [spark-core_2.10-1.0.1.jar:1.0.1]
> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
> [spark-core_2.10-1.0.1.jar:1.0.1]
> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> [na:1.7.0_65]
> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> [na:1.7.0_65]
> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
>
>
> In the DAGScheduler (job gets aborted):
>
> org.apache.spark.SparkException: Job aborted due to stage failure:
> Exception while getting task result:
> com.esotericsoftware.kryo.KryoException: Buffer underflow.
>     at org.apache.spark.scheduler.DAGScheduler.org
> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
>     at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
>     at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
>     at
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>     at
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
>     at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>     at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>     at scala.Option.foreach(Option.scala:236)
>     at
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
>     at
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>     at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>     at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>     at
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>     at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>     at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>     at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
>
> In an Executor (running tasks get killed):
>
> 14/07/29 22:57:38 INFO broadcast.HttpBroadcast: Started reading broadcast
> variable 0
> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task
> 153
> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task
> 147
> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task
> 141
> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task
> 135
> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task
> 150
> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task
> 144
> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task
> 138
> 14/07/29 22:57:39 INFO storage.MemoryStore: ensureFreeSpace(241733) called
> with curMem=0, maxMem=30870601728
> 14/07/29 22:57:39 INFO storage.MemoryStore: Block broadcast_0 stored as
> values to memory (estimated size 236.1 KB, free 28.8 GB)
> 14/07/29 22:57:39 INFO broadcast.HttpBroadcast: Reading broadcast variable
> 0 took 0.91790748 s
> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0
> locally
> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0
> locally
> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0
> locally
> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0
> locally
> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0
> locally
> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0
> locally
> 14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 135
> org.apache.spark.TaskKilledException
>         at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 144
> org.apache.spark.TaskKilledException
>         at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 150
> org.apache.spark.TaskKilledException
>         at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 138
> org.apache.spark.TaskKilledException
>         at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 141
> org.apache.spark.TaskKilledException
>         at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
>
