On Fri, Aug 1, 2014 at 2:45 PM, Andrew Ash <and...@andrewash.com> wrote:
> After several days of debugging, we think the issue is that we have
> conflicting versions of Guava. Our application was running with Guava 14
> and the Spark services (Master, Workers, Executors) had Guava 16. We had
> custom Kryo serializers for Guava's ImmutableLists, and commenting out
> those register calls did the trick.
>
> Have people had issues with Guava version mismatches in the past?
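For context, the "register calls" Andrew mentions would typically live in a
Spark KryoRegistrator. Below is a minimal sketch of what a registration for
Guava's ImmutableList might look like; the class names and the hand-rolled
serializer are illustrative assumptions, not code from his application:

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.Serializer;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import com.google.common.collect.ImmutableList;
import org.apache.spark.serializer.KryoRegistrator;

// Hypothetical registrator of the kind Andrew describes (not his actual code).
public class GuavaKryoRegistrator implements KryoRegistrator {
    @Override
    public void registerClasses(Kryo kryo) {
        Serializer<ImmutableList<?>> serializer = new Serializer<ImmutableList<?>>() {
            @Override
            public void write(Kryo kryo, Output output, ImmutableList<?> list) {
                // Write the size, then each element with its class tag.
                output.writeInt(list.size(), true);
                for (Object element : list) {
                    kryo.writeClassAndObject(output, element);
                }
            }

            @Override
            public ImmutableList<?> read(Kryo kryo, Input input, Class<ImmutableList<?>> type) {
                int size = input.readInt(true);
                ImmutableList.Builder<Object> builder = ImmutableList.builder();
                for (int i = 0; i < size; i++) {
                    builder.add(kryo.readClassAndObject(input));
                }
                return builder.build();
            }
        };
        // ImmutableList is abstract; register against the concrete runtime
        // classes (empty, singleton, and regular lists).
        kryo.register(ImmutableList.of().getClass(), serializer);
        kryo.register(ImmutableList.of(1).getClass(), serializer);
        kryo.register(ImmutableList.of(1, 2).getClass(), serializer);
    }
}

Because the registration is keyed on Guava-internal concrete classes, a driver
and executors running different Guava versions can disagree about what was
registered, and a mismatched registration table can surface as stream
corruption like the "Buffer underflow" quoted later in this thread.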
There's some discussion about dealing with Guava version issues in Spark in
SPARK-2420.

best,
Colin

> I've found @srowen's Guava 14 -> 11 downgrade PR here
> https://github.com/apache/spark/pull/1610 and some extended discussion on
> https://issues.apache.org/jira/browse/SPARK-2420 for Hive compatibility
>
>
> On Thu, Jul 31, 2014 at 10:47 AM, Andrew Ash <and...@andrewash.com> wrote:
>
>> Hi everyone,
>>
>> I'm seeing the exception below coming out of Spark 1.0.1 when I call it
>> from my application. I can't share that application's source, but the
>> quick gist is that it uses Spark's Java APIs to read from Avro files in
>> HDFS, do processing, and write back to Avro files. It does this by
>> receiving a REST call, then spinning up a new JVM as the driver
>> application that connects to Spark. I'm using CDH4.4.0 and have enabled
>> Kryo and also speculation. The cluster is running in standalone mode on
>> 6 nodes in AWS (not using Spark's EC2 scripts, though).
>>
>> The stack traces below are reliably reproducible on every run of the
>> job. The issue seems to be that on deserialization of a task result on
>> the driver, Kryo chokes while reading the ClassManifest.
>>
>> I've tried swapping in Kryo 2.23.1 rather than 2.21 (2.22 had some
>> backward-compatibility issues) but hit the same error.
>>
>> Any ideas on what can be done here?
>>
>> Thanks!
>> Andrew
>>
>>
>> In the driver (Kryo exception while deserializing a DirectTaskResult):
>>
>> INFO | jvm 1 | 2014/07/30 20:52:52 | 20:52:52.667 [Result resolver thread-0] ERROR o.a.spark.scheduler.TaskResultGetter - Exception while getting task result
>> INFO | jvm 1 | 2014/07/30 20:52:52 | com.esotericsoftware.kryo.KryoException: Buffer underflow.
>> INFO | jvm 1 | 2014/07/30 20:52:52 |     at com.esotericsoftware.kryo.io.Input.require(Input.java:156) ~[kryo-2.21.jar:na]
>> INFO | jvm 1 | 2014/07/30 20:52:52 |     at com.esotericsoftware.kryo.io.Input.readInt(Input.java:337) ~[kryo-2.21.jar:na]
>> INFO | jvm 1 | 2014/07/30 20:52:52 |     at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:762) ~[kryo-2.21.jar:na]
>> INFO | jvm 1 | 2014/07/30 20:52:52 |     at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:624) ~[kryo-2.21.jar:na]
>> INFO | jvm 1 | 2014/07/30 20:52:52 |     at com.twitter.chill.ClassManifestSerializer.read(ClassManifestSerializer.scala:26) ~[chill_2.10-0.3.6.jar:0.3.6]
>> INFO | jvm 1 | 2014/07/30 20:52:52 |     at com.twitter.chill.ClassManifestSerializer.read(ClassManifestSerializer.scala:19) ~[chill_2.10-0.3.6.jar:0.3.6]
>> INFO | jvm 1 | 2014/07/30 20:52:52 |     at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729) ~[kryo-2.21.jar:na]
>> INFO | jvm 1 | 2014/07/30 20:52:52 |     at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:147) ~[spark-core_2.10-1.0.1.jar:1.0.1]
>> INFO | jvm 1 | 2014/07/30 20:52:52 |     at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79) ~[spark-core_2.10-1.0.1.jar:1.0.1]
>> INFO | jvm 1 | 2014/07/30 20:52:52 |     at org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:480) ~[spark-core_2.10-1.0.1.jar:1.0.1]
>> INFO | jvm 1 | 2014/07/30 20:52:52 |     at org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:316) ~[spark-core_2.10-1.0.1.jar:1.0.1]
>> INFO | jvm 1 | 2014/07/30 20:52:52 |     at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68) [spark-core_2.10-1.0.1.jar:1.0.1]
>> INFO | jvm 1 | 2014/07/30 20:52:52 |     at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47) [spark-core_2.10-1.0.1.jar:1.0.1]
>> INFO | jvm 1 | 2014/07/30 20:52:52 |     at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47) [spark-core_2.10-1.0.1.jar:1.0.1]
>> INFO | jvm 1 | 2014/07/30 20:52:52 |     at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1160) [spark-core_2.10-1.0.1.jar:1.0.1]
>> INFO | jvm 1 | 2014/07/30 20:52:52 |     at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46) [spark-core_2.10-1.0.1.jar:1.0.1]
>> INFO | jvm 1 | 2014/07/30 20:52:52 |     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65]
>> INFO | jvm 1 | 2014/07/30 20:52:52 |     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
>> INFO | jvm 1 | 2014/07/30 20:52:52 |     at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
>>
>>
>> In the DAGScheduler (job gets aborted):
>>
>> org.apache.spark.SparkException: Job aborted due to stage failure: Exception while getting task result: com.esotericsoftware.kryo.KryoException: Buffer underflow.
>>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
>>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>     at scala.Option.foreach(Option.scala:236)
>>     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
>>     at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
>>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>     at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>     at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>
>>
>> In an Executor (running tasks get killed):
>>
>> 14/07/29 22:57:38 INFO broadcast.HttpBroadcast: Started reading broadcast variable 0
>> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task 153
>> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task 147
>> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task 141
>> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task 135
>> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task 150
>> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task 144
>> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task 138
>> 14/07/29 22:57:39 INFO storage.MemoryStore: ensureFreeSpace(241733) called with curMem=0, maxMem=30870601728
>> 14/07/29 22:57:39 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 236.1 KB, free 28.8 GB)
>> 14/07/29 22:57:39 INFO broadcast.HttpBroadcast: Reading broadcast variable 0 took 0.91790748 s
>> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0 locally
>> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0 locally
>> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0 locally
>> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0 locally
>> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0 locally
>> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0 locally
>> 14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 135
>> org.apache.spark.TaskKilledException
>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at java.lang.Thread.run(Thread.java:745)
>> 14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 144
>> org.apache.spark.TaskKilledException
>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at java.lang.Thread.run(Thread.java:745)
>> 14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 150
>> org.apache.spark.TaskKilledException
>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at java.lang.Thread.run(Thread.java:745)
>> 14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 138
>> org.apache.spark.TaskKilledException
>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at java.lang.Thread.run(Thread.java:745)
>> 14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 141
>> org.apache.spark.TaskKilledException
>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at java.lang.Thread.run(Thread.java:745)
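For anyone trying to reproduce the setup Andrew describes (Spark 1.0.x
standalone with Kryo and speculation enabled), a minimal driver-side
configuration looks roughly like the sketch below; the app name and the
registrator class are placeholders, not taken from the thread:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class DriverSetup {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("avro-processing")  // placeholder name
            // Use Kryo instead of Java serialization for task results and data.
            .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
            // Hook in custom registrations (e.g. the Guava serializers
            // discussed at the top of this thread); class name is hypothetical.
            .set("spark.kryo.registrator", "com.example.GuavaKryoRegistrator")
            // Re-launch slow tasks speculatively.
            .set("spark.speculation", "true");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... read Avro from HDFS, process, write back ...
        sc.stop();
    }
}

Note that the TaskKilledException entries in the executor log are likely
expected fallout rather than an independent failure: once the stage is
aborted by the Kryo error (and, with speculation on, once a duplicate attempt
finishes first), the scheduler kills the remaining running attempts.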