Re: StreamCorruptedException during deserialization
All the Jars and Java versions are consistent in my setup. In fact, I have Spark sorting 1TB of data using the exact same setup, except with another file system as storage for the data nodes. Could it be that there is actual corruption in the files written?

On Tue, Mar 29, 2016 at 12:00 PM, Simon Hafner wrote:
> 2016-03-29 11:25 GMT+02:00 Robert Schmidtke:
> > Is there a meaningful way for me to find out what exactly is going wrong
> > here? Any help and hints are greatly appreciated!
> Maybe a version mismatch between the jars on the cluster?

--
My GPG Key ID: 336E2680
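[As an aside on the corruption question: the rejected header value 32313530 from the stack trace is four printable ASCII digits, which suggests plain text rather than random corruption reached the deserializer. This is a standalone sketch, not part of the original thread; the class name is illustrative.]

```java
// Decode the "invalid stream header" value reported by ObjectInputStream.
// A valid java.io object serialization stream begins with the magic 0xACED.
public class DecodeHeader {
    public static void main(String[] args) {
        int header = 0x32313530; // value from the stack trace
        StringBuilder chars = new StringBuilder();
        for (int shift = 24; shift >= 0; shift -= 8) {
            chars.append((char) ((header >> shift) & 0xFF));
        }
        // The four bytes decode to the digits "2150", i.e. readable text
        // where a serialized shuffle block was expected.
        System.out.println(chars); // prints "2150"
    }
}
```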
Re: StreamCorruptedException during deserialization
2016-03-29 11:25 GMT+02:00 Robert Schmidtke:
> Is there a meaningful way for me to find out what exactly is going wrong
> here? Any help and hints are greatly appreciated!

Maybe a version mismatch between the jars on the cluster?
StreamCorruptedException during deserialization
Hi everyone,

I'm running the Intel HiBench TeraSort (1TB) Spark Scala benchmark on Spark 1.6.0. After some time, I'm seeing one task fail too many times, despite being rescheduled on different nodes, with the following stacktrace:

16/03/27 22:25:04 WARN scheduler.TaskSetManager: Lost task 97.0 in stage 2.0 (TID 2337, ): java.io.StreamCorruptedException: invalid stream header: 32313530
    at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:806)
    at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
    at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.<init>(JavaSerializer.scala:64)
    at org.apache.spark.serializer.JavaDeserializationStream.<init>(JavaSerializer.scala:64)
    at org.apache.spark.serializer.JavaSerializerInstance.deserializeStream(JavaSerializer.scala:123)
    at org.apache.spark.shuffle.BlockStoreShuffleReader$$anonfun$3.apply(BlockStoreShuffleReader.scala:64)
    at org.apache.spark.shuffle.BlockStoreShuffleReader$$anonfun$3.apply(BlockStoreShuffleReader.scala:60)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:197)
    at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:103)
    at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:98)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Is there a meaningful way for me to find out what exactly is going wrong here? Any help and hints are greatly appreciated!

Thanks
Robert

--
My GPG Key ID: 336E2680
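[For context on the exception itself: ObjectInputStream validates the stream header in its constructor, so feeding it anything other than a serialized object stream fails immediately with exactly this message. The sketch below, not part of the original thread, reproduces the failure with the ASCII digits "2150" (the bytes named in the stack trace); the class name is illustrative.]

```java
import java.io.ByteArrayInputStream;
import java.io.ObjectInputStream;
import java.io.StreamCorruptedException;

public class ReproduceHeaderError {
    public static void main(String[] args) throws Exception {
        // Plain ASCII digits where a serialized object stream is expected.
        byte[] notSerialized = "2150".getBytes("US-ASCII");
        try {
            // The constructor reads the 4-byte header and throws when it
            // does not find the serialization magic 0xACED plus version.
            new ObjectInputStream(new ByteArrayInputStream(notSerialized));
        } catch (StreamCorruptedException e) {
            System.out.println(e.getMessage()); // "invalid stream header: 32313530"
        }
    }
}
```

This matches the message in the benchmark logs, which is consistent with the deserializer receiving raw record data rather than serialized shuffle blocks.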