Re: StreamCorruptedException during deserialization

2016-03-29 Thread Robert Schmidtke
All the JARs and Java versions are consistent in my setup. In fact, I have
Spark sorting 1TB of data successfully using the exact same setup, except
with a different file system as storage for the data nodes. Could it be that
there is actual corruption in the files written?
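
One way to read the exception itself: the reported header bytes decode to
printable ASCII, which looks like record payload rather than any
serialization header. A quick standalone decode (just a sketch, not code
from the benchmark):

// Decode the hex from "invalid stream header: 32313530" to see what the
// stream actually started with.
object HeaderDecode {
  def main(args: Array[String]): Unit = {
    val hex = "32313530"
    val chars = hex.grouped(2).map(h => Integer.parseInt(h, 16).toChar).mkString
    println(chars) // prints "2150" -- printable digits, not a serialization header
  }
}

If the first bytes of the block are plain digits, the shuffle reader was
handed raw data (or read from a wrong offset), which would point to the
storage layer rather than the JARs.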

On Tue, Mar 29, 2016 at 12:00 PM, Simon Hafner wrote:

> 2016-03-29 11:25 GMT+02:00 Robert Schmidtke:
> > Is there a meaningful way for me to find out what exactly is going wrong
> > here? Any help and hints are greatly appreciated!
> Maybe a version mismatch between the jars on the cluster?
>



-- 
My GPG Key ID: 336E2680


Re: StreamCorruptedException during deserialization

2016-03-29 Thread Simon Hafner
2016-03-29 11:25 GMT+02:00 Robert Schmidtke:
> Is there a meaningful way for me to find out what exactly is going wrong
> here? Any help and hints are greatly appreciated!
Maybe a version mismatch between the jars on the cluster?




StreamCorruptedException during deserialization

2016-03-29 Thread Robert Schmidtke
Hi everyone,

I'm running the Intel HiBench TeraSort (1TB) Spark Scala benchmark on Spark
1.6.0. After some time, I'm seeing one task fail too many times, despite
being rescheduled on different nodes, with the following stack trace:

16/03/27 22:25:04 WARN scheduler.TaskSetManager: Lost task 97.0 in stage 2.0 (TID 2337, ): java.io.StreamCorruptedException: invalid stream header: 32313530
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:806)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.<init>(JavaSerializer.scala:64)
at org.apache.spark.serializer.JavaDeserializationStream.<init>(JavaSerializer.scala:64)
at org.apache.spark.serializer.JavaSerializerInstance.deserializeStream(JavaSerializer.scala:123)
at org.apache.spark.shuffle.BlockStoreShuffleReader$$anonfun$3.apply(BlockStoreShuffleReader.scala:64)
at org.apache.spark.shuffle.BlockStoreShuffleReader$$anonfun$3.apply(BlockStoreShuffleReader.scala:60)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:197)
at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:103)
at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:98)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
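
For context, ObjectInputStream reads a four-byte header up front and
requires the Java serialization magic 0xACED0005; anything else fails
immediately in readStreamHeader with exactly this message. A minimal
standalone sketch that reproduces it (plain Scala, nothing Spark-specific;
the literal "2150" is just the four ASCII bytes matching the reported
header):

import java.io.{ByteArrayInputStream, ObjectInputStream}

object InvalidHeaderDemo {
  def main(args: Array[String]): Unit = {
    // Four ASCII bytes 0x32 0x31 0x35 0x30 instead of the magic 0xACED0005.
    val raw = "2150".getBytes("US-ASCII")
    // The constructor itself throws:
    // java.io.StreamCorruptedException: invalid stream header: 32313530
    new ObjectInputStream(new ByteArrayInputStream(raw))
  }
}

So whatever bytes the shuffle reader received for that block did not begin
with a serialized-object stream at all.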

Is there a meaningful way for me to find out what exactly is going wrong
here? Any help and hints are greatly appreciated!

Thanks
Robert

-- 
My GPG Key ID: 336E2680