Thanks, Guillaume,
Below is where the exception happens; nothing has spilled to disk yet.
And there isn't a join, but a partitionBy and a groupBy action.
Actually, if numPartitions is small it succeeds, while if it's large it
fails.
Partitioning was simply done by overriding getPartition in a custom
Partitioner; a rough sketch follows.
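(A simplified sketch of that partitioner's shape; the hash-based mapping
is illustrative, not my exact code.)

    import org.apache.spark.Partitioner

    class MyPartitioner(override val numPartitions: Int) extends Partitioner {
      override def getPartition(key: Any): Int = {
        // Plain hash partitioning; hashCode can be negative, so shift
        // negative results back into the [0, numPartitions) range.
        val h = key.hashCode % numPartitions
        if (h < 0) h + numPartitions else h
      }
    }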
The exception drives me crazy because it occurs randomly, and I can't
tell which line of my code causes it. I don't even understand what
KryoException: java.lang.NegativeArraySizeException means, or what it
implies.
14/10/20 15:59:01 WARN scheduler.TaskSetManager: Lost task 32.2 in stage
Thank you, Guillaume. My dataset is not that large, only ~2 GB in total.
2014-10-20 16:58 GMT+08:00 Guillaume Pitel guillaume.pi...@exensa.com:
Hi,
It happened to me with blocks which take more than 1 or 2 GB once
serialized. I think the problem is that during serialization a byte
array is allocated, and since JVM array lengths are Ints, a size past
2 GB overflows into a negative number, hence the
NegativeArraySizeException.
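To see why a negative size appears at all, here is a tiny illustration
(hypothetical code, not taken from Spark or Kryo itself):

    // JVM array lengths are Ints, so a length computed past Int.MaxValue
    // wraps around to a negative number and the allocation throws.
    val twoGB: Long = 2L * 1024 * 1024 * 1024
    val length: Int = twoGB.toInt        // wraps to -2147483648
    val buf = new Array[Byte](length)    // java.lang.NegativeArraySizeException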
Well, reading your logs, here is what happens: you do a combineByKey
(so you probably have a join somewhere), which spills to disk because
it's too big. To spill to disk it serializes, and the blocks go over
2 GB.
From a 2 GB dataset, it's easy to expand to several TB.
Increase the parallelism and make the partitions smaller, so that no
single block exceeds the 2 GB limit once serialized.
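Concretely, something along these lines (a sketch; pairRDD and the
count of 2000 are placeholders for your own RDD and a value sized to
your data):

    // More partitions mean smaller blocks, each staying well under
    // 2 GB once serialized.
    val grouped = pairRDD.groupByKey(2000)
    // or spread the data out before the shuffle:
    val spread = pairRDD.repartition(2000)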