Re: [Spark] java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE

2015-10-30 Thread Yifan LI
Thanks Deng. Yes, I agree that a partition larger than 2 GB caused this exception. But in my case, simply increasing the number of partitions in the sortBy operation did not help. I think the partitioning produced by sortBy is not balanced, e.g. in my
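The skew the poster suspects can be illustrated without Spark. sortBy range-partitions records by the sort key, so a heavily skewed key distribution can leave most records in one partition no matter how large numPartitions is. The following is a hypothetical plain-Java sketch (not Spark code) using fixed, evenly spaced range boundaries; Spark's actual range partitioner samples the data, but duplicated or clustered keys can still defeat it:

```java
import java.util.Arrays;
import java.util.Random;

public class SkewDemo {
    // Hypothetical illustration: range-partition integer keys into
    // numPartitions buckets using evenly spaced boundaries, the way a
    // range partitioner behaves when its boundaries miss the skew.
    static int[] partitionCounts(int[] keys, int numPartitions, int maxKey) {
        int[] counts = new int[numPartitions];
        for (int k : keys) {
            int p = Math.min((int) ((long) k * numPartitions / (maxKey + 1)),
                             numPartitions - 1);
            counts[p]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Skewed data: ~90% of keys fall in the first 10% of the key range.
        int maxKey = 999;
        int[] keys = new int[10000];
        Random rnd = new Random(42);
        for (int i = 0; i < keys.length; i++) {
            keys[i] = rnd.nextInt(100) < 90 ? rnd.nextInt(100) : rnd.nextInt(1000);
        }
        int[] counts = partitionCounts(keys, 10, maxKey);
        System.out.println(Arrays.toString(counts));
        // Partition 0 holds the vast majority of the records, so raising
        // numPartitions alone does not shrink the largest partition much.
    }
}
```

Under this toy model, one partition ends up holding around 90% of the data, which matches the poster's observation that increasing numPartitions alone did not make the largest partition small enough.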

[Spark] java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE

2015-10-29 Thread Yifan LI
Hey, I was just trying to scan a large RDD, sortedRdd (~1 billion elements), using the toLocalIterator API, but an exception was thrown when the scan was almost finished: java.lang.RuntimeException: java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE at
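The "Size exceeds Integer.MAX_VALUE" message stems from a JVM limitation: byte arrays and ByteBuffers are indexed by a 32-bit int, so any single block mapped into memory must be under roughly 2 GB. A minimal plain-Java sketch (not Spark's actual code) of the kind of guard that produces this exception:

```java
public class BlockSizeCheck {
    // Mirrors the kind of size guard that rejects blocks over 2 GB:
    // byte[] and ByteBuffer are indexed by int, so a single block can
    // hold at most Integer.MAX_VALUE (~2.1 GB) bytes.
    static int checkedCast(long size) {
        if (size > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("Size exceeds Integer.MAX_VALUE");
        }
        return (int) size;
    }

    public static void main(String[] args) {
        try {
            checkedCast(3L * 1024 * 1024 * 1024); // a 3 GiB block is rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        System.out.println(checkedCast(1024L)); // small sizes pass through
    }
}
```

Because the limit applies per block (and a cached partition is stored as one block), no amount of cluster memory avoids it; only smaller partitions do.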

Re: [Spark] java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE

2015-10-29 Thread Yifan LI
My guess is that before scanning that RDD, I sorted it and set its partitioning, so the result is not balanced: sortBy[S](f: Function[T, S], ascending: Boolean, numPartitions: Int). I will try to

Re: [Spark] java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE

2015-10-29 Thread Deng Ching-Mallete
Hi Yifan, this is a known issue; please refer to https://issues.apache.org/jira/browse/SPARK-6235 for more details. In your case, it looks like you are caching a partition larger than 2 GB to disk. A workaround would be to increase the number of your RDD's partitions in order to make each one smaller.
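To size the workaround, the minimum partition count is just a ceiling division: totalBytes / targetPartitionBytes, rounded up. This assumes a roughly even spread, which the skew discussed above may violate, so the result should be treated as a lower bound with headroom added. A small sketch (the 500 GB and 512 MB figures are illustrative assumptions, not from the thread):

```java
public class PartitionSizing {
    // Minimum number of partitions so that, assuming an even spread of
    // data, no partition exceeds targetBytes. Skewed data can still
    // overshoot, so treat this as a lower bound and add headroom.
    static long minPartitions(long totalBytes, long targetBytes) {
        return (totalBytes + targetBytes - 1) / targetBytes; // ceiling division
    }

    public static void main(String[] args) {
        long total = 500L * 1024 * 1024 * 1024; // e.g. 500 GiB of serialized data
        long target = 512L * 1024 * 1024;       // aim well under 2 GiB, e.g. 512 MiB
        System.out.println(minPartitions(total, target)); // 1000
    }
}
```

The resulting count would be passed as the numPartitions argument of sortBy (or to repartition) so that each cached block stays safely below the Integer.MAX_VALUE limit.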