[ https://issues.apache.org/jira/browse/SPARK-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015907#comment-14015907 ]
Madhu Siddalingaiah edited comment on SPARK-983 at 6/2/14 9:18 PM:
-------------------------------------------------------------------

Yes, the code is checked in here: [https://github.com/msiddalingaiah/spark]

If you're happy with it, I can fill in the guts without affecting your work on [SPARK-1021|https://issues.apache.org/jira/browse/SPARK-1021].

BTW, I was able to get spark-core to build and run in Eclipse (Spark-IDE + Scala Test). There was a bit of fiddling, but it works quite well.


was (Author: msiddalingaiah):
Yes, the code is checked in here: [https://github.com/msiddalingaiah/spark]

If you're happy with it, I can fill in the guts without affecting your work on [https://issues.apache.org/jira/browse/SPARK-1021|SPARK-1021].

BTW, I was able to get spark-core to build and run in Eclipse (Spark-IDE + Scala Test). There was a bit of fiddling, but it works quite well.

> Support external sorting for RDD#sortByKey()
> --------------------------------------------
>
>                 Key: SPARK-983
>                 URL: https://issues.apache.org/jira/browse/SPARK-983
>             Project: Spark
>          Issue Type: New Feature
>    Affects Versions: 0.9.0
>            Reporter: Reynold Xin
>            Assignee: Madhu Siddalingaiah
>
> Currently, RDD#sortByKey() is implemented by a mapPartitions which creates a
> buffer to hold the entire partition, then sorts it. This will cause an OOM if
> an entire partition cannot fit in memory, which is especially problematic for
> skewed data. Rather than OOMing, the behavior should be similar to the
> [ExternalAppendOnlyMap|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala],
> where we fallback to disk if we detect memory pressure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
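
For readers unfamiliar with the technique the issue asks for, here is a minimal sketch of an external sort by key in plain Python. It is not the code in the linked repository and not Spark's implementation. The function name `external_sort_by_key`, the helpers `_spill` and `_read_run`, and the `max_in_memory` record-count threshold are all hypothetical; the threshold is only a crude stand-in for real memory-pressure detection. The idea is to sort bounded chunks in memory, spill each sorted run to a temporary file, and lazily merge the runs.

```python
import heapq
import pickle
import tempfile
from operator import itemgetter


def _spill(sorted_run):
    """Write one sorted run of (key, value) records to a temporary file."""
    f = tempfile.TemporaryFile()
    for record in sorted_run:
        pickle.dump(record, f)
    f.seek(0)
    return f


def _read_run(f):
    """Lazily read records back from a spilled run, closing it at EOF."""
    try:
        while True:
            yield pickle.load(f)
    except EOFError:
        f.close()


def external_sort_by_key(records, max_in_memory=10000):
    """Yield (key, value) records ordered by key.

    At most `max_in_memory` records are buffered at once (an assumed proxy
    for memory pressure); full buffers are sorted and spilled to disk.
    """
    runs = []
    buffer = []
    for record in records:
        buffer.append(record)
        if len(buffer) >= max_in_memory:
            buffer.sort(key=itemgetter(0))
            runs.append(_spill(buffer))
            buffer = []
    # The last partial buffer stays in memory and joins the merge directly.
    buffer.sort(key=itemgetter(0))
    streams = [_read_run(f) for f in runs] + [iter(buffer)]
    # k-way merge of the already-sorted runs, one record per run in memory.
    return heapq.merge(*streams, key=itemgetter(0))
```

For example, `list(external_sort_by_key(pairs, max_in_memory=3))` sorts `pairs` while holding at most three records in the buffer at a time. A real implementation would size the buffer from observed memory use rather than a fixed count.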