[ https://issues.apache.org/jira/browse/SPARK-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008907#comment-14008907 ]
Madhu Siddalingaiah commented on SPARK-983:
-------------------------------------------

I had similar concerns that Runtime.freeMemory() would not be reliable. What about using weak references? We could spill to disk in the finalize() method of a weak reference. From my reading of [finalize()|http://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#finalize%28%29], it is acceptable to perform I/O operations (with sufficient rigor). The code might be a bit tricky, as there is no guarantee about which thread calls finalize(), but I think it could be done.

> Support external sorting for RDD#sortByKey()
> --------------------------------------------
>
>                 Key: SPARK-983
>                 URL: https://issues.apache.org/jira/browse/SPARK-983
>             Project: Spark
>          Issue Type: New Feature
>    Affects Versions: 0.9.0
>            Reporter: Reynold Xin
>
> Currently, RDD#sortByKey() is implemented by a mapPartitions which creates a buffer to hold the entire partition, then sorts it. This will cause an OOM if an entire partition cannot fit in memory, which is especially problematic for skewed data. Rather than OOMing, the behavior should be similar to the [ExternalAppendOnlyMap|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala], where we fall back to disk if we detect memory pressure.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
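
For readers unfamiliar with the technique, the spill-to-disk behavior requested in the issue can be sketched as a classic external merge sort. This is only an illustrative sketch in plain Java, not Spark's implementation and not the approach proposed in the comment: the class name ExternalSortSketch and the maxInMemory parameter are hypothetical, and a fixed record-count threshold stands in for real memory-pressure detection (which is exactly the open question in this thread).

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: sort integers while buffering at most maxInMemory records.
public class ExternalSortSketch {

    /** One sorted run on disk plus the record currently at its head. */
    private static final class Run implements Comparable<Run> {
        final BufferedReader reader;
        int head;

        Run(BufferedReader reader) { this.reader = reader; }

        /** Reads the next record; returns false when the run is exhausted. */
        boolean advance() throws IOException {
            String line = reader.readLine();
            if (line == null) return false;
            head = Integer.parseInt(line);
            return true;
        }

        @Override
        public int compareTo(Run other) { return Integer.compare(head, other.head); }
    }

    /** Sorts the buffer, writes it to a temp file as one sorted run, then clears it. */
    private static Path spill(List<Integer> buffer) throws IOException {
        Collections.sort(buffer);
        Path file = Files.createTempFile("sort-run-", ".txt");
        try (BufferedWriter w = Files.newBufferedWriter(file)) {
            for (int v : buffer) {
                w.write(Integer.toString(v));
                w.newLine();
            }
        }
        buffer.clear();
        return file;
    }

    /** Assumption: a record-count threshold replaces real memory-pressure detection. */
    public static List<Integer> sort(Iterable<Integer> input, int maxInMemory) throws IOException {
        List<Path> runs = new ArrayList<>();
        List<Integer> buffer = new ArrayList<>();
        for (int v : input) {
            buffer.add(v);
            if (buffer.size() >= maxInMemory) runs.add(spill(buffer));
        }
        if (!buffer.isEmpty()) runs.add(spill(buffer));

        // k-way merge: the heap always exposes the run with the smallest head record.
        PriorityQueue<Run> heap = new PriorityQueue<>();
        // Collected into a list for brevity; a real version would return an iterator.
        List<Integer> out = new ArrayList<>();
        try {
            for (Path p : runs) {
                Run r = new Run(Files.newBufferedReader(p));
                if (r.advance()) heap.add(r); else r.reader.close();
            }
            while (!heap.isEmpty()) {
                Run r = heap.poll();
                out.add(r.head);
                if (r.advance()) heap.add(r); else r.reader.close();
            }
        } finally {
            for (Run r : heap) r.reader.close();
            for (Path p : runs) Files.deleteIfExists(p);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Seven records with a threshold of three forces three spilled runs.
        System.out.println(sort(Arrays.asList(5, 3, 9, 1, 7, 2, 8), 3));
    }
}
```

The spill trigger is the only part that would change under the ideas discussed above: a Runtime.freeMemory() check or a reference-based signal would replace the buffer.size() comparison, while the run files and the merge would stay the same.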