[ https://issues.apache.org/jira/browse/SPARK-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077012#comment-14077012 ]
Davies Liu commented on SPARK-1343:
-----------------------------------

https://github.com/apache/spark/pull/1460
https://github.com/apache/spark/pull/1568

> PySpark OOMs without caching
> ----------------------------
>
>                 Key: SPARK-1343
>                 URL: https://issues.apache.org/jira/browse/SPARK-1343
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 0.9.0
>            Reporter: Matei Zaharia
>             Fix For: 0.9.0, 1.0.0
>
>
> There have been several reports on the list of PySpark 0.9 OOMing even if it
> does simple maps and counts, whereas 0.8 didn't. This may be due to either
> the batching added to serialization, or due to invalid serialized data which
> makes the Java side allocate an overly large array. Needs investigating for
> 1.0.
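Both suspected causes involve how data is framed between the Python worker and the JVM. Below is a minimal sketch, assuming each frame is a 4-byte big-endian length followed by that many payload bytes. It is not the fix from the linked pull requests, and `write_frame`, `read_frame`, `batched_frames`, and `MAX_FRAME_BYTES` are hypothetical names, not PySpark APIs. It shows (1) how batching several pickled objects into one frame makes each frame's memory scale with the batch size, and (2) how a reader that trusts a corrupted length header would allocate a huge buffer, which a simple sanity bound prevents. As a simplification, the sketch treats negative lengths as invalid.

```python
import io
import pickle
import struct

# Hypothetical per-frame limit, chosen only for illustration.
MAX_FRAME_BYTES = 64 * 1024 * 1024


def write_frame(stream, payload: bytes) -> None:
    """Write a 4-byte big-endian signed length, then the payload bytes."""
    stream.write(struct.pack("!i", len(payload)))
    stream.write(payload)


def read_frame(stream, max_bytes: int = MAX_FRAME_BYTES) -> bytes:
    """Read one length-prefixed frame, rejecting implausible lengths
    before any buffer of that size is allocated."""
    header = stream.read(4)
    if len(header) < 4:
        raise EOFError("truncated length header")
    (length,) = struct.unpack("!i", header)
    if length < 0 or length > max_bytes:
        # A corrupted header such as b"\x7f\xff\xff\xff" decodes to
        # 2147483647, i.e. a request for roughly 2 GiB in one frame.
        raise ValueError(f"implausible frame length {length}")
    payload = stream.read(length)
    if len(payload) < length:
        raise EOFError("truncated payload")
    return payload


def batched_frames(items, batch_size: int):
    """Pickle items in groups of batch_size; each group becomes one frame,
    so a single frame holds batch_size objects in memory at once."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield pickle.dumps(batch)
            batch = []
    if batch:
        yield pickle.dumps(batch)


if __name__ == "__main__":
    buf = io.BytesIO()
    for frame in batched_frames(range(10), batch_size=4):
        write_frame(buf, frame)
    buf.seek(0)
    for _ in range(3):
        print(pickle.loads(read_frame(buf)))
```

Bounding the length before allocating turns a corrupted stream into an immediate, descriptive error instead of an out-of-memory failure; choosing a smaller batch size trades some serialization overhead for a lower peak memory per frame.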