[ https://issues.apache.org/jira/browse/SPARK-27041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen reassigned SPARK-27041:
---------------------------------

    Assignee: David Yang

> large partition data cause pyspark with python2.x oom
> -----------------------------------------------------
>
>                 Key: SPARK-27041
>                 URL: https://issues.apache.org/jira/browse/SPARK-27041
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.0
>            Reporter: David Yang
>            Assignee: David Yang
>            Priority: Major
>
> With a large partition, PySpark may exceed the executor memory limit and
> trigger an out-of-memory error under Python 2.7.
> This is because map() is used. Unlike in Python 3.x, map() in Python 2.7
> builds a list, so all of the data has to be read into memory.
> The proposed fix uses imap in Python 2.7, and it has been verified.
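
The difference between an eager map() and a lazy imap described in the issue can be illustrated with a small sketch. This is not the actual Spark patch; the names lazy_map, records, and first_mapped_record are hypothetical and exist only for this illustration. It assumes nothing beyond the standard library: itertools.imap on Python 2 and the built-in map on Python 3.

```python
import sys

# Pick a lazy mapping function for the running interpreter.
if sys.version_info[0] < 3:
    # Python 2: the built-in map() builds a full list, itertools.imap does not.
    from itertools import imap as lazy_map
else:
    # Python 3: the built-in map() already returns a lazy iterator.
    lazy_map = map


def records(n, pulled):
    """Hypothetical stand-in for a partition: yields n records, logging each one read."""
    i = 0
    while i < n:
        pulled.append(i)
        yield i
        i += 1


def first_mapped_record(n):
    """Map over the fake partition, take one result, and report how many records were read."""
    pulled = []
    mapped = lazy_map(lambda x: x * 2, records(n, pulled))
    first = next(mapped)
    return first, len(pulled)


if __name__ == "__main__":
    first, read_count = first_mapped_record(1000000)
    # With a lazy map, only one record has been read at this point.
    print(first, read_count)
```

If the Python 2 built-in map() were used in place of lazy_map, every record would be read and held in a list before the first result became available, which is the memory behaviour the issue reports for large partitions.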