[ https://issues.apache.org/jira/browse/SPARK-21140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
michael procopio reopened SPARK-21140:
--------------------------------------

I am not sure what detail you are looking for. I provided the test code I was using. It seems to me that multiple copies of the data must be generated when collecting a partition. Having to set driver.executor.memory to 3 GB to collect a 512 MB partition seems high to me.

> Reduce collect high memory requrements
> --------------------------------------
>
>                 Key: SPARK-21140
>                 URL: https://issues.apache.org/jira/browse/SPARK-21140
>             Project: Spark
>          Issue Type: Improvement
>          Components: Input/Output
>    Affects Versions: 2.1.1
>         Environment: Linux Debian 8 using hadoop 2.7.2.
>            Reporter: michael procopio
>
> I wrote a very simple Scala application which used flatMap to create an RDD
> containing a 512 MB partition of 256-byte arrays. Experimentally, I
> determined that spark.executor.memory had to be set to 3 GB in order to
> collect the data. This seems extremely high.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
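
For readers who want to try the scenario from the description, here is a minimal sketch of what such a test might look like. It is not the reporter's actual test code, which is not included in this message. The object name `CollectMemoryRepro`, the helper `arrayCount`, and the use of `parallelize` with a single slice to force one partition are all assumptions made for illustration. It assumes Spark 2.x on the classpath and a master supplied by spark-submit.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical reproduction sketch; names and structure are assumptions,
// not the reporter's code.
object CollectMemoryRepro {
  val ArrayBytes: Int = 256                       // size of each byte array
  val PartitionBytes: Long = 512L * 1024 * 1024   // 512 MB of raw payload

  // Number of fixed-size arrays needed to fill the given payload size.
  def arrayCount(partitionBytes: Long, arrayBytes: Int): Int =
    (partitionBytes / arrayBytes).toInt

  def main(args: Array[String]): Unit = {
    // The master URL and memory settings are expected to come from spark-submit.
    val conf = new SparkConf().setAppName("collect-memory-repro")
    val sc = new SparkContext(conf)
    try {
      val n = arrayCount(PartitionBytes, ArrayBytes)

      // One seed element in one slice, expanded by flatMap, so that all
      // arrays end up in a single partition.
      val rdd = sc
        .parallelize(Seq(0), numSlices = 1)
        .flatMap(_ => Iterator.fill(n)(new Array[Byte](ArrayBytes)))

      // collect() brings the whole partition back to the driver.
      val collected = rdd.collect()
      println(s"collected ${collected.length} arrays")
    } finally {
      sc.stop()
    }
  }
}
```

A job like this could be submitted with spark-submit while varying `spark.executor.memory` (for example `--conf spark.executor.memory=3g`) to find the smallest setting at which `collect()` succeeds. The driver's own memory and `spark.driver.maxResultSize` may also need to be large enough to hold the collected result.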