[ 
https://issues.apache.org/jira/browse/SPARK-21140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael procopio reopened SPARK-21140:
--------------------------------------

I am not sure what detail you are looking for.  I provided the test code I was 
using.  It seems to me that multiple copies of the data must be generated when 
collecting a partition.  Having to set spark.executor.memory to 3 GB to collect 
a 512 MB partition seems high to me.


> Reduce collect high memory requirements
> ---------------------------------------
>
>                 Key: SPARK-21140
>                 URL: https://issues.apache.org/jira/browse/SPARK-21140
>             Project: Spark
>          Issue Type: Improvement
>          Components: Input/Output
>    Affects Versions: 2.1.1
>         Environment: Linux Debian 8 using hadoop 2.7.2.
>            Reporter: michael procopio
>
> I wrote a very simple Scala application which used flatMap to create an RDD 
> containing a 512 MB partition of 256-byte arrays.  Experimentally, I 
> determined that spark.executor.memory had to be set to 3 GB in order to 
> collect the data.  This seems extremely high.
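
The application described above is not included in this message. A minimal
sketch of what such a test might look like follows. It is an illustration
under stated assumptions, not the reporter's test code: the object name
CollectMemoryRepro, the single seed element, and the use of Iterator.fill are
all hypothetical choices made for this example.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical reproduction sketch; not the reporter's attached test code.
object CollectMemoryRepro {
  val ArraySize = 256                      // bytes per array, as in the report
  val PartitionBytes = 512L * 1024 * 1024  // 512 MB of payload in one partition
  val ArrayCount: Int = (PartitionBytes / ArraySize).toInt

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("CollectMemoryRepro").getOrCreate()
    val sc = spark.sparkContext

    // One seed element in a single partition; flatMap expands it on the
    // executor into ArrayCount byte arrays of ArraySize bytes each.
    val rdd = sc.parallelize(Seq(0), numSlices = 1)
      .flatMap(_ => Iterator.fill(ArrayCount)(new Array[Byte](ArraySize)))

    // collect() ships the whole partition back to the driver as one result.
    val collected = rdd.collect()
    println(s"Collected ${collected.length} arrays")

    spark.stop()
  }
}
```

Running a job like this while varying spark.executor.memory is one way to
find the smallest setting at which collect() succeeds.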



