[ https://issues.apache.org/jira/browse/SPARK-21140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-21140.
-------------------------------
    Resolution: Invalid

There's no real detail here. Executor memory doesn't directly matter to how much data you can collect on the driver. Of course, collecting half-gig partitions to a driver is going to fail with even 1 partition, because that's about the default size of the driver memory. This should start as a question on a mailing list.

> Reduce collect high memory requirements
> ---------------------------------------
>
>             Key: SPARK-21140
>             URL: https://issues.apache.org/jira/browse/SPARK-21140
>         Project: Spark
>      Issue Type: Improvement
>      Components: Input/Output
> Affects Versions: 2.1.1
>     Environment: Linux Debian 8 using hadoop 2.7.2.
>        Reporter: michael procopio
>
> I wrote a very simple Scala application which used flatMap to create an RDD containing a 512 MB partition of 256-byte arrays. Experimentally, I determined that spark.executor.memory had to be set to 3 GB in order to collect the data. This seems extremely high.
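
For reference, a minimal sketch of the scenario described above (not the reporter's actual code; the object name, app name, and sizes are assumptions made up for illustration). It uses flatMap to fan one element out into roughly 512 MB of 256-byte arrays in a single partition and then collect()s them to the driver. The settings that bound what collect() can hold are the driver-side ones, spark.driver.memory (about 1 GB by default) and spark.driver.maxResultSize (1 GB by default), rather than spark.executor.memory.

    import org.apache.spark.sql.SparkSession

    object CollectMemorySketch {  // hypothetical object/app name
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("collect-memory-sketch")
          .getOrCreate()
        val sc = spark.sparkContext

        val recordSize = 256
        val numRecords = (512L * 1024 * 1024 / recordSize).toInt // ~512 MB of raw payload

        // One input element fans out into many fixed-size byte arrays, all in one partition.
        val rdd = sc.parallelize(Seq(numRecords), numSlices = 1)
          .flatMap(n => Iterator.fill(n)(new Array[Byte](recordSize)))

        // collect() materializes every record on the driver, so the limits that matter
        // here are spark.driver.memory and spark.driver.maxResultSize, not executor memory.
        val data = rdd.collect()
        println(s"Collected ${data.length} records")

        spark.stop()
      }
    }

Submitting with something like spark-submit --driver-memory 2g (and, if needed, a larger spark.driver.maxResultSize) is what gives a collect of this size room to succeed; raising spark.executor.memory only affects the executor side that builds the partition.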