michael procopio created SPARK-21140: ----------------------------------------
Summary: Reduce collect high memory requirements
Key: SPARK-21140
URL: https://issues.apache.org/jira/browse/SPARK-21140
Project: Spark
Issue Type: Improvement
Components: Input/Output
Affects Versions: 2.1.1
Environment: Linux Debian 8 using Hadoop 2.7.2
Reporter: michael procopio

I wrote a very simple Scala application that used flatMap to create an RDD containing a single 512 MB partition of 256-byte arrays. Experimentally, I determined that spark.executor.memory had to be set to 3 GB in order to collect the data. That is six times the size of the data being collected, which seems extremely high.
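The original application's code was not posted. The following is a minimal sketch of the reproduction as described, assuming Spark 2.1.x is on the classpath and executor memory is supplied at submit time. The object name `CollectMemoryRepro` and the choice to expand a single seed element with `flatMap` are illustrative assumptions, not the reporter's code.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical reproduction sketch for SPARK-21140; names are illustrative.
object CollectMemoryRepro {
  val PartitionBytes: Long = 512L * 1024 * 1024          // 512 MB of payload in one partition
  val RecordBytes: Int = 256                              // size of each byte array
  val RecordCount: Long = PartitionBytes / RecordBytes    // number of 256-byte arrays

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SPARK-21140 repro"))
    try {
      // One seed element in one partition; flatMap expands it into RecordCount arrays,
      // so the whole 512 MB payload lives in a single partition.
      val rdd = sc.parallelize(Seq(0), numSlices = 1).flatMap { _ =>
        Iterator.fill(RecordCount.toInt)(new Array[Byte](RecordBytes))
      }
      // collect() materializes the partition as an array on the executor,
      // serializes it as the task result, and ships it to the driver.
      val collected = rdd.collect()
      println(s"collected ${collected.length} arrays of $RecordBytes bytes")
    } finally {
      sc.stop()
    }
  }
}
```

One way to probe the threshold is to package this and rerun with different `--executor-memory` values passed to `spark-submit --class CollectMemoryRepro`; the driver also needs enough memory to hold the collected result.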