Mridul Muralidharan created SPARK-6165:
------------------------------------------

             Summary: Aggregate and reduce should spool to disk and complete
                 Key: SPARK-6165
                 URL: https://issues.apache.org/jira/browse/SPARK-6165
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 1.4.0
            Reporter: Mridul Muralidharan
            Priority: Minor



To prevent data sent from the workers from causing an OOM at the master, we have 
the property 'spark.driver.maxResultSize'.

But an OOM at the master can have two causes:

a) The data sent from the workers is too large, causing an OOM at the master.
b) A large number of moderately sized (or small) results are sent to the master, 
together causing an OOM (for example: 500k tasks, 1k each).

spark.driver.maxResultSize protects against both, but (b) should be handled 
more gracefully by the master: for example, by spooling the results to disk, or 
by aggregating them without waiting for the entire result set to be fetched.
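
The two options in (b) can be illustrated with a minimal, single-process Python 
sketch. It is not Spark code: fold_on_arrival and spool_then_reduce are 
hypothetical names, task results are modelled as a plain iterable, and the merge 
function is assumed to be associative and commutative (as Spark's reduce already 
expects), since task results can arrive in any order.

```python
import operator
import pickle
import tempfile
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def fold_on_arrival(results: Iterable[T], f: Callable[[T, T], T]) -> T:
    """Merge each task result into a running accumulator as it arrives.

    Only one partial value is held at a time, so memory does not grow with
    the number of tasks. Assumes f is associative and commutative.
    """
    it = iter(results)
    try:
        acc = next(it)
    except StopIteration:
        raise ValueError("empty result set") from None
    for r in it:
        acc = f(acc, r)
    return acc


def spool_then_reduce(results: Iterable[T], f: Callable[[T, T], T]) -> T:
    """Write each incoming result to a temporary file, then stream it back.

    While results are being fetched, only the file handle is kept in memory;
    the reduction then reads the spooled results back one at a time.
    """
    with tempfile.TemporaryFile() as spool:  # binary mode, deleted on close
        count = 0
        for r in results:
            pickle.dump(r, spool)
            count += 1
        if count == 0:
            raise ValueError("empty result set")
        spool.seek(0)

        def replay():
            # Yield the spooled results in the order they were written.
            for _ in range(count):
                yield pickle.load(spool)

        return fold_on_arrival(replay(), f)


# Usage: 1000 small per-task results, summed.
print(fold_on_arrival(range(1, 1001), operator.add))    # prints 500500
print(spool_then_reduce(range(1, 1001), operator.add))  # prints 500500
```

With either approach, memory on the aggregating side no longer grows with the 
number of task results; the spooled variant pays for that with disk I/O and a 
second pass over the data.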

Currently we are forced to use treeReduce and similar operations to work around 
this problem, which adds to the latency of jobs.
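
For comparison, the treeReduce workaround can be modelled as a local, 
single-process simulation: partial results are combined in groups over several 
rounds, so the final step only sees a few values, but each extra round stands in 
for an additional stage of work, which is where the extra latency comes from. 
tree_reduce and its fan_in parameter are hypothetical; Spark's own 
RDD.treeReduce is parameterised by depth instead.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def tree_reduce(partials: Sequence[T], f: Callable[[T, T], T],
                fan_in: int = 2) -> Tuple[T, int]:
    """Combine partial results in rounds, fan_in values per group.

    Returns the final value and the number of rounds needed. In a cluster,
    each round would be a separate stage, so more rounds means more latency.
    Assumes f is associative.
    """
    if not partials:
        raise ValueError("empty result set")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level: List[T] = list(partials)
    rounds = 0
    while len(level) > 1:
        next_level: List[T] = []
        for start in range(0, len(level), fan_in):
            group = level[start:start + fan_in]
            acc = group[0]
            for x in group[1:]:
                acc = f(acc, x)
            next_level.append(acc)
        level = next_level
        rounds += 1
    return level[0], rounds


# Usage: 1000 per-task results, merged ten at a time.
value, rounds = tree_reduce(list(range(1, 1001)), lambda a, b: a + b, fan_in=10)
print(value, rounds)  # prints 500500 3
```

Here 1000 partial results need three rounds (1000 to 100 to 10 to 1) before a 
single value remains, whereas aggregating on arrival needs none.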



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
