[ 
https://issues.apache.org/jira/browse/SPARK-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-2773.
------------------------------------
    Resolution: Invalid

> Shuffle:use growth rate to predict if need to spill
> ---------------------------------------------------
>
>                 Key: SPARK-2773
>                 URL: https://issues.apache.org/jira/browse/SPARK-2773
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 0.9.0, 1.0.0
>            Reporter: uncleGen
>            Priority: Minor
>
> Right now, Spark uses the total "shuffle" memory used by each thread to 
> predict whether it needs to spill. I think this is not very reasonable. For 
> example, suppose two threads are pulling "shuffle" data and the total memory 
> available for buffering is 21G. Spilling is first triggered when one thread 
> has used 7G to buffer "shuffle" data (here I assume the other thread has 
> used the same amount). Unfortunately, that leaves 7G still unused. So I 
> think the current prediction mode is too conservative and cannot maximize 
> the usage of "shuffle" memory. In my solution, I use the growth rate of the 
> "shuffle" memory instead. Since the growth per step is limited, say 
> 10K * 1024 (my assumption, i.e. 10M), spilling is first triggered when the 
> remaining "shuffle" memory is less than threads * growth * 2, i.e. 
> 2 * 10M * 2 = 40M. I think this can maximize the usage of "shuffle" memory. 
> My solution also makes a conservative assumption, namely that all threads 
> in one executor are pulling shuffle data at the same time. However, this 
> does not have much effect, since the growth per step is limited after all. 
> Any suggestions?
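The proposed heuristic can be sketched roughly as follows. This is a minimal illustration of the described trigger condition only, not Spark's actual shuffle code; the object and parameter names (`SpillPredictor`, `remaining`, `growth`) are hypothetical, and the 10M growth step is the reporter's own assumption.

```scala
// Hypothetical sketch of the growth-rate-based spill heuristic described
// in the issue. Not part of Spark's actual API.
object SpillPredictor {
  // Decide whether a thread should spill, given:
  //   remaining - shuffle memory still available in the pool (bytes)
  //   threads   - number of threads concurrently pulling shuffle data
  //   growth    - observed per-step growth of a thread's buffer (bytes)
  // Spill once the remaining pool could be exhausted within roughly two
  // more growth steps across all threads (the factor of 2 in the issue).
  def shouldSpill(remaining: Long, threads: Int, growth: Long): Boolean =
    remaining < threads.toLong * growth * 2

  def main(args: Array[String]): Unit = {
    val growth = 10L * 1024 * 1024 // 10M per step, per the issue's assumption
    // With 2 threads the threshold is 2 * 10M * 2 = 40M of remaining memory.
    println(shouldSpill(remaining = 50L * 1024 * 1024, threads = 2, growth = growth)) // false
    println(shouldSpill(remaining = 30L * 1024 * 1024, threads = 2, growth = growth)) // true
  }
}
```

Compared with the current per-thread quota check, this condition lets buffers keep growing until the pool is nearly exhausted, at the cost of assuming the per-step growth stays bounded.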



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
