[jira] [Resolved] (SPARK-2773) Shuffle: use growth rate to predict if need to spill
[ https://issues.apache.org/jira/browse/SPARK-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell resolved SPARK-2773.
------------------------------------
    Resolution: Invalid

> Shuffle: use growth rate to predict if need to spill
> ----------------------------------------------------
>
>                 Key: SPARK-2773
>                 URL: https://issues.apache.org/jira/browse/SPARK-2773
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
> Affects Versions: 0.9.0, 1.0.0
>            Reporter: uncleGen
>            Priority: Minor
>
> Right now, Spark uses the total shuffle memory used by each thread to decide whether to spill. I don't think this is very reasonable. For example, suppose two threads are pulling shuffle data and the total memory available for buffering is 21G. Spilling is first triggered when one thread has buffered 7G of shuffle data; assuming the other thread has used the same amount, 7G of shuffle memory still remains unused. So the current prediction is too conservative and cannot maximize the use of shuffle memory.
>
> In my solution, I use the growth rate of the shuffle memory instead. The growth per step is bounded, say 10K * 1024 = 10M (my assumption), so spilling is first triggered when the remaining shuffle memory is less than threads * growth * 2, i.e. 2 * 10M * 2 = 40M. I think this maximizes the use of shuffle memory. My solution also makes a conservative assumption, namely that all threads in the executor are pulling shuffle data at the same time, but this does not have much effect, since the growth per step is bounded after all. Any suggestions?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
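The heuristic proposed in the ticket can be sketched as follows. This is an illustrative model only, not Spark's actual shuffle code; the function name, the per-step growth bound, and the sizes are the reporter's example numbers, not values from the Spark codebase.

```python
# Sketch of the growth-rate spill heuristic proposed in the ticket.
# Names and sizes are illustrative; this is not Spark's implementation.

MB = 1024 ** 2
GB = 1024 ** 3

def should_spill(remaining_bytes, num_threads, max_growth_per_step):
    """Spill only when every thread, growing at its maximum per-step
    rate, could exhaust the remaining pool within about two steps."""
    return remaining_bytes < num_threads * max_growth_per_step * 2

# The ticket's numbers: 2 threads, ~10M maximum growth per step,
# so the threshold is 2 * 10M * 2 = 40M of remaining shuffle memory.
print(should_spill(7 * GB, 2, 10 * MB))   # 7G remaining: keep buffering
print(should_spill(30 * MB, 2, 10 * MB))  # 30M < 40M: trigger a spill
```

Under this model, in the ticket's 21G example the pool would keep filling until only ~40M remained, instead of spilling while 7G was still unused; the trade-off is that the bound on per-step growth must actually hold for every thread.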
[jira] [Resolved] (SPARK-2773) Shuffle: use growth rate to predict if need to spill
[ https://issues.apache.org/jira/browse/SPARK-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell resolved SPARK-2773.
------------------------------------
    Resolution: Won't Fix

I don't think this is needed now that SPARK-2316 is fixed. This queue is not intended to overflow during normal operation. If you still observe issues in a version of Spark that contains SPARK-2316, please report it and we'll see what is going on.