[jira] [Resolved] (SPARK-2773) Shuffle:use growth rate to predict if need to spill

2014-09-01 Thread Patrick Wendell (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell resolved SPARK-2773.

Resolution: Invalid

 Shuffle:use growth rate to predict if need to spill
 ---

 Key: SPARK-2773
 URL: https://issues.apache.org/jira/browse/SPARK-2773
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle
Affects Versions: 0.9.0, 1.0.0
Reporter: uncleGen
Priority: Minor

 Right now, Spark uses each thread's total shuffle memory usage to decide 
 whether to spill. I don't think this is very reasonable. For example, suppose 
 two threads are pulling shuffle data and the total memory available for 
 buffering is 21G. Spilling is first triggered when one thread has used 7G to 
 buffer shuffle data (here I assume the other thread has used the same amount). 
 Unfortunately, 7G still remains unused. So I think the current prediction 
 policy is too conservative and cannot maximize the use of shuffle memory. In 
 my solution, I use the growth rate of shuffle memory instead. Since the growth 
 in each iteration is bounded, say 10K * 1024 (my assumption), spilling is 
 first triggered when the remaining shuffle memory is less than 
 threads * growth * 2, i.e. 2 * 10M * 2. I think this maximizes the use of 
 shuffle memory. My solution also makes a conservative assumption, namely that 
 all threads in one executor are pulling shuffle data at the same time. However, 
 this does not have much effect, since the growth is bounded after all. Any 
 suggestions?
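The contrast between the two policies can be sketched as follows. This is a hypothetical illustration, not Spark's actual implementation; the constants (21G budget, 2 threads, ~10M growth per iteration, 7G per-thread threshold) are the reporter's assumed numbers, expressed here in MB.

```python
# Hypothetical sketch of the two spill-trigger policies discussed in the
# issue. All constants are the reporter's assumed example numbers, in MB.

TOTAL_MB = 21 * 1024   # total shuffle-buffer memory budget (assumed)
THREADS = 2            # threads pulling shuffle data (assumed)
GROWTH_MB = 10         # bounded per-iteration growth of one buffer (assumed)

def should_spill_per_thread(thread_used_mb, threshold_mb=7 * 1024):
    """Policy described as current: a thread spills once its own usage
    crosses a fixed per-thread threshold (7G in the example), even if
    much of the total budget is still free."""
    return thread_used_mb >= threshold_mb

def should_spill_growth_rate(total_used_mb):
    """Proposed policy: spill only when the remaining budget could be
    exhausted within roughly two iterations of bounded growth across
    all threads, i.e. remaining < threads * growth * 2."""
    remaining = TOTAL_MB - total_used_mb
    return remaining < THREADS * GROWTH_MB * 2
```

With both threads at 7G (14G used in total), the per-thread policy already spills while 7G of the budget is still free, whereas the growth-rate policy keeps buffering until only threads * growth * 2 = 40M remain.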



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2773) Shuffle:use growth rate to predict if need to spill

2014-09-01 Thread Patrick Wendell (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell resolved SPARK-2773.

Resolution: Won't Fix

I don't think this is needed now that SPARK-2316 is fixed. This queue is not 
intended to overflow during normal operation. If you still observe issues in a 
version of Spark that contains SPARK-2316... please report it and we'll see 
what is going on.
