[jira] [Resolved] (SPARK-2773) Shuffle: use growth rate to predict if need to spill
[ https://issues.apache.org/jira/browse/SPARK-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell resolved SPARK-2773.
------------------------------------
    Resolution: Invalid

> Shuffle: use growth rate to predict if need to spill
> ----------------------------------------------------
>
>                 Key: SPARK-2773
>                 URL: https://issues.apache.org/jira/browse/SPARK-2773
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
> Affects Versions: 0.9.0, 1.0.0
>            Reporter: uncleGen
>            Priority: Minor
>
> Right now, Spark uses the total shuffle memory used by each thread to decide whether to spill. I don't think this is very reasonable. For example, suppose two threads are pulling shuffle data and the total memory available for buffering is 21G. Spilling is first triggered when one thread has buffered 7G of shuffle data; assuming the other thread has used the same amount, 7G of shuffle memory still remains unused. So the current prediction is too conservative and cannot maximize the use of shuffle memory.
>
> In my solution, I use the growth rate of the shuffle memory instead. The growth per step is bounded, say 10K * 1024 = 10M (my assumption), so spilling is first triggered when the remaining shuffle memory is less than threads * growth * 2, i.e. 2 * 10M * 2 = 40M. I think this maximizes the use of shuffle memory. My solution also makes a conservative assumption, namely that all threads in the executor are pulling shuffle data at the same time, but this does not have much effect, since the growth per step is bounded after all. Any suggestions?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
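The heuristic proposed in the ticket can be sketched as follows. This is an illustrative model only, not Spark's actual shuffle code; the function name, the per-step growth bound, and the sizes are the reporter's example numbers, not values from the Spark codebase.

```python
# Sketch of the growth-rate spill heuristic proposed in the ticket.
# Names and sizes are illustrative; this is not Spark's implementation.

MB = 1024 ** 2
GB = 1024 ** 3

def should_spill(remaining_bytes, num_threads, max_growth_per_step):
    """Spill only when every thread, growing at its maximum per-step
    rate, could exhaust the remaining pool within about two steps."""
    return remaining_bytes < num_threads * max_growth_per_step * 2

# The ticket's numbers: 2 threads, ~10M maximum growth per step,
# so the threshold is 2 * 10M * 2 = 40M of remaining shuffle memory.
print(should_spill(7 * GB, 2, 10 * MB))   # 7G remaining: keep buffering
print(should_spill(30 * MB, 2, 10 * MB))  # 30M < 40M: trigger a spill
```

Under this model, in the ticket's 21G example the pool would keep filling until only ~40M remained, instead of spilling while 7G was still unused; the trade-off is that the bound on per-step growth must actually hold for every thread.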
[jira] [Resolved] (SPARK-2773) Shuffle: use growth rate to predict if need to spill
[ https://issues.apache.org/jira/browse/SPARK-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell resolved SPARK-2773.
------------------------------------
    Resolution: Won't Fix

I don't think this is needed now that SPARK-2316 is fixed. This queue is not intended to overflow during normal operation. If you still observe issues in a version of Spark that contains SPARK-2316, please report it and we'll see what is going on.