GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/16603

    [SPARK-19244][Core] Sort MemoryConsumers according to their memory usage 
when spilling

    ## What changes were proposed in this pull request?
    
    In `TaskMemoryManager`, when we call `acquireExecutionMemory` and cannot 
acquire the required memory, we try to spill other memory consumers.
    
    Currently, we simply iterate over the memory consumers in a hash set. 
Normally the consumers are iterated in the same order each time.
    
    The first issue is that we might spill more consumers than necessary. For 
example, suppose consumer 1 uses 10MB and consumer 2 uses 50MB, and consumer 3 
then requests 100MB but we can only get 60MB, so we are 40MB short and 
spilling is needed. We might spill both consumer 1 and consumer 2, but we 
actually only need to spill consumer 2 to get the required 100MB.
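
    The example above can be sketched as a small simulation. This is a 
simplified model, not Spark's code: the function name `spill_until_satisfied` 
is hypothetical, and it assumes that spilling a consumer frees all of the 
memory it holds.

```python
def spill_until_satisfied(consumers, needed_mb):
    """Spill consumers in the given order until needed_mb has been freed.

    consumers: list of (name, used_mb) pairs.
    Returns the names of the consumers that were spilled.
    Assumption: spilling a consumer releases all of its memory.
    """
    spilled = []
    freed_mb = 0
    for name, used_mb in consumers:
        if freed_mb >= needed_mb:
            break
        if used_mb > 0:
            freed_mb += used_mb
            spilled.append(name)
    return spilled


consumers = [("consumer 1", 10), ("consumer 2", 50)]
shortfall_mb = 100 - 60  # consumer 3 wants 100MB, only 60MB is available

# Fixed iteration order: consumer 1 is visited first.
in_iteration_order = spill_until_satisfied(consumers, shortfall_mb)

# Largest consumer first.
by_usage = sorted(consumers, key=lambda c: c[1], reverse=True)
largest_first = spill_until_satisfied(by_usage, shortfall_mb)

print(in_iteration_order)  # ['consumer 1', 'consumer 2']
print(largest_first)       # ['consumer 2']
```

    In this model the fixed order spills two consumers to cover a 40MB 
shortfall, while visiting the largest consumer first spills only one.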
    
    The second issue is repeated spilling of the same consumer. Suppose we 
spill consumer 1 the first time spilling happens. After a while, consumer 1 
uses 5MB again. Then consumer 4 may request some memory and spilling is needed 
again. Because we iterate over the memory consumers in the same order, we will 
spill consumer 1 again. So consumer 1 will produce many small spill files.
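
    A rough sketch of this effect, again as a simplified model rather than 
Spark's code. The helper `spill_first_nonempty` is hypothetical, and the model 
assumes each request is small enough to be satisfied by spilling a single 
consumer and that consumer 1 reacquires 5MB between requests.

```python
def spill_first_nonempty(usage, order):
    """Spill the first consumer in `order` that holds memory.

    usage: dict mapping consumer name to used MB (mutated in place).
    Returns (name, spilled_mb), or None if no consumer holds memory.
    """
    for name in order:
        if usage[name] > 0:
            spilled_mb = usage[name]
            usage[name] = 0
            return name, spilled_mb
    return None


usage = {"consumer 1": 10, "consumer 2": 50}
order = ["consumer 1", "consumer 2"]  # the same order on every spill

spill_files = []
for _ in range(4):
    spill_files.append(spill_first_nonempty(usage, order))
    usage["consumer 1"] += 5  # consumer 1 reacquires a little memory

for name, size_mb in spill_files:
    print(f"{name}: {size_mb}MB spill file")
```

    Under these assumptions every spill file comes from consumer 1, and all 
but the first are only 5MB, while consumer 2 keeps its 50MB untouched.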
    
    This patch changes the way we iterate over the memory consumers. It sorts 
the memory consumers by their memory usage, so the consumer using the most 
memory is spilled first. Once a consumer is spilled, even if it acquires a 
little memory again, it will not be the one spilled the next time spilling 
happens, as long as other consumers are using more memory than it.
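
    The idea can be illustrated with a minimal sketch. It is not the patch 
itself: `spill_largest_first` is a hypothetical name, the consumers are 
modeled as a plain dict, and spilling is assumed to free all of a consumer's 
memory.

```python
def spill_largest_first(usage, needed_mb):
    """Spill consumers in descending order of usage until needed_mb is freed.

    usage: dict mapping consumer name to used MB (mutated in place).
    Returns a list of (name, spilled_mb) pairs.
    """
    spilled = []
    freed_mb = 0
    for name in sorted(usage, key=usage.get, reverse=True):
        # Stop once enough is freed; remaining consumers are no larger.
        if freed_mb >= needed_mb or usage[name] == 0:
            break
        freed_mb += usage[name]
        spilled.append((name, usage[name]))
        usage[name] = 0
    return spilled


usage = {"consumer 1": 10, "consumer 2": 50}

first = spill_largest_first(usage, 40)   # 40MB shortfall
usage["consumer 2"] += 5                 # consumer 2 reacquires a little
second = spill_largest_first(usage, 5)   # a later, smaller shortfall

print(first)   # [('consumer 2', 50)]
print(second)  # [('consumer 1', 10)]
```

    The consumer that was just spilled (consumer 2, now at 5MB) is not chosen 
again on the second spill, because consumer 1 is using more memory.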
    
    ## How was this patch tested?
    
    Jenkins tests.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 sort-memoryconsumer-when-spill

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16603.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16603
    
----
commit 4c2b7b02e809614993d25b21aee3e1d55355e482
Author: Liang-Chi Hsieh <vii...@gmail.com>
Date:   2017-01-16T08:57:57Z

    Sort MemoryConsumers according to their memory usage when spilling.

----


