GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/21738

    [SPARK-21743][SQL][followup] free aggregate map when task ends

    ## What changes were proposed in this pull request?
    
    This is the first follow-up of https://github.com/apache/spark/pull/21573 , 
which was only merged to 2.3.
    
    This PR fixes the memory leak in another way: free the `UnsafeExternalMap` 
when the task ends. All the data buffers in Spark SQL are backed by 
`UnsafeExternalMap` or `UnsafeExternalSorter` under the hood, e.g. sort, 
aggregate, window, SMJ, etc. `UnsafeExternalSorter` already registers a task 
completion listener to free its resources; we should do the same for 
`UnsafeExternalMap`.
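
    The pattern can be sketched in plain Scala. The `TaskContextStub` and 
`AggregationMapStub` classes below are hypothetical stand-ins, not Spark's real 
`TaskContext` or aggregation map; they only illustrate registering a cleanup 
callback at construction time that fires when the task completes:

    ```scala
    import scala.collection.mutable.ArrayBuffer

    // Hypothetical minimal stand-in for Spark's TaskContext.
    class TaskContextStub {
      private val listeners = ArrayBuffer.empty[() => Unit]
      def addTaskCompletionListener(f: () => Unit): Unit = listeners += f
      // Spark invokes registered listeners when the task finishes,
      // whether it succeeds or fails.
      def markTaskCompleted(): Unit = listeners.foreach(_())
    }

    // Hypothetical stand-in for the aggregate map whose memory pages
    // must be returned to the memory manager.
    class AggregationMapStub {
      var freed = false
      def free(): Unit = freed = true
    }

    object Demo {
      def run(): Boolean = {
        val ctx = new TaskContextStub
        val map = new AggregationMapStub
        // Register the cleanup when the map is created, mirroring what
        // UnsafeExternalSorter already does.
        ctx.addTaskCompletionListener(() => map.free())
        ctx.markTaskCompleted()
        map.freed // true: the map was freed at task completion
      }
    }
    ```

    Registering the listener at construction time ensures the memory is 
released even if the downstream operator (e.g. a limit) stops consuming rows 
before the map is exhausted.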
    
    TODO in the next PR:
    do not consume all the input rows when there is a limit in whole-stage codegen.
    
    ## How was this patch tested?
    
    existing tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark limit

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21738.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21738
    
----
commit 174c4e55b897beaf51e395376bbb3d651d394d94
Author: Wenchen Fan <wenchen@...>
Date:   2018-07-09T16:18:31Z

    free aggregate map when task ends

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
