GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/21738
[SPARK-21743][SQL][followup] free aggregate map when task ends

## What changes were proposed in this pull request?

This is the first follow-up of https://github.com/apache/spark/pull/21573, which was only merged to 2.3. This PR fixes the memory leak in another way: free the `UnsafeExternalMap` when the task ends. All the data buffers in Spark SQL use `UnsafeExternalMap` and `UnsafeExternalSorter` under the hood, e.g. sort, aggregate, window, SMJ, etc. `UnsafeExternalSorter` already registers a task completion listener to free its resources; we should do the same for `UnsafeExternalMap`.

TODO in the next PR: do not consume all the inputs when there is a limit in whole-stage codegen.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark limit

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21738.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21738

----

commit 174c4e55b897beaf51e395376bbb3d651d394d94
Author: Wenchen Fan <wenchen@...>
Date: 2018-07-09T16:18:31Z

    free aggregate map when task ends

----

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
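The fix described above relies on the task-completion-listener pattern: a memory-holding data structure registers cleanup at construction time, so its buffers are released when the task ends even if the operator never exhausts its input (e.g. under a `LIMIT`). The sketch below illustrates that pattern with hypothetical stand-in classes (`TaskContext`, `AggregationMap`); it is not Spark's actual API, just a minimal model of the idea.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the task-completion-listener pattern the PR applies.
// The class names here are stand-ins, not Spark's real classes.
public class TaskCompletionDemo {

    // Minimal stand-in for a task context that runs listeners when the task ends.
    static class TaskContext {
        private final List<Runnable> onCompleteListeners = new ArrayList<>();

        void addTaskCompletionListener(Runnable listener) {
            onCompleteListeners.add(listener);
        }

        // Called by the scheduler when the task finishes (success or failure).
        void markTaskCompleted() {
            for (Runnable l : onCompleteListeners) {
                l.run();
            }
        }
    }

    // Stand-in for a map-like buffer that owns memory for the duration of a task.
    static class AggregationMap {
        boolean freed = false;

        AggregationMap(TaskContext context) {
            // The key move: register cleanup at construction time, mirroring
            // what UnsafeExternalSorter already does per the PR description.
            context.addTaskCompletionListener(this::free);
        }

        void free() {
            freed = true; // real code would release memory pages here
        }
    }

    public static void main(String[] args) {
        TaskContext context = new TaskContext();
        AggregationMap map = new AggregationMap(context);
        // Even if the consumer stops early (e.g. a LIMIT short-circuits),
        // task completion still triggers the cleanup.
        context.markTaskCompleted();
        if (!map.freed) throw new AssertionError("map was not freed at task end");
        System.out.println("map freed at task end");
    }
}
```

The design choice worth noting is that cleanup is tied to the task lifecycle rather than to the operator draining its input, which is exactly why this closes the leak left when a downstream limit stops consumption early.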