GitHub user sachouche opened a pull request:
https://github.com/apache/drill/pull/1087
Attempt to fix memory leak in Parquet
** Problem Description **
This is an extremely rare leak, which I was able to reproduce by adding a
sleep in the AsyncPageReader right after it reads the page and before it
enqueues the result in the result queue. This is how the issue could manifest
itself in a real-life scenario:
- AsyncPageReader reads a page into a buffer but has not yet enqueued the
result (the thread got preempted)
- Parquet Scan thread is blocked waiting on the task (the Future object has
already been dequeued)
- Cancel is received and the Scan thread is interrupted
- Future.get() exits because of the interrupt (the Future object is lost)
- Scan thread executes the release logic
- Scan thread is not able to interrupt the AsyncPageReader thread since the
Future object is lost
- AsyncPageReader thread resumes and enqueues the DrillBuf in the result
queue
- This results in a leak since nothing is left to release this buffer
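
The sequence above can be replayed deterministically on a single thread. The sketch below is only an illustration under stated assumptions, not Drill code: FakeBuf, the two queues, and the shape of the release logic (cancel whatever is still in the task queue, then drain the result queue) are hypothetical stand-ins for the real AsyncPageReader structures.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

public class LostFutureLeakSketch {
    // Hypothetical stand-in for a reference-counted buffer such as DrillBuf.
    static final class FakeBuf {
        boolean released;
        void release() { released = true; }
    }

    // Replays the steps from the list above; returns true if the buffer leaked.
    static boolean replayLeaksBuffer() {
        Queue<Future<Void>> taskQueue = new ArrayDeque<>();
        Queue<FakeBuf> resultQueue = new ArrayDeque<>();

        // Reader has already read the page but has not enqueued it yet.
        FakeBuf page = new FakeBuf();
        FutureTask<Void> readTask = new FutureTask<>(() -> {
            resultQueue.add(page);
            return null;
        });
        taskQueue.add(readTask);

        // Scan thread dequeues the Future first, then blocks on it.
        Future<Void> dequeued = taskQueue.poll();
        Thread.currentThread().interrupt(); // simulate the cancel
        try {
            dequeued.get();
        } catch (InterruptedException e) {
            // The only reference to the Future is dropped here.
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            Thread.interrupted(); // make sure the flag is clear for the caller
        }

        // Assumed release logic: cancel queued tasks, release queued buffers.
        for (Future<Void> f : taskQueue) {
            f.cancel(true);
        }
        taskQueue.clear();
        for (FakeBuf b; (b = resultQueue.poll()) != null; ) {
            b.release();
        }

        // Reader resumes after the cleanup and enqueues its buffer.
        readTask.run();
        return !page.released;
    }
}
```

Because the task queue is already empty when the release logic runs, the read task is never cancelled and its buffer lands in the result queue after the drain, so replayLeaksBuffer() returns true.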
** Fix Description **
- The fix is straightforward: we peek at the Future object, instead of
removing it from the task queue, during the blocking get() call
- This way, an exception (such as an interrupt) will leave the Future
object in the task queue
- The cleanup logic will then be able to guarantee that the DrillBuf object is
released by either the AsyncPageReader or the ParquetScan thread
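
As a rough illustration of the difference between the two orderings, here is a minimal sketch. It is not the patch itself: the class, the method names, and the use of a plain ConcurrentLinkedQueue of Future objects are assumptions made for the example.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

public class PeekBeforeGetSketch {
    // Lossy ordering: the Future leaves the queue before the blocking wait,
    // so an interrupt during get() drops the only reference to it.
    static <T> T pollThenGet(Queue<Future<T>> tasks)
            throws InterruptedException, ExecutionException {
        Future<T> task = tasks.poll();
        return task == null ? null : task.get();
    }

    // Safer ordering: peek, wait, and remove the Future only after get()
    // has returned normally. On an exception it stays in the queue.
    static <T> T peekThenGet(Queue<Future<T>> tasks)
            throws InterruptedException, ExecutionException {
        Future<T> task = tasks.peek();
        if (task == null) {
            return null;
        }
        T result = task.get();
        tasks.poll();
        return result;
    }

    // Interrupts the calling thread before waiting on a task that never
    // completes, then reports how many Futures cleanup could still see.
    static int pendingAfterInterrupt(boolean peekFirst) {
        Queue<Future<String>> tasks = new ConcurrentLinkedQueue<>();
        tasks.add(new FutureTask<>(() -> "page")); // never run
        Thread.currentThread().interrupt();        // simulate the cancel
        try {
            if (peekFirst) {
                peekThenGet(tasks);
            } else {
                pollThenGet(tasks);
            }
        } catch (InterruptedException e) {
            // Expected: get() aborts because of the pending interrupt.
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            Thread.interrupted(); // make sure the flag is clear for the caller
        }
        return tasks.size();
    }
}
```

In this sketch, pendingAfterInterrupt(false) returns 0 (the Future is gone) while pendingAfterInterrupt(true) returns 1, so the cleanup logic can still find the task and cancel it or release its result.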
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sachouche/drill DRILL-6079
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/1087.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1087
----
commit 52030d1d9cc3b8992a10ade8c7126d66e785043a
Author: Salim Achouche <sachouche2@...>
Date: 2017-12-22T19:50:56Z
Attempt to fix memory leak in Parquet
----