[ https://issues.apache.org/jira/browse/DRILL-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945868#comment-15945868 ]
Paul Rogers commented on DRILL-5019:
------------------------------------

Primarily a development issue; hard to test at the QA level.

> ExternalSortBatch spills all batches to disk even if even one spills
> --------------------------------------------------------------------
>
>                 Key: DRILL-5019
>                 URL: https://issues.apache.org/jira/browse/DRILL-5019
>             Project: Apache Drill
>          Issue Type: Sub-task
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>             Fix For: 1.11.0
>
>
> The ExternalSortBatch (ESB) operator sorts batches while spilling to disk to
> stay within a defined memory budget.
> Assume the memory budget is 10 GB. Assume that the actual volume of data to
> be sorted is 10.1 GB. The ESB spills the extra 0.1 GB to disk. (In practice it
> spills more than that, say 5 GB.)
> At the completion of the run, ESB has read all incoming batches. It must now
> merge those batches. It does so by spilling **all** batches to disk, then
> doing a disk-based merge.
> This means that exceeding the memory limit by even a small amount is the same
> as having a very low memory limit: all batches must spill.
> This solution is simple and it works, and there is some logic to it.
> But it would be better to have a slightly more advanced solution that spills
> only the smallest possible set of batches to disk, then does a hybrid
> in-memory, on-disk merge, saving the unnecessary write/read cycle.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
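
The hybrid merge proposed in the description can be illustrated with a small sketch. This is not Drill's code and not the ESB implementation; it is a minimal Python illustration under stated assumptions: the memory budget is counted in rows rather than bytes, the oldest run is spilled first, and the names `sort_with_budget`, `spill_run`, and `read_run` are hypothetical. Runs that still fit in memory at the end are merged directly with the runs already on disk, so they are never written out and read back.

```python
import heapq
import os
import pickle
import tempfile


def spill_run(sorted_run, directory):
    """Write one sorted run to a temp file and return its path (hypothetical helper)."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run")
    with os.fdopen(fd, "wb") as f:
        for row in sorted_run:
            pickle.dump(row, f)
    return path


def read_run(path):
    """Stream rows back from a spilled run, one row at a time."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


def sort_with_budget(batches, budget_rows, directory):
    """Sort incoming batches, spilling only while the row budget is exceeded.

    Returns (merged_iterator, number_of_spilled_runs).
    """
    in_memory = []   # sorted runs still held in memory, oldest first
    spilled = []     # paths of runs written to disk
    held = 0         # rows currently held in memory
    for batch in batches:
        run = sorted(batch)
        in_memory.append(run)
        held += len(run)
        # Assumption: spill the oldest run first, and stop as soon as we fit.
        while held > budget_rows and in_memory:
            victim = in_memory.pop(0)
            held -= len(victim)
            spilled.append(spill_run(victim, directory))
    # Hybrid merge: in-memory runs and on-disk runs feed one k-way merge,
    # so runs that never spilled avoid the extra write/read cycle.
    sources = [iter(r) for r in in_memory] + [read_run(p) for p in spilled]
    return heapq.merge(*sources), len(spilled)
```

With four batches totalling 10 rows and a budget of 9 rows, only one run goes to disk and the other three are merged from memory. The merged iterator is lazy, so it has to be consumed while the spill directory still exists.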