[ 
https://issues.apache.org/jira/browse/DRILL-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers resolved DRILL-5019.
--------------------------------
    Resolution: Fixed

> ExternalSortBatch spills all batches to disk even if only one spills
> --------------------------------------------------------------------
>
>                 Key: DRILL-5019
>                 URL: https://issues.apache.org/jira/browse/DRILL-5019
>             Project: Apache Drill
>          Issue Type: Sub-task
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>
> The ExternalSortBatch (ESB) operator sorts incoming batches, spilling to
> disk as needed to stay within a defined memory budget.
> Assume the memory budget is 10 GB and the actual volume of data to be
> sorted is 10.1 GB. The ESB spills the extra 0.1 GB to disk. (In practice
> it spills more than that, say 5 GB.)
> At the completion of the run, ESB has read all incoming batches. It must now 
> merge those batches. It does so by spilling **all** batches to disk, then 
> doing a disk-based merge.
> This means that exceeding the memory limit by even a small amount has the
> same effect as a very low memory limit: all batches must spill.
> This solution is simple and it works, and there is some logic to it.
> It would be better, however, to have a slightly more advanced solution that
> spills only the smallest possible set of batches to disk, then does a hybrid
> in-memory, on-disk merge, avoiding the unnecessary write/read cycle.
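
The hybrid merge proposed above can be illustrated with a minimal sketch. This is not Drill code: the names `spill_run`, `read_run` and `hybrid_merge` are hypothetical, each "batch" is modeled as an already-sorted list of integers, the memory budget is counted in values rather than bytes, and the spill policy (keep runs in arrival order while they fit, spill the rest) is only one simple assumption about how to pick the batches to spill.

```python
import heapq
import os
import tempfile


def spill_run(sorted_run):
    """Write one sorted run to a temp file, one value per line; return its path."""
    fd, path = tempfile.mkstemp(suffix=".run")
    with os.fdopen(fd, "w") as f:
        for value in sorted_run:
            f.write(f"{value}\n")
    return path


def read_run(path):
    """Stream a spilled run back from disk, one value at a time."""
    with open(path) as f:
        for line in f:
            yield int(line)


def hybrid_merge(runs, budget):
    """Merge sorted runs, spilling only the runs that do not fit in `budget`.

    `budget` is a count of values (an assumption made to keep the sketch small).
    Returns the merged list and the number of runs that were spilled. A real
    operator would stream the merged output instead of building a list.
    """
    in_memory, spilled_paths, used = [], [], 0
    for run in runs:
        if used + len(run) <= budget:
            in_memory.append(run)   # fits: stays in memory
            used += len(run)
        else:
            spilled_paths.append(spill_run(run))  # does not fit: goes to disk
    try:
        # One k-way merge over both kinds of source.
        sources = [iter(r) for r in in_memory]
        sources += [read_run(p) for p in spilled_paths]
        merged = list(heapq.merge(*sources))
    finally:
        for path in spilled_paths:
            os.remove(path)
    return merged, len(spilled_paths)
```

With three runs of sizes 3, 3 and 4 and a budget of 6 values, only the third run is written to disk; the first two are merged directly from memory, so they skip the write/read cycle entirely.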



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
