[ https://issues.apache.org/jira/browse/DRILL-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945859#comment-15945859 ]
Paul Rogers commented on DRILL-5027: ------------------------------------ Primarily a development issue; hard to test at the QA level. > ExternalSortBatch is inefficient: rewrites data unnecessarily > ------------------------------------------------------------- > > Key: DRILL-5027 > URL: https://issues.apache.org/jira/browse/DRILL-5027 > Project: Apache Drill > Issue Type: Sub-task > Affects Versions: 1.8.0 > Reporter: Paul Rogers > Assignee: Paul Rogers > Priority: Minor > Fix For: 1.11.0 > > > The {{ExternalSortBatch}} (ESB) operator sorts data while spilling to disk as > needed to operate within a memory budget. > The sort happens in two phases: > 1. Gather the incoming batches from the upstream operator, sort them, and > spill to disk as needed. > 2. Merge the "runs" spilled in step 1. > In most cases, the second step should run within the memory available for the > first step (which is why severity is only Minor). Large queries need multiple > sort "phases" in which previously spilled runs are read back into memory, > merged, and again spilled. It is here that ESB has an issue. This process > correctly limit the amount of memory used, but at the cost or rewriting the > same data over and over. > Consider current Drill behavior: > {code} > a b c d (re-spill) > abcd e f g h (re-spill) > abcefgh i j k > {code} > That is batches, a, b, c and d are re-spilled to create the combined abcd, > and so on. The same data is rewritten over and over. > Note that spilled batches take no (direct) memory in Drill, and require only > a small on-heap memento. So, maintaining data on disk s "free". So, better > would be to re-spill only newer data: > {code} > a b c d (re-spill) > abcd | e f g h (re-spill) > abcd efgh | i j k > {code} > Where the bar indicates a moving point at which we've already merged and do > not need to do so again. If each letter is one unit of disk I/O, the original > method uses 35 units while the revised method uses 27 units. > At some point the process may have to repeat by merging the second-generation > spill files and so on. -- This message was sent by Atlassian JIRA (v6.3.15#6346)