[ 
https://issues.apache.org/jira/browse/DRILL-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-5022:
-------------------------------
    Fix Version/s:     (was: 1.10.0)
                   1.11.0

> ExternalSortBatch sets two different limits for "copier" memory
> ---------------------------------------------------------------
>
>                 Key: DRILL-5022
>                 URL: https://issues.apache.org/jira/browse/DRILL-5022
>             Project: Apache Drill
>          Issue Type: Sub-task
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>             Fix For: 1.11.0
>
>
> The {{ExternalSortBatch}} (ESB) operator sorts rows and supports spilling to 
> disk to operate within a set memory budget.
> A key step in disk-based sorting is to merge "runs" of previously-sorted 
> records. ESB does this with a class created from the 
> {{PriorityQueueCopierTemplate}}, called the "copier" in the code.
> The sort runs are represented by record batches, each with an indirection 
> vector (AKA {{SelectionVector}}) that point to the records in sort order.
> The copier restructures the incoming runs: copying from the original batches 
> (from positions given by the indirection vector) into new output vectors in 
> sorted order. To do this work, the copier must allocate new vectors to hold 
> the merged data. These vectors consume memory, and must fit into the overall 
> memory budget assigned to the ESB.
> As it turns out, the ESB code has two conflicting ways of setting the limit. 
> One is hard-coded:
> {code}
>   private static final int COPIER_BATCH_MEM_LIMIT = 256 * 1024;
> {code}
> The other comes from config parameters:
> {code}
>   public static final long INITIAL_ALLOCATION = 10_000_000;
>   public static final long MAX_ALLOCATION = 20_000_000;
>     copierAllocator = oAllocator.newChildAllocator(oAllocator.getName() + 
> ":copier",
>         PriorityQueueCopier.INITIAL_ALLOCATION, 
> PriorityQueueCopier.MAX_ALLOCATION);
> {code}
> Strangely, the config parameters are used to set aside memory for the copier 
> to use. But, the {{COPIER_BATCH_MEM_LIMIT}} is used to determine how large of 
> a merged batch to actually create.
> The result is that we set aside 10 MB of memory, but use only 256K of it, 
> wasting 9 MB.
> This ticket asks to:
> * Determine the proper merged batch size.
> * Use that limit to set the memory allocation for the copier.
> Elsewhere in Drill batch sizes tend to be on the order of 32K records. In the 
> ESB, the low {{COPIER_BATCH_MEM_LIMIT}} tends to favor smaller batches: A 
> test case has a row width of 114 bytes, and produces batches of just 2299 
> records. So, likely the proper choice is the larger 10 MB memory allocator 
> limit.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to