[ https://issues.apache.org/jira/browse/DRILL-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16052193#comment-16052193 ]

Jinfeng Ni commented on DRILL-5211:
-----------------------------------

My 2cents:

I can understand the cause of the direct memory fragmentation.

The proposal in ApacheDrillVectorSizeLimits.pdf seems to be a reversal of the 
changes in DRILL-1960.  You are right that the value vector's setSafe() will do 
a realloc() when it runs out of space in the DrillBuf. The realloc() doubles 
the DrillBuf, which may eventually lead to memory fragmentation / OOM.
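
To illustrate the pattern (a hypothetical sketch using a plain ByteBuffer, not 
Drill's actual ValueVector / DrillBuf code; the names and 4 KB starting size 
are made up): a long run of wide rows triggers several doublings, and the 
freed old buffers of assorted sizes are what end up fragmenting direct memory.

{code:java}
import java.nio.ByteBuffer;

// Hypothetical stand-in for a value vector's backing buffer.
class DoublingVectorBuffer {
  private ByteBuffer buf = ByteBuffer.allocateDirect(4096);

  /** Write 'value' at byte offset 'offset', growing the buffer as needed. */
  void setSafe(int offset, byte[] value) {
    int needed = offset + value.length;
    while (buf.capacity() < needed) {
      // Doubling realloc: allocate a buffer twice as large, copy the old
      // contents, and hand the old buffer back to the allocator.
      ByteBuffer bigger = ByteBuffer.allocateDirect(buf.capacity() * 2);
      ByteBuffer src = buf.duplicate();
      src.clear();                    // position = 0, limit = capacity
      bigger.put(src);
      buf = bigger;
    }
    ByteBuffer dst = buf.duplicate();
    dst.position(offset);
    dst.put(value);
  }
}
{code}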

Before DRILL-1960, setSafe() actually returned a boolean to indicate whether 
the method completed successfully (instead of throwing an exception as in your 
proposed "setScalar" method). When it returned false, it was each operator's 
responsibility to  1) rewind / replay the overflow row,  2) break the ongoing 
batch into two, and 3) pass the full batch downstream and continue working with 
the overflow row plus the rest of the incoming rows.  I guess that put 
significant complexity into the operators, and that's why DRILL-1960 was 
proposed: to move the complexity from the operators into the value vectors.  
[~sphillips] would know the background better than me.
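
A rough sketch of that old contract (Row, RowSource, and BatchWriter are 
illustrative interfaces here, not Drill classes) would be:

{code:java}
interface Row {}

interface RowSource {
  Row next();                        // null when the incoming rows are drained
}

interface BatchWriter {
  boolean setSafe(Row row);          // false => the row did not fit
  void sendBatchDownstream();
  void startNewBatch();
}

class OverflowAwareCopier {
  boolean copyRows(RowSource incoming, BatchWriter out) {
    Row row;
    while ((row = incoming.next()) != null) {
      if (!out.setSafe(row)) {
        out.sendBatchDownstream();   // pass the full batch downstream
        out.startNewBatch();         // break the ongoing batch in two
        if (!out.setSafe(row)) {     // rewind / replay the overflow row
          return false;              // a single row larger than a whole batch
        }
      }
    }
    return true;
  }
}
{code}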

Also, the proposal seems to focus on the scan operator.  Many operators might 
produce a value vector beyond the allowed size. Project is one example. The 
exchange operator, selection vector remover, aggregator, etc., which have to 
evaluate an expression or reshuffle/copy data, could run into the same 
situation.  My guess is that it would require a huge effort to modify all the 
impacted operators to enforce the "size-aware" vector writer policy. 
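
For contrast, a minimal sketch of that "size-aware" writer policy, assuming 
the proposed setScalar() signals overflow with an exception instead of 
reallocating (the exception type and the 16 MB cap below are my assumptions, 
not anything taken from the attachments):

{code:java}
class VectorOverflowException extends Exception {}

class SizeAwareWriter {
  // Assumed per-vector byte cap; the real limit would come from the proposal.
  static final int MAX_VECTOR_BYTES = 16 * 1024 * 1024;
  private int usedBytes;

  void setScalar(byte[] value) throws VectorOverflowException {
    if (usedBytes + value.length > MAX_VECTOR_BYTES) {
      // Refuse to grow past the cap; the operator must end the batch and
      // retry the row, which is exactly the per-operator work noted above.
      throw new VectorOverflowException();
    }
    usedBytes += value.length;
    // ... write 'value' into the underlying buffer ...
  }
}
{code}

Every operator that evaluates expressions or copies data would need that 
end-batch-and-retry handling, which is where the big effort would go.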


1. https://issues.apache.org/jira/browse/DRILL-1960

> Queries fail due to direct memory fragmentation
> -----------------------------------------------
>
>                 Key: DRILL-5211
>                 URL: https://issues.apache.org/jira/browse/DRILL-5211
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.9.0
>
>         Attachments: ApacheDrillMemoryFragmentationBackground.pdf, 
> ApacheDrillVectorSizeLimits.pdf, EnhancedScanOperator.pdf, 
> ScanSchemaManagement.pdf
>
>
> Consider a test of the external sort as follows:
> * Direct memory: 3GB
> * Input file: 18 GB, with one Varchar column of 8K width
> The sort runs, spilling to disk. Once all data arrives, the sort begins to 
> merge the results. But, to do that, it must first do an intermediate merge. 
> For example, in this sort, there are 190 spill files, but only 19 can be 
> merged at a time. (Each merge file contains 128 MB batches, and only 19 can 
> fit in memory, giving a total footprint of 2.5 GB, well below the 3 GB limit.)
> Yet, when loading batch xx, Drill fails with an OOM error. At that point, 
> total available direct memory is 3,817,865,216. (Obtained from {{maxMemory}} 
> in the {{Bits}} class in the JDK.)
> It appears that Drill wants to allocate 58,257,868 bytes, but the 
> {{totalCapacity}} (again in {{Bits}}) is already 3,800,769,206, causing an 
> OOM.
> The problem is that, at this point, the external sort should not ask the 
> system for more memory. The allocator for the external sort is at just 
> 1,192,350,366 before the allocation request. Plenty of spare memory should be 
> available, released when the in-memory batches were spilled to disk prior to 
> merging. Indeed, earlier in the run, the sort had reached a peak memory usage 
> of 2,710,716,416 bytes. This memory should be available for reuse during 
> merging, and is more than sufficient to satisfy the particular request in question.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
