GitHub user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/914
  
    This commit introduces a feature to limit the memory consumed by a batch.
    
    ### Batch Size Limits
    
    With this change, the code now has three overlapping limits:
    
    * The traditional row-count limit.
    * A maximum limit of 16 MB per vector.
    * The new memory-per-batch limit.
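As a rough illustration of how the three limits interact, the Java sketch below estimates the largest row count a batch could hold if every column used a fixed number of bytes per row. Whichever limit is reached first wins. The class and method names, the per-column widths, and the example limit values are hypothetical. The actual loader enforces the limits dynamically as vectors grow rather than computing a row count up front.

```java
/**
 * Hypothetical back-of-envelope estimate: the largest row count that satisfies
 * the row-count limit, the 16 MB per-vector limit, and a per-batch memory limit,
 * assuming each column uses a fixed number of bytes per row.
 */
final class BatchLimitEstimate {
    /** The 16 MB per-vector maximum. */
    static final long MAX_VECTOR_BYTES = 16L * 1024 * 1024;

    private BatchLimitEstimate() { }

    static long maxRows(long rowLimit, long batchByteLimit, int[] bytesPerRowByColumn) {
        long rowWidth = 0;
        int widest = 0;
        for (int width : bytesPerRowByColumn) {
            rowWidth += width;
            widest = Math.max(widest, width);
        }
        long rows = rowLimit;
        if (widest > 0) {
            // The widest column's vector is the first to reach the per-vector cap.
            rows = Math.min(rows, MAX_VECTOR_BYTES / widest);
        }
        if (rowWidth > 0) {
            // All vectors together must fit within the batch memory limit.
            rows = Math.min(rows, batchByteLimit / rowWidth);
        }
        return rows;
    }
}
```

For example, with an assumed 65,536-row limit and a 32 MB batch limit, columns of 4 and 8 bytes plus a 100-byte Varchar stop at the row limit (65,536 rows), while a 4-byte column next to a 1,000-byte Varchar stops at the 16 MB vector cap (16,777 rows).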
    
    ### Overall Flow for Limiting Batch Memory Usage
    
    The batch size limit builds on the work already done for overflow.
    
    * The column metadata allows the client to specify allocation hints such as expected Varchar width and array cardinality.
    * The result set loader allocates a batch using the hints and target row count.
    * The result set loader measures the memory allocated above. This is the initial batch size.
    * When a writer needs to extend a vector, it calls a listener to ask whether the extension is allowed, passing in the expected amount of growth.
    * The result set loader adds the delta to the accumulated total, compares this against the size limit, and returns whether the resize is allowed.
    * If the resize is not allowed, an overflow is triggered.
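The steps above can be sketched in Java as a listener that a writer consults before growing a vector. All names here (`GrowthListener`, `BatchSizeTracker`, `SketchWriter`, `canExpand`) are hypothetical stand-ins rather than Drill's actual classes, and the writer's power-of-two growth policy is an assumption.

```java
/** Hypothetical listener a writer consults before extending its vector. */
interface GrowthListener {
    /** Returns true if the batch may grow by {@code delta} more bytes. */
    boolean canExpand(long delta);
}

/** Hypothetical loader-side tracker: accumulated size versus the batch limit. */
final class BatchSizeTracker implements GrowthListener {
    private final long batchSizeLimit;
    private long accumulated;

    /** @param initialBatchSize memory measured right after the initial allocation */
    BatchSizeTracker(long batchSizeLimit, long initialBatchSize) {
        this.batchSizeLimit = batchSizeLimit;
        this.accumulated = initialBatchSize;
    }

    @Override
    public boolean canExpand(long delta) {
        if (accumulated + delta > batchSizeLimit) {
            return false; // Resize refused: the writer must trigger overflow.
        }
        accumulated += delta; // Resize allowed: count the growth against the batch.
        return true;
    }

    long accumulated() { return accumulated; }
}

/** Hypothetical writer that doubles its capacity when it runs out of room. */
final class SketchWriter {
    private final GrowthListener listener;
    private long capacity;
    private boolean overflowed;

    SketchWriter(GrowthListener listener, long initialCapacity) {
        this.listener = listener;
        this.capacity = initialCapacity;
    }

    /** Makes room for {@code needed} bytes, or flags overflow if the listener refuses. */
    void ensureCapacity(long needed) {
        if (needed <= capacity) {
            return;
        }
        long newCapacity = Math.max(1, capacity);
        while (newCapacity < needed) {
            newCapacity *= 2;
        }
        if (listener.canExpand(newCapacity - capacity)) {
            capacity = newCapacity;
        } else {
            overflowed = true; // A real writer would start overflow handling here.
        }
    }

    long capacity() { return capacity; }

    boolean overflowed() { return overflowed; }
}
```

In this sketch the tracker commits a delta only when the resize is allowed. Because a refused resize ends the batch anyway, also counting the refused delta would work equally well.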
    
    Because this reuses the overflow mechanism, the size limit is handled correctly even when it is reached in the middle of a row.
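To see why reusing overflow covers the mid-row case, consider a toy model in which a batch is a list of columns and a row may be only partly written when the limit is hit. The values already written for the in-progress row move to a fresh batch, so the full batch keeps only complete rows and writing resumes in the new one. `ToyBatch` and `rollOver` are hypothetical. Drill's real overflow code works on value vectors, not Java lists.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy batch: one list of values per column. */
final class ToyBatch {
    final List<List<String>> columns = new ArrayList<>();

    ToyBatch(int columnCount) {
        for (int i = 0; i < columnCount; i++) {
            columns.add(new ArrayList<>());
        }
    }
}

final class ToyOverflow {
    private ToyOverflow() { }

    /**
     * Moves the partially written row at {@code rowIndex} out of {@code full}
     * into a new batch. Columns not yet written for that row stay empty there.
     */
    static ToyBatch rollOver(ToyBatch full, int rowIndex) {
        ToyBatch next = new ToyBatch(full.columns.size());
        for (int c = 0; c < full.columns.size(); c++) {
            List<String> source = full.columns.get(c);
            if (source.size() > rowIndex) {
                // This column already holds a value for the in-progress row.
                next.columns.get(c).add(source.remove(rowIndex));
            }
        }
        return next;
    }
}
```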
    
    ### Implementation Details
    
    To make the above work:
    
    * A new batch size limit is added to the result set loader options.
    * Batch size tracking code is added. This required a new method on the value vectors to report the memory actually allocated.
    * The scalar accessors are refactored to add the batch size limit without duplicating code: shared code moved from the template into base classes to remove redundancy.
    * General clean-up of the vector limit code, found while doing the above work.
    * Unit tests for the new mechanism.
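The second and third implementation items can be pictured together: the size-checking logic lives once in a base class, each type-specific writer only states how many bytes a value needs, and every writer reports its allocated bytes so the loader can total them. This is a hypothetical Java sketch. `SketchBaseScalarWriter`, `reserve`, `allocatedBytes`, and `totalAllocated` are illustrative names, not the methods added in the commit.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.LongPredicate;

/** Hypothetical base class holding the size-limit logic shared by all scalar writers. */
abstract class SketchBaseScalarWriter {
    private final LongPredicate canGrow; // Asks whether the batch may grow by n bytes.
    private long allocated;
    private long used;

    SketchBaseScalarWriter(LongPredicate canGrow, long initialBytes) {
        this.canGrow = canGrow;
        this.allocated = initialBytes;
    }

    /** Bytes actually allocated for this writer's vector. */
    final long allocatedBytes() { return allocated; }

    /** Reserves room for one value; returns false if the batch limit refuses growth. */
    protected final boolean reserve(long bytes) {
        long needed = used + bytes;
        if (needed > allocated) {
            long target = Math.max(1, allocated);
            while (target < needed) {
                target *= 2;
            }
            if (!canGrow.test(target - allocated)) {
                return false; // Caller would trigger overflow.
            }
            allocated = target;
        }
        used = needed;
        return true;
    }

    /** Sums allocated memory across writers, e.g. to measure the initial batch size. */
    static long totalAllocated(List<? extends SketchBaseScalarWriter> writers) {
        long total = 0;
        for (SketchBaseScalarWriter writer : writers) {
            total += writer.allocatedBytes();
        }
        return total;
    }
}

/** Fixed-width writer: only the per-value size is type-specific. */
final class SketchIntWriter extends SketchBaseScalarWriter {
    SketchIntWriter(LongPredicate canGrow, long initialBytes) { super(canGrow, initialBytes); }

    boolean setInt(int value) {
        return reserve(Integer.BYTES); // Storing the value itself is omitted.
    }
}

/** Variable-width writer: the size comes from the UTF-8 encoded value. */
final class SketchVarCharWriter extends SketchBaseScalarWriter {
    SketchVarCharWriter(LongPredicate canGrow, long initialBytes) { super(canGrow, initialBytes); }

    boolean setString(String value) {
        return reserve(value.getBytes(StandardCharsets.UTF_8).length); // Storage omitted.
    }
}
```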

