[ https://issues.apache.org/jira/browse/DRILL-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16517107#comment-16517107 ]

ASF GitHub Bot commented on DRILL-6310:
---------------------------------------

ppadma commented on a change in pull request #1324: DRILL-6310: limit batch 
size for hash aggregate
URL: https://github.com/apache/drill/pull/1324#discussion_r196434655
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
 ##########
 @@ -1317,7 +1364,7 @@ private void checkGroupAndAggrValues(int incomingRowIdx) {
 
         useReservedValuesMemory(); // try to preempt an OOM by using the reserve
 
-        addBatchHolder(currentPartition);  // allocate a new (internal) values batch
+        addBatchHolder(currentPartition, getBatchSize());  // allocate a new (internal) values batch
 
 Review comment:
   Adjusting the batch holder size here means adjusting the number of rows in the batch based on the average row width. The idea is to limit the size in terms of memory, not in terms of number of rows. Batches are limited to 16 MB (or whatever output batch size is configured). By allocating huge batches and transmitting them only partially filled, we might still be able to limit the output batch size, but that does not produce much benefit; we want to avoid huge memory allocations in the first place.
   Why should we not change the batch holder size?
   If we size it just based on the first batch, it creates the exact problem you mentioned, i.e. the holders will be sized for older input data and may not make much sense for new data.
   What I have is not a perfect solution. In fact, I don't even know whether such a solution is possible or exists, but it will work well enough by the law of averages.
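
   To make the sizing idea above concrete, here is a minimal, self-contained sketch of how a values-batch row count could be derived from a memory budget and a running average row width. This is not the actual HashAggTemplate code; the class name, constants, and methods (BatchSizer, OUTPUT_BATCH_SIZE_BYTES, MIN_ROWS, MAX_ROWS, observeIncomingBatch, nextBatchRowCount) are illustrative assumptions only.

// Illustrative sketch only (not Drill's HashAggTemplate): derive the row count
// for the next values batch from a memory budget and the running average row
// width, so allocation is bounded by memory rather than by a fixed row count.
public class BatchSizer {

  // Hypothetical stand-in for the configured output batch size (16 MB here).
  static final long OUTPUT_BATCH_SIZE_BYTES = 16L * 1024 * 1024;

  // Hypothetical row-count bounds so an extreme average width cannot produce
  // a degenerate (empty or enormous) batch.
  static final int MIN_ROWS = 1;
  static final int MAX_ROWS = 65536;

  private long totalBytesSeen;
  private long totalRowsSeen;

  // Fold an incoming batch into the running average row width.
  public void observeIncomingBatch(long batchBytes, int batchRowCount) {
    totalBytesSeen += batchBytes;
    totalRowsSeen += batchRowCount;
  }

  // Row count for the next values batch: memory budget divided by the average
  // row width observed so far (the "law of averages"), clamped to sane bounds.
  public int nextBatchRowCount() {
    if (totalRowsSeen == 0) {
      return MAX_ROWS; // nothing observed yet; fall back to the default cap
    }
    long avgRowWidth = Math.max(1L, totalBytesSeen / totalRowsSeen);
    long rows = OUTPUT_BATCH_SIZE_BYTES / avgRowWidth;
    return (int) Math.min(MAX_ROWS, Math.max(MIN_ROWS, rows));
  }

  // Tiny usage example: later, wider batches keep adjusting the estimate, so
  // holders are not sized from the first batch alone.
  public static void main(String[] args) {
    BatchSizer sizer = new BatchSizer();
    sizer.observeIncomingBatch(4L * 1024 * 1024, 32768);   // ~128 bytes/row
    System.out.println("rows for next holder: " + sizer.nextBatchRowCount());
    sizer.observeIncomingBatch(16L * 1024 * 1024, 16384);  // wider rows arrive
    System.out.println("rows for next holder: " + sizer.nextBatchRowCount());
  }
}

   The diff above follows the same pattern: addBatchHolder is given a row count computed at allocation time (getBatchSize()) from current averages, rather than a fixed constant.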

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> limit batch size for hash aggregate
> -----------------------------------
>
>                 Key: DRILL-6310
>                 URL: https://issues.apache.org/jira/browse/DRILL-6310
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Flow
>    Affects Versions: 1.13.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Major
>             Fix For: 1.14.0
>
>
> limit batch size for hash aggregate based on memory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
