[ https://issues.apache.org/jira/browse/DRILL-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16517107#comment-16517107 ]
ASF GitHub Bot commented on DRILL-6310:
---------------------------------------

ppadma commented on a change in pull request #1324: DRILL-6310: limit batch size for hash aggregate
URL: https://github.com/apache/drill/pull/1324#discussion_r196434655

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
##########
@@ -1317,7 +1364,7 @@ private void checkGroupAndAggrValues(int incomingRowIdx) {
           useReservedValuesMemory(); // try to preempt an OOM by using the reserve
-          addBatchHolder(currentPartition); // allocate a new (internal) values batch
+          addBatchHolder(currentPartition, getBatchSize()); // allocate a new (internal) values batch

 Review comment:
   Adjusting the batch holder size here means adjusting the number of rows in the batch based on the average row width. The idea is to limit the size in terms of memory, not in terms of number of rows. Batches are limited to 16 MB (or whatever output batch size is configured). By allocating huge batches and transmitting them only partially, we might be able to limit the output batch size, but that does not bring much benefit. We want to avoid huge memory allocations in the first place.

   Why should we not change the batch holder size? If we size the holders based only on the first batch, we create exactly the problem you mentioned: they will be sized for older input data, which may not make much sense for new data.

   What I have is not an exact, perfect solution. In fact, I don't know whether such a solution is possible or exists. It will work fine by the law of averages.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
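The review comment argues for sizing each new batch holder by memory (rows = memory limit / average row width) rather than by a fixed row count, and for recomputing that size as data arrives instead of freezing it after the first batch. The Java below is a minimal sketch of that idea under stated assumptions; it is not Drill's actual HashAggTemplate code. The class and method names (BatchHolderSizing, RowWidthTracker, rowsPerBatchHolder), the running average over all batches seen so far, and the 64K row cap are all hypothetical illustrations.

```java
// Hypothetical sketch of memory-based batch holder sizing; not Drill's API.
public class BatchHolderSizing {

    /** Assumed upper bound on rows per batch holder (hypothetical 64K cap). */
    static final int MAX_ROWS_PER_BATCH = 65_536;

    /** Tracks a running average row width over every incoming batch seen so far. */
    static final class RowWidthTracker {
        private long totalBytes;
        private long totalRows;

        /** Folds one incoming batch (its memory footprint and row count) into the average. */
        void update(long batchBytes, int batchRows) {
            if (batchRows > 0) {
                totalBytes += batchBytes;
                totalRows += batchRows;
            }
        }

        /** Average bytes per row, or 0 if no rows have been seen yet. */
        long averageRowWidth() {
            return totalRows == 0 ? 0 : Math.max(1, totalBytes / totalRows);
        }
    }

    /**
     * Rows to allocate in a new batch holder so that it stays within
     * memoryLimitBytes, given the current average row width.
     */
    static int rowsPerBatchHolder(long memoryLimitBytes, long avgRowWidth) {
        if (avgRowWidth <= 0) {
            return MAX_ROWS_PER_BATCH; // no data yet: fall back to the row cap
        }
        long rows = memoryLimitBytes / avgRowWidth;
        return (int) Math.max(1, Math.min(MAX_ROWS_PER_BATCH, rows));
    }

    public static void main(String[] args) {
        final long limit = 16L * 1024 * 1024; // 16 MB output batch limit
        RowWidthTracker tracker = new RowWidthTracker();

        // First incoming batch: 1,000 rows of 500 bytes each.
        tracker.update(500L * 1000, 1000);
        int firstSize = rowsPerBatchHolder(limit, tracker.averageRowWidth());

        // Later batch: 1,000 much wider rows of 2,000 bytes each.
        tracker.update(2_000L * 1000, 1000);
        int laterSize = rowsPerBatchHolder(limit, tracker.averageRowWidth());

        System.out.println("first holder rows = " + firstSize); // 33554
        System.out.println("later holder rows = " + laterSize); // 13421
    }
}
```

With a 16 MB limit, the first holder gets 33,554 rows (500-byte average). After the wider batch pulls the average up to 1,250 bytes, the next holder gets only 13,421 rows. A holder sized only from the first batch would have kept 33,554 rows and overshot the limit by roughly 2.5x. A running average like this one reacts gradually rather than exactly, which matches the "law of averages" trade-off described in the comment.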
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> limit batch size for hash aggregate
> -----------------------------------
>
>                 Key: DRILL-6310
>                 URL: https://issues.apache.org/jira/browse/DRILL-6310
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Flow
>    Affects Versions: 1.13.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Major
>             Fix For: 1.14.0
>
>
> limit batch size for hash aggregate based on memory.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)