[ https://issues.apache.org/jira/browse/DRILL-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16517220#comment-16517220 ]
ASF GitHub Bot commented on DRILL-6310:
---------------------------------------

ilooner commented on a change in pull request #1324: DRILL-6310: limit batch size for hash aggregate
URL: https://github.com/apache/drill/pull/1324#discussion_r196468069

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
 ##########
 @@ -1317,7 +1364,7 @@ private void checkGroupAndAggrValues(int incomingRowIdx) {
        useReservedValuesMemory(); // try to preempt an OOM by using the reserve

 -      addBatchHolder(currentPartition);  // allocate a new (internal) values batch
 +      addBatchHolder(currentPartition, getBatchSize());  // allocate a new (internal) values batch

 Review comment:
   Padma, I agree that we want to limit the size of output batches, and that reducing the batch holder size is a great change. BatchHolders that always hold 64k rows are not practical.

   My issue is with changing the batch holder size dynamically. I think it adds complexity without a concrete benefit. Since new data will be added to old BatchHolders, data will never really go into a BatchHolder that was appropriately sized for it.

   Since we can't get an accurate solution even with the complex approach of dynamically changing BatchHolder sizes, I think we should go with the simpler approach. We can still use all your changes; I just think we shouldn't continue updating the BatchHolder size.

   The added complexity is the extra overhead of computing indexes in the hash table, and there will be more complexity in the refactored memory calculations I am adding.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> limit batch size for hash aggregate
> -----------------------------------
>
>                 Key: DRILL-6310
>                 URL: https://issues.apache.org/jira/browse/DRILL-6310
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Flow
>    Affects Versions: 1.13.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Major
>             Fix For: 1.14.0
>
>
> limit batch size for hash aggregate based on memory.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
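
Illustrative sketch (not Drill code): the review comment above says that dynamically sized BatchHolders add overhead when computing indexes in the hash table. The Java below is one way to picture that trade-off. Every name in it (BatchHolderIndexSketch, FixedSizeIndex, VariableSizeIndex, holderOf, offsetIn) is hypothetical, and it assumes a fixed holder size would be a power of two. With a fixed size, a global row index splits into (holder, offset) with a shift and a mask. With variable sizes, the lookup needs a table of start offsets and a search over it.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchHolderIndexSketch {

  /** Hypothetical: all holders share one power-of-two capacity. */
  static final class FixedSizeIndex {
    private final int shift;
    private final int mask;

    FixedSizeIndex(int holderSize) {
      if (holderSize <= 0 || Integer.bitCount(holderSize) != 1) {
        throw new IllegalArgumentException("holderSize must be a power of two");
      }
      this.shift = Integer.numberOfTrailingZeros(holderSize);
      this.mask = holderSize - 1;
    }

    // Constant-time: one shift, one mask.
    int holderOf(int globalIdx) { return globalIdx >>> shift; }
    int offsetIn(int globalIdx) { return globalIdx & mask; }
  }

  /** Hypothetical: each holder may have a different capacity. */
  static final class VariableSizeIndex {
    private final List<Integer> startOffsets = new ArrayList<>();
    private int totalRows = 0;

    void addHolder(int holderSize) {
      startOffsets.add(totalRows);   // first global row index of this holder
      totalRows += holderSize;
    }

    // Binary search for the last holder whose start offset is <= globalIdx.
    int holderOf(int globalIdx) {
      if (globalIdx < 0 || globalIdx >= totalRows) {
        throw new IndexOutOfBoundsException("row index " + globalIdx);
      }
      int lo = 0;
      int hi = startOffsets.size() - 1;
      while (lo < hi) {
        int mid = (lo + hi + 1) >>> 1;
        if (startOffsets.get(mid) <= globalIdx) {
          lo = mid;
        } else {
          hi = mid - 1;
        }
      }
      return lo;
    }

    int offsetIn(int globalIdx) {
      return globalIdx - startOffsets.get(holderOf(globalIdx));
    }
  }

  public static void main(String[] args) {
    FixedSizeIndex fixed = new FixedSizeIndex(4096);
    System.out.println("fixed:    holder " + fixed.holderOf(10000)
        + ", offset " + fixed.offsetIn(10000));

    VariableSizeIndex variable = new VariableSizeIndex();
    variable.addHolder(4096);
    variable.addHolder(2048);
    variable.addHolder(8192);
    System.out.println("variable: holder " + variable.holderOf(10000)
        + ", offset " + variable.offsetIn(10000));
  }
}
```

The fixed-size lookup needs no per-holder bookkeeping, while the variable-size lookup has to maintain and search the offsets table on every access. This is only one possible way to implement variable sizes; the pull request may handle it differently.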