[ https://issues.apache.org/jira/browse/DRILL-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344291#comment-16344291 ]
ASF GitHub Bot commented on DRILL-6032:
---------------------------------------

Github user Ben-Zvi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1101#discussion_r164549701

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java ---
@@ -255,7 +254,6 @@ private HashAggregator createAggregatorInternal() throws SchemaChangeException,
       groupByOutFieldIds[i] = container.add(vv);
     }

-    int extraNonNullColumns = 0; // each of SUM, MAX and MIN gets an extra bigint column
--- End diff --

Why was this removed? Unfortunately the code generator still generates an internal BigInt column to keep the "not null" status of the aggregated values in the cases of sum/max/min. Below is an example of the generated code (returning a nullable varchar); note the nonNullCount (see DRILL-5728):

```
public void outputRecordValues(int htRowIdx, int outRowIdx)
    throws SchemaChangeException
{
    {
        NullableVarCharHolder out17;
        {
            final NullableVarCharHolder out = new NullableVarCharHolder();
            vv1.getAccessor().get((htRowIdx), work0);
            ObjectHolder value = work0;
            work4.value = vv5.getAccessor().get((htRowIdx));
            UInt1Holder init = work4;
            work8.value = vv9.getAccessor().get((htRowIdx));
            BigIntHolder nonNullCount = work8;
            DrillBuf buf = work12;
            MaxVarBytesFunctions$NullableVarCharMax_output: {
                if (nonNullCount.value > 0) {
                    out.isSet = 1;
                    org.apache.drill.exec.expr.fn.impl.DrillByteArray tmp =
                        (org.apache.drill.exec.expr.fn.impl.DrillByteArray) value.obj;
                    buf = buf.reallocIfNeeded(tmp.getLength());
                    buf.setBytes(0, tmp.getBytes(), 0, tmp.getLength());
                    out.start = 0;
                    out.end = tmp.getLength();
                    out.buffer = buf;
                } else {
                    out.isSet = 0;
                }
            }
            work0 = value;
            vv1.getMutator().setSafe((htRowIdx), work0);
            work4 = init;
            vv5.getMutator().set((htRowIdx), work4.value);
            work8 = nonNullCount;
            vv9.getMutator().set((htRowIdx), work8.value);
            work12 = buf;
            out17 = out;
        }
        if (!(out17.isSet == 0)) {
            vv18.getMutator().setSafe((outRowIdx),
                out17.isSet, out17.start, out17.end, out17.buffer);
        }
    }
}
```

> Use RecordBatchSizer to estimate size of columns in HashAgg
> -----------------------------------------------------------
>
>                 Key: DRILL-6032
>                 URL: https://issues.apache.org/jira/browse/DRILL-6032
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>            Priority: Major
>             Fix For: 1.13.0
>
>
> We need to use the RecordBatchSize to estimate the size of columns in the
> Partition batches created by HashAgg.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
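To illustrate the sizing concern behind the removed `extraNonNullColumns` line, here is a minimal, hypothetical sketch (class and method names are illustrative, not Drill's actual code) of the idea the comment describes: each SUM, MAX or MIN aggregate carries a hidden internal BigInt `nonNullCount` column, so a memory estimate for the aggregation batches should budget 8 extra bytes per row per such aggregate.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: account for the internal BigInt "nonNullCount"
 * column that the code generator emits for each SUM/MAX/MIN aggregate
 * (see DRILL-5728), when estimating batch memory in HashAgg.
 */
public class ExtraNonNullEstimate {
    static final long BIGINT_WIDTH = 8L; // bytes per BigInt value

    /** Each of SUM, MAX and MIN gets one extra internal BigInt column. */
    static int countExtraNonNullColumns(List<String> aggCalls) {
        int extra = 0;
        for (String call : aggCalls) {
            String fn = call.toLowerCase();
            if (fn.equals("sum") || fn.equals("max") || fn.equals("min")) {
                extra++;
            }
        }
        return extra;
    }

    /** Extra bytes per batch contributed by the hidden nonNullCount columns. */
    static long extraBytesPerBatch(List<String> aggCalls, int batchRowCount) {
        return countExtraNonNullColumns(aggCalls) * BIGINT_WIDTH * batchRowCount;
    }

    public static void main(String[] args) {
        // COUNT tracks nulls itself, so only SUM and MAX add a hidden column here.
        List<String> calls = Arrays.asList("sum", "count", "max");
        System.out.println(countExtraNonNullColumns(calls)); // 2
        System.out.println(extraBytesPerBatch(calls, 1024)); // 2 * 8 * 1024 = 16384
    }
}
```

The point of counting these columns is that a RecordBatchSizer-based estimate that looks only at the visible value vectors would undercount memory for nullable sum/max/min aggregates.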