[ https://issues.apache.org/jira/browse/DRILL-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344291#comment-16344291 ]

ASF GitHub Bot commented on DRILL-6032:
---------------------------------------

Github user Ben-Zvi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1101#discussion_r164549701
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java ---
    @@ -255,7 +254,6 @@ private HashAggregator createAggregatorInternal() throws SchemaChangeException,
           groupByOutFieldIds[i] = container.add(vv);
         }
     
    -    int extraNonNullColumns = 0; // each of SUM, MAX and MIN gets an extra bigint column
    --- End diff ---
    
    Why was this removed? Unfortunately, the code generator still generates an internal BigInt column to keep the "not null" status of the aggregated values in the case of SUM/MAX/MIN. Below is an example of generated code (returning a nullable VARCHAR); note the nonNullCount (see DRILL-5728):
    
    ```
            public void outputRecordValues(int htRowIdx, int outRowIdx)
                throws SchemaChangeException
            {
                {
                    NullableVarCharHolder out17;
                    {
                        final NullableVarCharHolder out = new NullableVarCharHolder();
                        vv1 .getAccessor().get((htRowIdx), work0);
                        ObjectHolder value = work0;
                        work4 .value = vv5 .getAccessor().get((htRowIdx));
                        UInt1Holder init = work4;
                        work8 .value = vv9 .getAccessor().get((htRowIdx));
                        BigIntHolder nonNullCount = work8;
                        DrillBuf buf = work12;
                         
                        MaxVarBytesFunctions$NullableVarCharMax_output: {
                            if (nonNullCount.value > 0) {
                                out.isSet = 1;
                                org.apache.drill.exec.expr.fn.impl.DrillByteArray tmp = (org.apache.drill.exec.expr.fn.impl.DrillByteArray) value.obj;
                                buf = buf.reallocIfNeeded(tmp.getLength());
                                buf.setBytes(0, tmp.getBytes(), 0, tmp.getLength());
                                out.start = 0;
                                out.end = tmp.getLength();
                                out.buffer = buf;
                            } else {
                                out.isSet = 0;
                            }
                        }
     
                        work0 = value;
                        vv1 .getMutator().setSafe((htRowIdx), work0);
                        work4 = init;
                        vv5 .getMutator().set((htRowIdx), work4 .value);
                        work8 = nonNullCount;
                        vv9 .getMutator().set((htRowIdx), work8 .value);
                        work12 = buf;
                        out17 = out;
                    }
                    if (!(out17 .isSet == 0)) {
                        vv18 .getMutator().setSafe((outRowIdx), out17 .isSet, out17 .start, out17 .end, out17 .buffer);
                    }
                }
            }
    ```
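
    To illustrate why that counter mattered for sizing, here is a minimal sketch (illustrative names only, not the actual HashAggBatch code): each SUM/MAX/MIN aggregate carries one of these hidden BigInt nonNullCount workspace columns, i.e. 8 extra bytes per row that any per-row size estimate has to account for.

    ```java
    // Illustrative sketch only -- not the actual HashAggBatch logic.
    // Each SUM/MAX/MIN aggregate adds a hidden BigInt "nonNullCount" workspace
    // column, so a per-row estimate must add 8 bytes per such aggregate on top
    // of the user-visible value columns.
    public class RowWidthSketch {

        private static final int BIGINT_WIDTH = 8; // bytes per BigInt value

        static long estimateRowWidth(long visibleValueBytes, int sumMaxMinAggregates) {
            return visibleValueBytes + (long) sumMaxMinAggregates * BIGINT_WIDTH;
        }

        public static void main(String[] args) {
            // e.g. two BigInt value columns (16 bytes) plus their two hidden counters
            System.out.println(estimateRowWidth(16, 2)); // prints 32
        }
    }
    ```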
     


> Use RecordBatchSizer to estimate size of columns in HashAgg
> -----------------------------------------------------------
>
>                 Key: DRILL-6032
>                 URL: https://issues.apache.org/jira/browse/DRILL-6032
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>            Priority: Major
>             Fix For: 1.13.0
>
>
> We need to use the RecordBatchSizer to estimate the size of columns in the
> partition batches created by HashAgg.
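
As a rough illustration of the intended approach (hypothetical names only; the actual RecordBatchSizer API may differ), the estimate would come from measuring the incoming batch rather than assuming fixed column widths:

```java
// Conceptual sketch only -- hypothetical names, not the real RecordBatchSizer API.
// The idea: derive per-column row widths from a measured incoming batch and use
// them to size the partition batches that HashAgg builds, instead of hard-coded guesses.
import java.util.HashMap;
import java.util.Map;

public class ColumnSizeSketch {

    /** Average bytes per row for one column, rounded up. */
    static int bytesPerRow(long totalColumnBytes, int rowCount) {
        return rowCount == 0 ? 0 : (int) ((totalColumnBytes + rowCount - 1) / rowCount);
    }

    public static void main(String[] args) {
        // e.g. a VARCHAR column that used 12,000 bytes across 1,000 rows
        Map<String, Integer> measured = new HashMap<>();
        measured.put("l_comment", bytesPerRow(12_000, 1_000));
        System.out.println(measured); // {l_comment=12}
    }
}
```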



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
