[jira] [Updated] (DRILL-5728) Hash Aggregate: Useless bigint value vector in the values batch

Boaz Ben-Zvi (JIRA) Thu, 17 Aug 2017 16:13:38 -0700

     [ 
https://issues.apache.org/jira/browse/DRILL-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Boaz Ben-Zvi updated DRILL-5728:
--------------------------------
    Description: 
 When aggregating a non-nullable column (like *sum(l_partkey)* below), the code 
generation creates an extra bigint value vector, in addition to the actual 
"sum" vector, which is used as a "nonNullCount".
   This vector is useless (the underlying column is non-nullable, so the flag 
carries no information), and it wastes considerable memory: 8 bytes * 64K 
entries = 512KB for each aggregated value in a batch.
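
The memory figure above can be checked with a few lines of Java. This is only a sketch of the arithmetic: the class, constant, and method names are made up for illustration, and the 8-byte entry width and 64K entries per batch are taken from the description rather than read from Drill's code.

```java
public class NonNullCountOverhead {
    // Assumed figures from the description: 8-byte bigint entries
    // and 64K entries per values batch.
    static final int BYTES_PER_BIGINT = 8;
    static final int ENTRIES_PER_BATCH = 64 * 1024;

    // Bytes spent on unneeded "nonNullCount" vectors in one batch,
    // given the number of aggregated values (one extra vector each).
    static long wastedBytes(int aggregatedValues) {
        return (long) aggregatedValues * BYTES_PER_BIGINT * ENTRIES_PER_BATCH;
    }

    public static void main(String[] args) {
        System.out.println(wastedBytes(1) / 1024 + " KB for one aggregated value");
        System.out.println(wastedBytes(4) / 1024 + " KB for four aggregated values");
    }
}
```

Under these assumptions the overhead grows linearly with the number of aggregated values, so a query with several sums over non-nullable columns pays the 512KB once per sum in every batch.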

Example query:

{{select sum(l_partkey) as slpk from cp.`tpch/lineitem.parquet` group by 
l_orderkey;}}


As the generated code above the end of this description shows, the bigint 
value vector *vv5* is used only to hold a *1* flag meaning "not null":

bq. public void updateAggrValuesInternal(int incomingRowIdx, int htRowIdx)
bq.             throws SchemaChangeException
bq.         {
bq.             {
bq.                 IntHolder out11 = new IntHolder();
bq.                 {
bq.                     out11 .value = vv8 .getAccessor().get((incomingRowIdx));
bq.                 }
bq.                 IntHolder in = out11;
bq.                 work0 .value = vv1 .getAccessor().get((htRowIdx));
bq.                 BigIntHolder value = work0;
bq.                 work4 .value = vv5 .getAccessor().get((htRowIdx));
bq.                 BigIntHolder nonNullCount = work4;
bq.                  
bq. SumFunctions$IntSum_add: {
bq.     nonNullCount.value = 1;
bq.     value.value += in.value;
bq. }
bq.  
bq.                 work0 = value;
bq.                 vv1 .getMutator().set((htRowIdx), work0 .value);
bq.                 work4 = nonNullCount;
bq.                 vv5 .getMutator().set((htRowIdx), work4 .value);
bq.             }
bq.         }
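
To illustrate why the flag vector adds nothing for a non-nullable input, here is a minimal, self-contained sketch. It does not use Drill's classes: plain arrays stand in for the value vectors, and the class and method names are hypothetical. It is not the actual fix, only a demonstration that the sums are identical with and without the "nonNullCount" bookkeeping.

```java
import java.util.Arrays;

public class SumWithoutNonNullCount {
    // Mirrors the generated code: every update also writes a 1 into the flag array.
    static void updateWithFlag(int[] incoming, int[] htRowIdx, long[] sum, long[] nonNullCount) {
        for (int i = 0; i < incoming.length; i++) {
            nonNullCount[htRowIdx[i]] = 1;   // always 1 for a non-nullable input
            sum[htRowIdx[i]] += incoming[i];
        }
    }

    // Same aggregation with the flag array dropped entirely.
    static void updateWithoutFlag(int[] incoming, int[] htRowIdx, long[] sum) {
        for (int i = 0; i < incoming.length; i++) {
            sum[htRowIdx[i]] += incoming[i];
        }
    }

    public static void main(String[] args) {
        int[] incoming = {10, 20, 30, 40};   // incoming column values
        int[] htRowIdx = {0, 1, 0, 1};       // hash-table row (group) of each value
        long[] sumA = new long[2];
        long[] flags = new long[2];
        long[] sumB = new long[2];

        updateWithFlag(incoming, htRowIdx, sumA, flags);
        updateWithoutFlag(incoming, htRowIdx, sumB);

        System.out.println(Arrays.toString(sumA));
        System.out.println(Arrays.toString(sumB));
        System.out.println("same sums: " + Arrays.equals(sumA, sumB));
    }
}
```

In this sketch the flag array costs as much memory as the sum array yet never influences the result, which is the waste the issue describes.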


 


> Hash Aggregate: Useless bigint value vector in the values batch
> ---------------------------------------------------------------
>
>                 Key: DRILL-5728
>                 URL: https://issues.apache.org/jira/browse/DRILL-5728
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Codegen
>    Affects Versions: 1.11.0
>            Reporter: Boaz Ben-Zvi
>            Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
