[ 
https://issues.apache.org/jira/browse/DRILL-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-5446:
-------------------------------
    Fix Version/s:     (was: 1.11.0)

> Offset Vector in VariableLengthVectors may waste up to 256KB per value vector
> -----------------------------------------------------------------------------
>
>                 Key: DRILL-5446
>                 URL: https://issues.apache.org/jira/browse/DRILL-5446
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.10.0
>            Reporter: Boaz Ben-Zvi
>            Assignee: Boaz Ben-Zvi
>
> In exec/vector/src/main/codegen/templates/VariableLengthVectors.java, the 
> implementation uses an "offset vector" to record the BEGINNING of each 
> variable-length element. To find an element's length (i.e., where it ENDS), 
> the code must look at the offset of the FOLLOWING element. 
>   This requires the "offset vector" to have ONE MORE entry than the total 
> number of elements, so that the END of the LAST element can be found.
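
The layout described above can be sketched as follows. This is a minimal
illustration only: the class StartOffsetLayout and its members are
hypothetical names, not Drill's generated vector classes, and plain Java
arrays stand in for Drill's buffers.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of a start-offset layout; not Drill's actual code.
final class StartOffsetLayout {
    final byte[] data;    // all values stored back to back
    final int[] offsets;  // offsets[i] = start of value i; offsets[n] = end of the last value

    StartOffsetLayout(String... values) {
        // n values need n + 1 offsets: the "ONE MORE" entry.
        offsets = new int[values.length + 1];
        byte[][] encoded = new byte[values.length][];
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
            offsets[i] = total;
            total += encoded[i].length;
        }
        offsets[values.length] = total;
        data = new byte[total];
        for (int i = 0; i < values.length; i++) {
            System.arraycopy(encoded[i], 0, data, offsets[i], encoded[i].length);
        }
    }

    // The length of value i comes from the FOLLOWING offset entry.
    int length(int i) {
        return offsets[i + 1] - offsets[i];
    }

    String get(int i) {
        return new String(data, offsets[i], length(i), StandardCharsets.UTF_8);
    }
}
```

For three values "ab", "" and "cde", the offsets array would hold four
entries, and the last entry is the only way to recover the length of "cde".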
>   Some places in the code (e.g., the hash table) use the maximum number of 
> elements, 64K (= 65536). Each entry in the "offset vector" is a 4-byte 
> UInt4, so the vector would appear to need 256KB. 
>   However, because of that "ONE MORE" entry, the code in this case allocates 
> for 65537 entries and, after rounding up to the next power of 2, ends up 
> allocating 512KB, nearly half of which is never used. 
>  (This happens for each varchar value vector in each batch; e.g., in the QA 
> test Functional/aggregates/tpcds_variants/text/aggregate25.q, where there 
> are 10 key columns, each hash-table batch wastes 2.5MB.)
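
The arithmetic behind these figures can be checked with a short sketch. It
assumes, as the description states, that the requested byte size is rounded
up to the next power of 2; the class OffsetVectorSizing and its methods are
hypothetical and do not correspond to Drill's allocator API.

```java
// Hypothetical sizing model; it only mirrors the numbers quoted in the report.
final class OffsetVectorSizing {
    static final int OFFSET_WIDTH = 4; // bytes per UInt4 offset entry

    // Smallest power of two that is >= n (for n >= 1).
    static long nextPowerOfTwo(long n) {
        if (n <= 1) {
            return 1;
        }
        return Long.highestOneBit(n - 1) << 1;
    }

    // Bytes allocated for the offset vector, with or without the extra entry.
    static long allocatedBytes(int valueCount, boolean extraEntry) {
        long entries = valueCount + (extraEntry ? 1L : 0L);
        return nextPowerOfTwo(entries * OFFSET_WIDTH);
    }

    // Extra bytes allocated only because of the "ONE MORE" entry.
    static long wastedBytes(int valueCount) {
        return allocatedBytes(valueCount, true) - allocatedBytes(valueCount, false);
    }
}
```

With valueCount = 65536, the model gives 262144 bytes (256KB) without the
extra entry and 524288 bytes (512KB) with it, a difference of 256KB per
vector. Ten key columns then account for 10 x 256KB = 2.5MB per batch.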
> Possible fix: change the logic in VariableLengthVectors.java to store the 
> END point of each variable-length element instead. The first element's 
> beginning is always ZERO, so it need not be stored.
>  
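
A rough sketch of the proposed end-offset scheme, under the assumption that
only lengths matter for the illustration. EndOffsetLayout is a hypothetical
class, not a patch against VariableLengthVectors.java.

```java
// Hypothetical illustration of storing END offsets; not Drill's actual code.
final class EndOffsetLayout {
    final int[] ends; // ends[i] = one past the last byte of value i

    EndOffsetLayout(int... lengths) {
        // n values need exactly n entries; no extra entry is required.
        ends = new int[lengths.length];
        int total = 0;
        for (int i = 0; i < lengths.length; i++) {
            total += lengths[i];
            ends[i] = total;
        }
    }

    // The first value always starts at zero, so its start is implied.
    int start(int i) {
        return i == 0 ? 0 : ends[i - 1];
    }

    int length(int i) {
        return ends[i] - start(i);
    }
}
```

With this layout, 65536 values need exactly 65536 four-byte entries, which
fits in 256KB. The cost in this sketch is the special case for index 0 on
every read of a start offset.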



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
