[ https://issues.apache.org/jira/browse/DRILL-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Rogers updated DRILL-5446:
-------------------------------
Fix Version/s: (was: 1.11.0)
> Offset Vector in VariableLengthVectors may waste up to 256KB per value vector
> -----------------------------------------------------------------------------
>
> Key: DRILL-5446
> URL: https://issues.apache.org/jira/browse/DRILL-5446
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Relational Operators
> Affects Versions: 1.10.0
> Reporter: Boaz Ben-Zvi
> Assignee: Boaz Ben-Zvi
>
> In exec/vector/src/main/codegen/templates/VariableLengthVectors.java, the
> implementation uses an "offset vector" to record the BEGINNING of each
> variable-length element. To find the length (i.e., the END) of an element,
> the code must look at the offset of the FOLLOWING element.
> This requires the offset vector to have ONE MORE entry than the total
> number of elements, so that the END of the LAST element can be found.
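The begin-offset layout described above can be modeled with a simplified Java sketch. This is a hypothetical illustration and not Drill's generated vector code. The class `OffsetVectorSketch` and its fields are invented for the example, and plain Java arrays stand in for Drill's direct-memory buffers. For `count` elements it stores `count + 1` offsets, and it derives each element's length from the next offset.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, simplified model of a variable-length vector (NOT Drill's
 * actual VariableLengthVectors code). offsets[i] holds the BEGINNING of
 * element i, so count elements need count + 1 offsets.
 */
class OffsetVectorSketch {
    final byte[] data;
    final int[] offsets; // length == count + 1; offsets[0] is always 0

    OffsetVectorSketch(List<String> values) {
        offsets = new int[values.size() + 1];
        int total = 0;
        for (String v : values) total += v.getBytes(StandardCharsets.UTF_8).length;
        data = new byte[total];
        int pos = 0;
        for (int i = 0; i < values.size(); i++) {
            byte[] bytes = values.get(i).getBytes(StandardCharsets.UTF_8);
            offsets[i] = pos;                      // BEGINNING of element i
            System.arraycopy(bytes, 0, data, pos, bytes.length);
            pos += bytes.length;
        }
        offsets[values.size()] = pos;              // the extra entry: END of the last element
    }

    int length(int i) {
        // The END of element i is the BEGINNING of element i + 1.
        return offsets[i + 1] - offsets[i];
    }

    String get(int i) {
        return new String(data, offsets[i], length(i), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        OffsetVectorSketch v = new OffsetVectorSketch(List.of("ab", "", "cde"));
        System.out.println(Arrays.toString(v.offsets)); // [0, 2, 2, 5]
        System.out.println(v.get(2));                   // cde
    }
}
```

With three values, the sketch needs four offsets. Reading the last element touches `offsets[3]`, which is the extra entry.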
> Some places in the code (e.g., the hash table) use the maximum number of
> elements, 64K (65536). Each entry in the offset vector is a 4-byte UInt4,
> so the offset vector appears to need 256KB.
> However, because of that ONE MORE entry, the code in this case allocates
> 65537 entries, which (after rounding up to the next power of 2) becomes
> 512KB, half of which is never used!
> (And this happens for every VarChar value vector in every batch. For
> example, in the QA test Functional/aggregates/tpcds_variants/text/aggregate25.q,
> which has 10 key columns, each hash-table batch wastes 2.5MB!)
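The figures above can be checked with a short Java sketch. It assumes, as the description states, that each allocation request is rounded up to the next power of 2 and that each offset is a 4-byte UInt4. The class and method names are hypothetical.

```java
/**
 * Back-of-the-envelope check of the numbers in this issue. Assumes the
 * allocator rounds each request up to the next power of 2 and that each
 * offset is a 4-byte UInt4. Names are hypothetical, not Drill APIs.
 */
class OffsetAllocationMath {
    static final int OFFSET_WIDTH = 4; // bytes per UInt4 offset

    // Smallest power of 2 that is >= n (for n >= 1).
    static long nextPowerOfTwo(long n) {
        return n <= 1 ? 1 : Long.highestOneBit(n - 1) << 1;
    }

    // Bytes actually allocated for an offset vector with the given entry count.
    static long allocatedBytes(int entries) {
        return nextPowerOfTwo((long) entries * OFFSET_WIDTH);
    }

    public static void main(String[] args) {
        int maxRows = 65536;                          // 64K elements per batch
        long expected = allocatedBytes(maxRows);      // 65536 * 4 = 262144 bytes (256KB)
        long actual = allocatedBytes(maxRows + 1);    // 65537 * 4 = 262148, rounded up to 524288 (512KB)
        long extraPerVector = actual - expected;      // 262144 bytes (256KB), almost all of it unused
        int keyColumns = 10;                          // key columns in aggregate25.q, per the issue
        long extraPerBatch = extraPerVector * keyColumns; // 2621440 bytes (2.5MB)

        System.out.println("expected=" + expected + " actual=" + actual);
        System.out.println("extra per vector=" + extraPerVector + " per batch=" + extraPerBatch);
    }
}
```

The four bytes past the 256KB boundary are what push each request into the next power-of-2 size class.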
> Possible fix: change the logic in VariableLengthVectors.java to store the
> END offset of each variable-length element. The first element's BEGINNING
> is always ZERO, so it need not be stored.
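The proposed end-offset layout can be sketched in Java as well. This is a hypothetical illustration of the idea in the description and not a patch to Drill. `EndOffsetVectorSketch` is an invented name, and plain arrays stand in for Drill's buffers. With `count` elements it stores exactly `count` offsets, and element 0 implicitly begins at 0.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of the proposed fix (NOT Drill's actual code):
 * ends[i] holds the END of element i, so count elements need exactly
 * count offsets. The BEGINNING of element 0 is implicitly 0.
 */
class EndOffsetVectorSketch {
    final byte[] data;
    final int[] ends; // length == count

    EndOffsetVectorSketch(List<String> values) {
        ends = new int[values.size()];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < values.size(); i++) {
            byte[] bytes = values.get(i).getBytes(StandardCharsets.UTF_8);
            out.write(bytes, 0, bytes.length);
            ends[i] = out.size();                 // END of element i
        }
        data = out.toByteArray();
    }

    int start(int i) {
        // The first element always begins at 0; others begin where the previous one ends.
        return i == 0 ? 0 : ends[i - 1];
    }

    int length(int i) {
        return ends[i] - start(i);
    }

    String get(int i) {
        return new String(data, start(i), length(i), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        EndOffsetVectorSketch v = new EndOffsetVectorSketch(List.of("ab", "", "cde"));
        System.out.println(Arrays.toString(v.ends)); // [2, 2, 5]
        System.out.println(v.get(2));                // cde
    }
}
```

For 65536 elements this layout needs 65536 * 4 = 262144 bytes, which is already a power of 2, so no space is lost to rounding. The trade-off is an extra `i == 0` check on each read.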
>
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)