[ https://issues.apache.org/jira/browse/HIVE-26572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alessandro Solimando updated HIVE-26572:
----------------------------------------
Labels: pull-request-available (was: )
> Support constant expressions in vectorization
> ---------------------------------------------
>
> Key: HIVE-26572
> URL: https://issues.apache.org/jira/browse/HIVE-26572
> Project: Hive
> Issue Type: Improvement
> Components: Vectorization
> Affects Versions: 4.0.0-alpha-2
> Reporter: Alessandro Solimando
> Assignee: Alessandro Solimando
> Priority: Major
> Labels: pull-request-available
>
> At the moment, we cannot vectorize aggregate expressions that have constant
> parameters in addition to the aggregation column (this is forbidden
> [here|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L4531]).
> One compelling example of how this could help is
> [PR 1824|https://github.com/apache/hive/pull/1824], linked to HIVE-24510, where
> _compute_bit_vector_ had to be split into _compute_bit_vector_hll_ and
> _compute_bit_vector_fm_ when the HLL implementation was added, whereas a single
> _compute_bit_vector($col, ['HLL'|'FM'])_ could have been used instead.
> Another example is _VectorUDAFBloomFilterMerge_, which receives an extra constant
> parameter controlling the number of threads for merging tasks. At the moment,
> this parameter is "injected" while searching for an appropriate constructor
> (see
> [VectorGroupByOperator.java#L1224-L1244|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java#L1224-L1244]).
> This ad-hoc approach does not scale, and it would make the code hard to read and
> maintain if more UDAFs required constant parameters.
> In addition, we are probably missing vectorization opportunities: without such
> ad-hoc treatment, a UDAF is not vectorized even when an appropriate constructor
> is available or could easily be added (the data sketches UDAFs, although not yet
> vectorized, are good targets).
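As a rough illustration of generalizing the constructor-based approach described above, the Java sketch below looks up a public constructor whose parameters accept the constant arguments of an aggregate call, instead of special-casing each UDAF. All names here (`ConstantParamDemo`, `BitVectorAgg`, `BloomFilterMergeAgg`, `findConstructor`, `instantiate`) are hypothetical and are not part of Hive's API; the sketch only assumes plain Java reflection and matches constants by their runtime Java types.

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;

/**
 * Hypothetical sketch (not Hive's API): choose a constructor of a UDAF-like class
 * whose parameters accept the constant arguments of the aggregate call.
 */
public class ConstantParamDemo {

  // Hypothetical stand-in for a UDAF selecting its algorithm ('HLL' or 'FM') via a constant.
  public static class BitVectorAgg {
    final String algorithm;
    public BitVectorAgg() { this("HLL"); }
    public BitVectorAgg(String algorithm) { this.algorithm = algorithm; }
  }

  // Hypothetical stand-in for a merge UDAF taking a thread-count constant.
  public static class BloomFilterMergeAgg {
    final int numThreads;
    public BloomFilterMergeAgg(int numThreads) { this.numThreads = numThreads; }
  }

  // Map primitive parameter types to their wrappers so boxed constants can match them.
  static Class<?> box(Class<?> t) {
    if (!t.isPrimitive()) return t;
    if (t == int.class) return Integer.class;
    if (t == long.class) return Long.class;
    if (t == double.class) return Double.class;
    if (t == float.class) return Float.class;
    if (t == boolean.class) return Boolean.class;
    if (t == short.class) return Short.class;
    if (t == byte.class) return Byte.class;
    if (t == char.class) return Character.class;
    return Void.class;
  }

  // Return the first public constructor accepting the constants in order, or null if none does.
  static Constructor<?> findConstructor(Class<?> cls, Object... constants) {
    for (Constructor<?> c : cls.getConstructors()) {
      Class<?>[] params = c.getParameterTypes();
      if (params.length != constants.length) continue;
      boolean ok = true;
      for (int i = 0; i < params.length && ok; i++) {
        // A null constant can only bind to a reference-typed parameter.
        ok = constants[i] != null
            ? box(params[i]).isInstance(constants[i])
            : !params[i].isPrimitive();
      }
      if (ok) return c;
    }
    return null;
  }

  // Instantiate the class with the constants; no match means the caller would not vectorize.
  static Object instantiate(Class<?> cls, Object... constants) {
    Constructor<?> c = findConstructor(cls, constants);
    if (c == null) {
      throw new IllegalArgumentException(
          "No constructor of " + cls.getSimpleName() + " accepts " + Arrays.toString(constants));
    }
    try {
      return c.newInstance(constants); // reflection unboxes Integer -> int where needed
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    BitVectorAgg fm = (BitVectorAgg) instantiate(BitVectorAgg.class, "FM");
    BloomFilterMergeAgg bf = (BloomFilterMergeAgg) instantiate(BloomFilterMergeAgg.class, 4);
    System.out.println(fm.algorithm + " " + bf.numThreads); // prints "FM 4"
  }
}
```

Matching on runtime Java types keeps the sketch short; a real implementation could instead match on the constants' Hive type information at plan time, so that the decision to vectorize is made once in the Vectorizer rather than per UDAF.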
--
This message was sent by Atlassian Jira
(v8.20.10#820010)