[ 
https://issues.apache.org/jira/browse/HIVE-26572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26572:
----------------------------------------
    Labels: pull-request-available  (was: )

> Support constant expressions in vectorization
> ---------------------------------------------
>
>                 Key: HIVE-26572
>                 URL: https://issues.apache.org/jira/browse/HIVE-26572
>             Project: Hive
>          Issue Type: Improvement
>          Components: Vectorization
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: Alessandro Solimando
>            Assignee: Alessandro Solimando
>            Priority: Major
>              Labels: pull-request-available
>
> At the moment, we cannot vectorize aggregate expressions that take constant 
> parameters in addition to the aggregation column (this is explicitly forbidden 
> [here|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L4531]).
> One compelling example of how this could help is 
> [PR 1824|https://github.com/apache/hive/pull/1824], linked to HIVE-24510, where 
> _compute_bit_vector_ had to be split into _compute_bit_vector_hll_ + 
> _compute_bit_vector_fm_ when the HLL implementation was added, whereas 
> _compute_bit_vector($col, ['HLL'|'FM'])_ could have been used instead.
> Another example is _VectorUDAFBloomFilterMerge_, receiving an extra constant 
> parameter controlling the number of threads for merging tasks. At the moment 
> this parameter is "injected" when trying to find an appropriate constructor 
> (see 
> [VectorGroupByOperator.java#L1224-L1244|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java#L1224-L1244]).
> This ad-hoc approach does not scale and would make the code hard to read and 
> maintain if more UDAFs require constant parameters.
> In addition, we are probably missing vectorization opportunities wherever no 
> such ad-hoc treatment exists but an appropriate UDAF constructor is available 
> or could easily be added (the data sketches UDAFs, although not yet 
> vectorized, are a good target).
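
To make the contrast with per-UDAF parameter injection concrete, here is a minimal sketch of one possible generic approach: resolving a constructor from the runtime types of the arguments, constants included. It is not Hive code. The class names (BloomFilterMergeSketch, SumSketch), the instantiate helper, and the use of boxed Integer arguments are all assumptions made for illustration only.

```java
import java.lang.reflect.Constructor;

public class ConstantArgConstructorLookup {

    // Hypothetical stand-in for a vectorized UDAF that takes one extra
    // constant parameter (here, a thread count) besides the input column.
    public static class BloomFilterMergeSketch {
        public final int inputColumn;
        public final int numThreads;

        public BloomFilterMergeSketch(Integer inputColumn, Integer numThreads) {
            this.inputColumn = inputColumn;
            this.numThreads = numThreads;
        }
    }

    // Hypothetical stand-in for a vectorized UDAF with no constant parameter.
    public static class SumSketch {
        public final int inputColumn;

        public SumSketch(Integer inputColumn) {
            this.inputColumn = inputColumn;
        }
    }

    // Generic lookup: derive the constructor signature from the argument
    // types, so no class needs its constants injected by special-case code.
    public static <T> T instantiate(Class<T> cls, Object... args)
            throws ReflectiveOperationException {
        Class<?>[] types = new Class<?>[args.length];
        for (int i = 0; i < args.length; i++) {
            types[i] = args[i].getClass();
        }
        Constructor<T> ctor = cls.getConstructor(types);
        return ctor.newInstance(args);
    }

    public static void main(String[] unused) throws Exception {
        BloomFilterMergeSketch merge = instantiate(BloomFilterMergeSketch.class, 0, 4);
        SumSketch sum = instantiate(SumSketch.class, 2);
        System.out.println(merge.numThreads + " " + sum.inputColumn);
    }
}
```

With a lookup of this shape, a UDAF that gains a constant parameter only needs a matching constructor. The exact-type matching used here is a simplification; a real implementation would presumably also have to handle primitive and subtype parameters.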



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
