[ 
https://issues.apache.org/jira/browse/HIVE-26572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26572:
----------------------------------------
    Description: 
At the moment, we cannot vectorize aggregate expressions that have constant 
parameters in addition to the aggregation column (this is forbidden 
[here|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L4531]).

One compelling example of how this could help is [PR 
1824|https://github.com/apache/hive/pull/1824], linked to HIVE-24510, where 
_compute_bit_vector_ had to be split into _compute_bit_vector_hll_ + 
_compute_bit_vector_fm_ when the HLL implementation was added, whereas 
_compute_bit_vector($col, ['HLL'|'FM'])_ could have been used instead.

Another example is _VectorUDAFBloomFilterMerge_, which receives an extra constant 
parameter that controls the number of threads for merging tasks. At the moment 
this parameter is "injected" when trying to find an appropriate constructor 
(see 
[VectorGroupByOperator.java#L1224-L1244|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java#L1224-L1244]).

This ad-hoc approach is not scalable and would make the code hard to read and 
maintain if more UDAFs require constant parameters.
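
As a rough illustration of the alternative, the sketch below resolves a constructor generically from the input column plus whatever constant parameters the aggregate call carries, instead of special-casing each UDAF. It is only a sketch under stated assumptions: _ConstantAwareAggregateFactory_, _BitVectorAggregate_ and _create_ are hypothetical names introduced for this example and are not part of Hive's API, and the matching is deliberately simplistic (boxed parameter types only).

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: none of these classes exist in Hive.
class ConstantAwareAggregateFactory {

    /** Hypothetical vectorized aggregate: an input column plus one constant parameter. */
    public static class BitVectorAggregate {
        public final int inputColumn;
        public final String algorithm; // e.g. "HLL" or "FM"

        public BitVectorAggregate(Integer inputColumn, String algorithm) {
            this.inputColumn = inputColumn;
            this.algorithm = algorithm;
        }
    }

    /**
     * Looks for a public constructor whose parameters accept the input column
     * followed by the constant parameters, in order. Returns empty when no
     * constructor matches, in which case a caller could fall back to
     * non-vectorized execution.
     */
    static <T> Optional<T> create(Class<T> aggClass, int inputColumn, List<?> constants) {
        List<Object> args = new ArrayList<>();
        args.add(inputColumn);
        args.addAll(constants);

        for (Constructor<?> ctor : aggClass.getConstructors()) {
            if (matches(ctor.getParameterTypes(), args)) {
                try {
                    return Optional.of(aggClass.cast(ctor.newInstance(args.toArray())));
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Cannot instantiate " + aggClass.getName(), e);
                }
            }
        }
        return Optional.empty();
    }

    /** True when every argument is a non-null instance of the corresponding parameter type. */
    private static boolean matches(Class<?>[] paramTypes, List<Object> args) {
        if (paramTypes.length != args.size()) {
            return false;
        }
        for (int i = 0; i < paramTypes.length; i++) {
            Object arg = args.get(i);
            if (arg == null || !paramTypes[i].isInstance(arg)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] unused) {
        Optional<BitVectorAggregate> agg = create(BitVectorAggregate.class, 0, List.of("HLL"));
        System.out.println(agg.map(a -> a.algorithm).orElse("no matching constructor"));
    }
}
```

With a lookup of this kind, a new constant parameter would only require a matching constructor on the aggregate class, not an extra branch in the operator that instantiates it.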

In addition, we are probably missing vectorization opportunities wherever no such 
ad-hoc treatment exists but an appropriate UDAF constructor is available or 
could easily be added (the data sketches UDAFs, although not yet vectorized, are a 
good target).

  was:
At the moment, we cannot vectorize aggregate expression having constant 
parameters in addition to the aggregation column (it's forbidden 
[here|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L4531]).

One compelling example of how this could help is [PR 
1824|https://github.com/apache/hive/pull/1824], linked to HIVE-24510, where 
_compute_bit_vector_ had to be split into _compute_bit_vector_hll_ + 
_compute_bit_vector_fm_ when HLL implementation has been added, while 
_compute_bit_vector($col, ['HLL'|'FM'])_ could have been used.


> Support constant expressions in vectorization
> ---------------------------------------------
>
>                 Key: HIVE-26572
>                 URL: https://issues.apache.org/jira/browse/HIVE-26572
>             Project: Hive
>          Issue Type: Improvement
>          Components: Vectorization
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: Alessandro Solimando
>            Assignee: Alessandro Solimando
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
