[https://issues.apache.org/jira/browse/HIVE-26243?focusedWorklogId=776269&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776269]
ASF GitHub Bot logged work on HIVE-26243:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 31/May/22 12:01
Start Date: 31/May/22 12:01
Worklog Time Spent: 10m
Work Description: asolimando commented on PR #3317:
URL: https://github.com/apache/hive/pull/3317#issuecomment-1142042596
Before going into the individual discussions, my general answer to all the
comments above boils down to: "I am trying to stay consistent with what was
done here for vectorizing the HyperLogLog function":
https://github.com/apache/hive/pull/1824/files
I sense that you don't like how that PR was designed, but since the two are
very close in spirit and their code is used side by side, I thought it was
important to keep them consistent.
If we rework the current PR, the two will no longer match unless we also
rework the HLL design and implementation, which has its own share of
drawbacks.
Assuming we go ahead with the refactoring, most of the comments are too
sketchy to give me enough guidance toward an alternative design/implementation,
so I will need to ask you to elaborate on them.
For instance, you seem to be suggesting removing all the helper classes and
methods. Since it does not seem feasible to inline all the code that currently
sits in those helpers directly into the vectorized implementation, I guess you
want to move it somewhere else, but I can't tell where from your comment.
As for the couple of currently unused methods, I will need them in a PR that
depends on this one (https://issues.apache.org/jira/browse/HIVE-26221). I can
remove them now and re-introduce them later, if preferred. Once again, they
mimic the HLL methods in both naming and usage: since HLL and KLL methods will
be used side by side in most places, this makes it easier to follow what is
happening (see
[LongColumnStatsAggregator.java#L104-L111](https://github.com/asolimando/hive/blob/master-histograms_stats_rebased/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/LongColumnStatsAggregator.java#L104-L111),
for instance).
Issue Time Tracking
-------------------
Worklog Id: (was: 776269)
Time Spent: 20m (was: 10m)
> Add vectorized implementation of the 'ds_kll_sketch' UDAF
> ---------------------------------------------------------
>
> Key: HIVE-26243
> URL: https://issues.apache.org/jira/browse/HIVE-26243
> Project: Hive
> Issue Type: Improvement
> Components: UDF, Vectorization
> Affects Versions: 4.0.0-alpha-2
> Reporter: Alessandro Solimando
> Assignee: Alessandro Solimando
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The _ds_kll_sketch_ UDAF does not currently have a vectorized implementation;
> this ticket aims to bridge that gap.
> This is particularly important because vectorization follows an "all or
> nothing" approach: if this function is used alongside vectorized functions,
> those functions won't be able to benefit from vectorized execution either.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)