[ https://issues.apache.org/jira/browse/HIVE-26243?focusedWorklogId=776269&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776269 ]

ASF GitHub Bot logged work on HIVE-26243:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 31/May/22 12:01
            Start Date: 31/May/22 12:01
    Worklog Time Spent: 10m 
      Work Description: asolimando commented on PR #3317:
URL: https://github.com/apache/hive/pull/3317#issuecomment-1142042596

   Before addressing the individual comments, my general answer to all of the
above boils down to "I am trying to stay consistent with what was done here for
vectorizing the HyperLogLog function":
https://github.com/apache/hive/pull/1824/files
   
   I sense that you don't like how that PR was designed, but since the two are
very close in spirit and their code is used side by side, I thought it was
important to keep them consistent.
   
   If we rework the current PR, the two will no longer match unless we also
rework the HLL design and implementation, which has its own share of
drawbacks.
   
   Assuming we go for the refactoring, most of the comments are too brief to
give me proper guidance toward an alternative design/implementation, so I will
need to ask you to elaborate on them.
   
   For instance, you seem to be suggesting removing all helper classes/methods.
Since it does not seem feasible to inline all the code now sitting in the
helper methods/classes directly into the vectorized implementation, I assume
you want to move it somewhere else, but I cannot tell where from your comment.
   
   As for the couple of currently unused methods, I will need them in a PR that
depends on this one (https://issues.apache.org/jira/browse/HIVE-26221). I can
remove them now and reintroduce them later, if preferred. Once again, they
mimic the HLL methods in both naming and usage: since HLL and KLL methods will
be used side by side in most places, this makes it easier to follow what is
happening (see
[LongColumnStatsAggregator.java#L104-L111](https://github.com/asolimando/hive/blob/master-histograms_stats_rebased/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/LongColumnStatsAggregator.java#L104-L111),
for instance).




Issue Time Tracking
-------------------

    Worklog Id:     (was: 776269)
    Time Spent: 20m  (was: 10m)

> Add vectorized implementation of the 'ds_kll_sketch' UDAF
> ---------------------------------------------------------
>
>                 Key: HIVE-26243
>                 URL: https://issues.apache.org/jira/browse/HIVE-26243
>             Project: Hive
>          Issue Type: Improvement
>          Components: UDF, Vectorization
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: Alessandro Solimando
>            Assignee: Alessandro Solimando
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The _ds_kll_sketch_ UDAF does not currently have a vectorized implementation;
> this ticket aims to bridge that gap.
> This is particularly important because vectorization follows an "all or
> nothing" approach: if this function is used alongside vectorized functions,
> they will not be able to benefit from vectorized execution either.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
