[ 
https://issues.apache.org/jira/browse/HIVE-24221?focusedWorklogId=500618&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-500618
 ]

ASF GitHub Bot logged work on HIVE-24221:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Oct/20 12:33
            Start Date: 14/Oct/20 12:33
    Worklog Time Spent: 10m 
      Work Description: zabetak commented on a change in pull request #1544:
URL: https://github.com/apache/hive/pull/1544#discussion_r504638460



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java
##########
@@ -233,6 +235,23 @@ public static ExprNodeGenericFuncDesc and(List<ExprNodeDesc> exps) {
     return new ExprNodeGenericFuncDesc(TypeInfoFactory.booleanTypeInfo, new GenericUDFOPAnd(), "and", flatExps);
   }
 
+  /**
+   * Create an expression for computing a hash by recursively hashing given expressions by two:
+   * <pre>
+   * Input: HASH(A, B, C, D)
+   * Output: HASH(HASH(HASH(A,B),C),D)
+   * </pre>
+   */
+  public static ExprNodeGenericFuncDesc hash(List<ExprNodeDesc> exps) {
+    assert exps.size() >= 2;
+    ExprNodeDesc hashExp = exps.get(0);
+    for (int i = 1; i < exps.size(); i++) {
+      List<ExprNodeDesc> hArgs = Arrays.asList(hashExp, exps.get(i));
+      hashExp = new ExprNodeGenericFuncDesc(TypeInfoFactory.intTypeInfo, new GenericUDFMurmurHash(), "hash", hArgs);

Review comment:
       Good catch @kgyrtkirk! I had never noticed that we have two different 
UDFs for hashing. Indeed, having the same annotation can cause quite a lot of 
confusion and hard-to-debug problems. I guess your suggestion is to change 
the annotation of GenericUDFMurmurHash to murmur_hash, right?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 500618)
    Time Spent: 0.5h  (was: 20m)

> Use vectorizable expression to combine multiple columns in semijoin bloom 
> filters
> ---------------------------------------------------------------------------------
>
>                 Key: HIVE-24221
>                 URL: https://issues.apache.org/jira/browse/HIVE-24221
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Planning
>         Environment: 
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, multi-column semijoin reducers use an n-ary call to 
> GenericUDFMurmurHash to combine multiple values into one, which is then used 
> as an entry in the bloom filter. However, there are no vectorized operators 
> that handle n-ary inputs, and this includes the vectorized implementation of 
> GenericUDFMurmurHash introduced in HIVE-23976. 
> The goal of this issue is to choose an alternative way of combining multiple 
> values into a single bloom filter entry that uses only vectorized operators.
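
The pairwise combination discussed in the pull request can be illustrated with a small, self-contained sketch. It does not use any Hive classes: `HashFoldSketch`, `hash2`, `combine`, and `nest` are hypothetical names, and `hash2` is an arbitrary stand-in for a binary hash call, not the Murmur hash that GenericUDFMurmurHash computes. The only point is the left fold that turns an n-ary call into nested two-argument calls.

```java
import java.util.Arrays;
import java.util.List;

public class HashFoldSketch {

  // Hypothetical binary hash used only for illustration (NOT Hive's Murmur hash).
  static int hash2(int left, int right) {
    return 31 * left + right;
  }

  // Left fold over the values: hash2(hash2(hash2(a, b), c), d).
  // Every step takes exactly two inputs, so no n-ary call is needed.
  static int combine(List<Integer> values) {
    if (values.size() < 2) {
      throw new IllegalArgumentException("need at least two values");
    }
    int acc = values.get(0);
    for (int i = 1; i < values.size(); i++) {
      acc = hash2(acc, values.get(i));
    }
    return acc;
  }

  // Same fold over expression names, to show the shape of the nested call.
  static String nest(List<String> names) {
    if (names.size() < 2) {
      throw new IllegalArgumentException("need at least two names");
    }
    String acc = names.get(0);
    for (int i = 1; i < names.size(); i++) {
      acc = "HASH(" + acc + "," + names.get(i) + ")";
    }
    return acc;
  }

  public static void main(String[] args) {
    // Prints the nested form of HASH(A, B, C, D).
    System.out.println(nest(Arrays.asList("A", "B", "C", "D")));
    System.out.println(combine(Arrays.asList(1, 2, 3, 4)));
  }
}
```

For the input names A, B, C, D, `nest` produces `HASH(HASH(HASH(A,B),C),D)`, matching the shape described in the Javadoc of the proposed `hash` method.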



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
