[jira] [Work logged] (HIVE-25498) Query with more than 32 count distinct functions returns wrong result

ASF GitHub Bot (Jira) Wed, 08 Sep 2021 08:49:04 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-25498?focusedWorklogId=648041&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-648041
 ]


ASF GitHub Bot logged work on HIVE-25498:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Sep/21 15:48
            Start Date: 08/Sep/21 15:48
    Worklog Time Spent: 10m 
      Work Description: pgaref commented on pull request #2616:
URL: https://github.com/apache/hive/pull/2616#issuecomment-915356626


   > @ujc714
   > With this patch, the maximum number of `count(distinct)` expressions that 
the `HiveExpandDistinctAggregatesRule` can handle increases from 31 to 63, 
but the limitation still exists. Could you please add a check here:
   > 
https://github.com/apache/hive/blob/72d860ad7721e705c830ca5f141a79e899cc86f7/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveExpandDistinctAggregatesRule.java#L115
   > 
   > like
   > 
   > ```
   >   if (numCountDistinct == 0 || numCountDistinct > 63 || aggregate.getGroupType() != Group.SIMPLE) {
   >       return;
   >   }
   > ```
   
   Agree with @kasakrisz. An alternative would be to change the 
**getGroupingIdValue** logic, but that could be tricky.
   Ultimately, queries should not be limited by the number of count 
distinct functions, so having this extra check as part of the `onMatch` method 
makes sense to me.
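
For illustration only, the suggested guard can be sketched as a standalone predicate. This is a minimal sketch, not the code in `HiveExpandDistinctAggregatesRule`: the class name, the `shouldSkipRewrite` helper, and the `GroupType` enum are hypothetical stand-ins, and the limit of 63 is taken from the quoted review comment.

```java
public class DistinctCountGuardSketch {
    // Limit taken from the review comment quoted above (assumption: it matches the patched rule).
    public static final int MAX_COUNT_DISTINCT = 63;

    // Hypothetical stand-in for the group type checked in the suggested condition.
    public enum GroupType { SIMPLE, ROLLUP, CUBE, OTHER }

    // Returns true when the rewrite should be skipped, mirroring the suggested condition:
    // no count distinct at all, too many of them, or a non-simple grouping.
    public static boolean shouldSkipRewrite(int numCountDistinct, GroupType groupType) {
        return numCountDistinct == 0
            || numCountDistinct > MAX_COUNT_DISTINCT
            || groupType != GroupType.SIMPLE;
    }

    public static void main(String[] args) {
        // 33 count distinct expressions, as in the reproduction query: still handled.
        System.out.println(shouldSkipRewrite(33, GroupType.SIMPLE));
        // 64 exceeds the limit, so the rule would return early.
        System.out.println(shouldSkipRewrite(64, GroupType.SIMPLE));
    }
}
```

Keeping the limit in a named constant makes it easier to find if the grouping ID logic changes later.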


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 648041)
    Time Spent: 0.5h  (was: 20m)

> Query with more than 32 count distinct functions returns wrong result
> ---------------------------------------------------------------------
>
>                 Key: HIVE-25498
>                 URL: https://issues.apache.org/jira/browse/HIVE-25498
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Robbie Zhang
>            Assignee: Robbie Zhang
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> If a query contains more than 32 "COUNT(DISTINCT COL)" functions, every 
> COUNT function in the query returns 0 instead of the correct value.
> The following queries reproduce the issue:
> {code:java}
> set hive.cbo.enable=true;
> create table test_count (c0 string, c1 string, c2 string, c3 string, c4 
> string, c5 string, c6 string, c7 string, c8 string, c9 string, c10 string, 
> c11 string, c12 string, c13 string, c14 string, c15 string, c16 string, c17 
> string, c18 string, c19 string, c20 string, c21 string, c22 string, c23 
> string, c24 string, c25 string, c26 string, c27 string, c28 string, c29 
> string, c30 string, c31 string, c32 string);
> INSERT INTO test_count values ('c0', 'c1', 'c2', 'c3', 'c4', 'c5', 'c6', 
> 'c7', 'c8', 'c9', 'c10', 'c11', 'c12', 'c13', 'c14', 'c15', 'c16', 'c17', 
> 'c18', 'c19', 'c20', 'c21', 'c22', 'c23', 'c24', 'c25', 'c26', 'c27', 'c28', 
> 'c29', 'c30', 'c31', 'c32'); 
> select count (distinct c0), count(distinct c1), count(distinct c2), 
> count(distinct c3), count(distinct c4), count(distinct c5), count(distinct 
> c6), count(distinct c7), count(distinct c8), count(distinct c9), 
> count(distinct c10), count(distinct c11), count(distinct c12), count(distinct 
> c13), count(distinct c14), count(distinct c15), count(distinct c16), 
> count(distinct c17), count(distinct c18), count(distinct c19), count(distinct 
> c20), count(distinct c21), count(distinct c22), count(distinct c23), 
> count(distinct c24), count(distinct c25), count(distinct c26), count(distinct 
> c27), count(distinct c28), count(distinct c29), count(distinct c30), 
> count(distinct c31), count(distinct c32) from test_count;
> {code}
> This bug is caused by HiveExpandDistinctAggregatesRule.getGroupingIdValue(), 
> which uses the int type. When there are more than 32 groupings, the values 
> overflow.
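
The overflow described above can be illustrated with plain Java shifts. This is a minimal sketch under the assumption that a grouping ID is built by setting one bit per grouping position; the class and method names are hypothetical and are not taken from the Hive source.

```java
public class GroupingIdOverflowSketch {
    // Hypothetical: the bit for one grouping position, held in a 32-bit int.
    public static int bitAsInt(int position) {
        return 1 << position;
    }

    // The same bit held in a 64-bit long.
    public static long bitAsLong(int position) {
        return 1L << position;
    }

    public static void main(String[] args) {
        // Position 31 lands on the sign bit of an int.
        System.out.println(bitAsInt(31));  // -2147483648
        // For an int, Java uses only the low 5 bits of the shift distance,
        // so position 32 wraps around to position 0.
        System.out.println(bitAsInt(32));  // 1
        // A long still has room for position 32.
        System.out.println(bitAsLong(32)); // 4294967296
    }
}
```

With an int, position 32 yields the same bit as position 0, so two different groupings can no longer be told apart.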



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
