[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct

ASF GitHub Bot (Jira) Wed, 23 Feb 2022 07:29:05 -0800


     [ https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=731641&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-731641 ]


ASF GitHub Bot logged work on HIVE-25448:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 23/Feb/22 15:28
            Start Date: 23/Feb/22 15:28
    Worklog Time Spent: 10m 
      Work Description: dengzhhu653 edited a comment on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-1048902780


   I found something interesting: when I run EXPLAIN on
`select col1, count(distinct col2) from partition_distinct_skew group by col1;`
on the master branch, the output is:
   ```
         Vertices:
           Map 1
               Map Operator Tree:
                   TableScan
                     alias: partition_distinct_skew
                     Statistics: Num rows: 3 Data size: 510 Basic stats: COMPLETE Column stats: COMPLETE
                     Select Operator
                       expressions: col1 (type: string), col2 (type: string)
                       outputColumnNames: col1, col2
                       Statistics: Num rows: 3 Data size: 510 Basic stats: COMPLETE Column stats: COMPLETE
                       Group By Operator
                         keys: col1 (type: string), col2 (type: string)
                         minReductionHashAggr: 0.4
                         mode: hash
                         outputColumnNames: _col0, _col1
                         Statistics: Num rows: 2 Data size: 340 Basic stats: COMPLETE Column stats: COMPLETE
                         Reduce Output Operator
                           key expressions: _col0 (type: string), _col1 (type: string)
                           null sort order: zz
                           sort order: ++
                           Map-reduce partition columns: rand() (type: double)
                           Statistics: Num rows: 2 Data size: 340 Basic stats: COMPLETE Column stats: COMPLETE
   ```
   The partition column is **rand()** in this case. It seems something has
already been done to improve the skew case, though I haven't been able to find
where that change is made.


-- 
This is an automated message from the Apache Git Service.


Issue Time Tracking
-------------------

    Worklog Id:     (was: 731641)
    Time Spent: 2h 40m  (was: 2.5h)

> Invalid partition columns when skew with distinct
> -------------------------------------------------
>
>                 Key: HIVE-25448
>                 URL: https://issues.apache.org/jira/browse/HIVE-25448
>             Project: Hive
>          Issue Type: Bug
>          Components: Logical Optimizer
>            Reporter: Zhihua Deng
>            Assignee: Zhihua Deng
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> When hive.groupby.skewindata is enabled and a distinct is present, the first
> reduce sink operator sprays rows by both the grouping key and the distinct key.
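Why spraying by both keys matters can be illustrated with a small simulation. This is a minimal Python sketch, not Hive code: it assumes a hypothetical two-stage plan for `select col1, count(distinct col2) ... group by col1`, where stage 1 partitions rows by hash(col1, col2) across a hypothetical number of reducers, deduplicates locally and emits a partial count per col1, and stage 2 groups by col1 and sums. The function name `skew_count_distinct` and the `num_reducers` parameter are illustrative assumptions.

```python
from collections import defaultdict


def skew_count_distinct(rows, num_reducers=4):
    """Simulate SELECT col1, count(DISTINCT col2) ... GROUP BY col1 in two stages.

    Hypothetical sketch: stage 1 sprays rows by (col1, col2) so every distinct
    pair lands on exactly one reducer; stage 2 merges partial counts by col1.
    """
    # Stage 1 shuffle: partition by the grouping key AND the distinct key.
    partitions = defaultdict(list)
    for col1, col2 in rows:
        partitions[hash((col1, col2)) % num_reducers].append((col1, col2))

    # Stage 1 reduce: deduplicate pairs per reducer, emit a partial count per col1.
    partial = []
    for part in partitions.values():
        counts = defaultdict(int)
        for col1, _ in set(part):
            counts[col1] += 1
        partial.extend(counts.items())

    # Stage 2: shuffle by col1 only and sum the partial counts.
    final = defaultdict(int)
    for col1, cnt in partial:
        final[col1] += cnt
    return dict(final)


rows = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "x")]
print(skew_count_distinct(rows))  # → {'a': 2, 'b': 1} (key order may vary)
```

Because every copy of a given (col1, col2) pair hashes to the same stage-1 reducer, summing the partial counts cannot double-count. In this sketch, if stage 1 instead assigned each row to a random reducer, two copies of ("a", "x") could be counted on different reducers, and stage 2 would report 3 for "a".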



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
