[ 
https://issues.apache.org/jira/browse/HIVE-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490245#comment-13490245
 ] 

Mark Grover commented on HIVE-3647:
-----------------------------------

Are we being overly conservative by requiring the bucketing columns to be 
exactly the same as the sorting columns?

For example, for
{code}
CREATE TABLE T2(key STRING, val STRING)
CLUSTERED BY (key) SORTED BY (key, val) INTO 2 BUCKETS STORED AS TEXTFILE;
{code}
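we should still be able to optimize (as a "complete match") the GROUP BY on 
{{key}} alone: every row for a given {{key}} lands in the same bucket, and 
because {{key}} is the leading sort column, those rows are adjacent within it.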
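
To make that concrete, the kind of query I have in mind is sketched below. 
The {{count(1)}} aggregate is only an illustrative assumption on my part; any 
aggregate grouped on {{key}} alone would raise the same question.
{code}
-- T2 is bucketed by key and sorted by (key, val), so within each bucket
-- all rows sharing a key are contiguous.
SELECT key, count(1)
FROM T2
GROUP BY key;
{code}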

P.S.: I don't have a Phabricator account, and it wouldn't let me log in via my 
GitHub account. Not sure whom to talk to.
                
> map-side groupby wrongly due to HIVE-3432
> -----------------------------------------
>
>                 Key: HIVE-3647
>                 URL: https://issues.apache.org/jira/browse/HIVE-3647
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: hive.3647.1.patch, hive.3647.2.patch, hive.3647.3.patch
>
>
> There seems to be a bug due to HIVE-3432.
> We are converting the group by to a map side group by after only looking at
> sorting columns. This can give wrong results if the data is sorted and
> bucketed by different columns.
> Add some tests for that scenario, verify and fix any issues.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
