[ https://issues.apache.org/jira/browse/HIVE-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490245#comment-13490245 ]
Mark Grover commented on HIVE-3647:
-----------------------------------

Are we being overly conservative by requiring the bucketing columns to be exactly the same as the sorting columns? For example, for

{code}
CREATE TABLE T2(key STRING, val STRING) CLUSTERED BY (key) SORTED BY (key, val) INTO 2 BUCKETS STORED AS TEXTFILE;
{code}

we should still be able to optimize the GROUP BY on {{key}} alone (as a "complete match"): all rows with the same {{key}} land in the same bucket, and within each bucket they are contiguous because {{key}} is the leading sort column.

P.S. I don't have a Phabricator account, and it wouldn't let me log in with my GitHub account. Not sure who to talk to.

> map-side groupby wrongly due to HIVE-3432
> -----------------------------------------
>
>                 Key: HIVE-3647
>                 URL: https://issues.apache.org/jira/browse/HIVE-3647
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: hive.3647.1.patch, hive.3647.2.patch, hive.3647.3.patch
>
>
> There seems to be a bug due to HIVE-3432.
> We are converting the group by to a map-side group by after looking only at
> the sorting columns. This can give wrong results if the data is sorted and
> bucketed by different columns.
> Add some tests for that scenario, verify and fix any issues.
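A minimal Java sketch of the eligibility rule discussed above, under stated assumptions: each bucket file is read in full by a single mapper, and a "complete match" needs (1) every bucketing column to appear among the GROUP BY columns, so rows with equal group keys share a bucket, and (2) the GROUP BY columns to be exactly the leading sort columns, so those rows are contiguous within the bucket. The class MapSideGroupByCheck and the method canConvertToMapSide are hypothetical names for illustration; this is not Hive's actual optimizer code or API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical, simplified model of the map-side GROUP BY check discussed in
 * HIVE-3647. Not Hive's real implementation.
 */
public final class MapSideGroupByCheck {

    private MapSideGroupByCheck() {}

    /**
     * Returns true if a GROUP BY on groupByCols can be completed inside each
     * mapper, assuming each bucket is read entirely by one mapper.
     */
    public static boolean canConvertToMapSide(List<String> bucketCols,
                                              List<String> sortCols,
                                              List<String> groupByCols) {
        Set<String> groupBy = new HashSet<>(groupByCols);
        if (groupBy.isEmpty() || groupBy.size() > sortCols.size()) {
            return false;
        }
        // Condition 1: rows with equal group-by keys must land in the same bucket.
        // This holds when every bucketing column is also a group-by column.
        if (bucketCols.isEmpty() || !groupBy.containsAll(bucketCols)) {
            return false;
        }
        // Condition 2: within a bucket, rows with equal group-by keys must be
        // contiguous. This holds when the group-by columns are exactly the
        // leading sort columns (in any order).
        Set<String> sortPrefix = new HashSet<>(sortCols.subList(0, groupBy.size()));
        return sortPrefix.equals(groupBy);
    }

    public static void main(String[] args) {
        // T2 from the comment: CLUSTERED BY (key) SORTED BY (key, val), GROUP BY key.
        System.out.println(canConvertToMapSide(
                List.of("key"), List.of("key", "val"), List.of("key")));
        // Sorted by key but bucketed by val: rows sharing a key can be split
        // across buckets, which is the wrong-results case from the issue.
        System.out.println(canConvertToMapSide(
                List.of("val"), List.of("key", "val"), List.of("key")));
    }
}
```

Under these assumptions the first call prints true and the second prints false. Comparing sets rather than ordered lists for condition 2 reflects that rows with equal values in the leading sort columns are contiguous regardless of how those columns are listed in the GROUP BY.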