[ https://issues.apache.org/jira/browse/HIVE-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750878#comment-13750878 ]
Yin Huai commented on HIVE-5149:
--------------------------------
Suppose that we have a parent RS (ReduceSink operator) and a child RS. If the
child RS can be removed, ReduceSinkDeDuplication always assigns the more
specific partitioning columns to the parent RS. For example, for "GROUP BY a, b
DISTRIBUTE BY a", the RS in the resulting single MR job uses "a" and "b" as
partitioning columns, so rows with the same "a" but different "b" can be sent
to different reducers, which is not what DISTRIBUTE BY a asks for.
It seems we need to change ReduceSinkDeDuplication to use the more general
partitioning columns instead; in this example, that means using only "a" as
the partitioning column. Note that this change can limit the parallelism of
the reduce phase.
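
To make the example concrete, here is a minimal Python sketch. It is not Hive
code: the partition() and reducers_per_a() helpers, the toy hash, the sample
rows, and the reducer count are all assumptions made up for illustration. It
only shows that partitioning on ("a", "b") can spread rows with the same "a"
over several reducers, while partitioning on "a" alone keeps them together.

```python
def partition(key, num_reducers):
    """Toy hash partitioner (hypothetical; not Hive's actual hash function)."""
    h = 0
    for v in key:
        h = 31 * h + v
    return h % num_reducers


def reducers_per_a(rows, key_fn, num_reducers):
    """Map each value of column a to the set of reducers that receive its rows."""
    seen = {}
    for a, b in rows:
        reducer = partition(key_fn(a, b), num_reducers)
        seen.setdefault(a, set()).add(reducer)
    return seen


# Sample (a, b) rows and reducer count, chosen only for illustration.
rows = [(1, 0), (1, 1), (1, 2), (2, 0), (2, 1)]
num_reducers = 3

# More specific partitioning columns: (a, b).
by_ab = reducers_per_a(rows, lambda a, b: (a, b), num_reducers)
# More general partitioning columns: (a,) only.
by_a = reducers_per_a(rows, lambda a, b: (a,), num_reducers)

print("partition by (a, b):", by_ab)
print("partition by (a):   ", by_a)
```

With these sample rows, the rows for a = 1 land on three different reducers
under (a, b), but on exactly one reducer under (a). The sketch also hints at
the cost: with only two distinct values of "a", at most two reducers get any
work.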
> ReduceSinkDeDuplication can pick the wrong partitioning columns
> ---------------------------------------------------------------
>
> Key: HIVE-5149
> URL: https://issues.apache.org/jira/browse/HIVE-5149
> Project: Hive
> Issue Type: Bug
> Reporter: Yin Huai
> Assignee: Yin Huai
>
> https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E