[ https://issues.apache.org/jira/browse/HIVE-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711988#action_12711988 ]
Min Zhou commented on HIVE-503: ------------------------------- oops, my mistake. 2 shoule be select distinct col1, distinct col2, which will fail. okay, let 4 for example: mapper output: <other_key + '1' + "col1%10", value> <other_key + '2 + "col2%9", value> after doing distinct, then redcuer: if(1) { count1 ++ } else { count2 ++ } reducer output: <count1 + delimiter + count2, ...> > improvement on distinct: distinguish distinct aggregate function from distinct > ------------------------------------------------------------------------------ > > Key: HIVE-503 > URL: https://issues.apache.org/jira/browse/HIVE-503 > Project: Hadoop Hive > Issue Type: Improvement > Reporter: Min Zhou > > h4.distinct > # OK > {code:sql} > select > distinct col > from > tbl > {code} > # FAILED > {code:sql} > select > distinct col1, > distinct col2 > from > tbl > {code} > h4.distinct aggregate function > # OK > {code:sql} > select > count(distinct col % 10) > from > tbl > {code} > # OK > {code:sql} > select > count(distinct col1% 10) > count(distinct col1% 9) > from > tbl > {code} > # OK > {code:sql} > select > count(distinct col1 % 10) > count(distinct col2 % 9) > from > tbl > {code} > # OK > {code:sql} > select > sum(distinct col1 % 10), > count(distinct col2 % 9) > from > tbl > {code} > # OK > {code:sql} > select > max(distinct substr(col1, 1, 10)), > count(distinct col2 % 9) > from > tbl > {code} > The keyword "distinct" ofen produce more than one results, so it's impossible > removing two different columns' duplicates in only one mapreduce job, so it > failed. > But the term "distinct aggregate function" with a form like > aggregate_function(distinct ....), is in connection with the term "all > aggregate function", it essentially is an aggregate function. Only one > result each aggregate function will produce, it's very possible one > mapreduce job could deal with two or more different aggregate expression > simultaneously. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.