[ https://issues.apache.org/jira/browse/HIVE-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711961#action_12711961 ]
Zheng Shao commented on HIVE-503:
---------------------------------

Can you explain how to do it in one mapreduce job simultaneously?
Also, what does "distinct 1.OK and 2.Failed" mean? I don't think 2 will fail.

> improvement on distinct: distinguish distinct aggregate function from distinct
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-503
>                 URL: https://issues.apache.org/jira/browse/HIVE-503
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Min Zhou
>
> h4. distinct
> # OK
> {code:sql}
> select
>   col
> from
>   tbl
> {code}
> # FAILED
> {code:sql}
> select
>   col1,
>   col2
> from
>   tbl
> {code}
> h4. distinct aggregate function
> # OK
> {code:sql}
> select
>   count(distinct col % 10)
> from
>   tbl
> {code}
> # OK
> {code:sql}
> select
>   count(distinct col1 % 10),
>   count(distinct col1 % 9)
> from
>   tbl
> {code}
> # OK
> {code:sql}
> select
>   count(distinct col1 % 10),
>   count(distinct col2 % 9)
> from
>   tbl
> {code}
> # OK
> {code:sql}
> select
>   sum(distinct col1 % 10),
>   count(distinct col2 % 9)
> from
>   tbl
> {code}
> # OK
> {code:sql}
> select
>   max(distinct substr(col1, 1, 10)),
>   count(distinct col2 % 9)
> from
>   tbl
> {code}
> The keyword "distinct" often produces more than one result, so it's
> impossible to remove the duplicates of two different columns in only one
> mapreduce job; that is why it fails.
> But a "distinct aggregate function", written as aggregate_function(distinct ...),
> is the counterpart of an "all aggregate function", aggregate_function(all ...),
> and is essentially an aggregate function. Each aggregate function produces
> only one result, so it's very possible that one mapreduce job could evaluate
> two or more different distinct aggregate expressions simultaneously.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
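The issue's claim that one mapreduce job could evaluate several distinct aggregates at once can be illustrated with a minimal single-process simulation. This is a hypothetical sketch, not Hive's implementation, and every name in it (`map_phase`, `shuffle`, `reduce_phase`, `run_job`, `DistinctAgg`) is invented for illustration. It assumes no GROUP BY (one reducer sees every key) and ignores NULL handling: the map phase tags each DISTINCT expression's value with the index of its aggregate, the sort brings equal `(index, value)` keys together so duplicates collapse per aggregate, and the reduce phase applies each aggregate function to its own de-duplicated values.

```python
from itertools import groupby
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, Any]
# Hypothetical representation of one distinct aggregate such as
# sum(distinct col1 % 10): the expression inside DISTINCT(...) and the
# function applied to the de-duplicated values. NULL handling is ignored.
DistinctAgg = Tuple[Callable[[Row], Any], Callable[[List[Any]], Any]]


def map_phase(rows: Iterable[Row], aggs: List[DistinctAgg]) -> Iterator[Tuple[int, Any]]:
    """Emit one (aggregate index, expression value) key per row and aggregate."""
    for row in rows:
        for idx, (expr, _) in enumerate(aggs):
            yield (idx, expr(row))


def shuffle(keys: Iterable[Tuple[int, Any]]) -> List[Tuple[int, Any]]:
    """Stand-in for the MapReduce sort: equal keys end up adjacent."""
    return sorted(keys)


def reduce_phase(sorted_keys: List[Tuple[int, Any]], aggs: List[DistinctAgg]) -> List[Any]:
    """Keep one copy of each key, then apply every aggregate to its own values."""
    distinct_values: Dict[int, List[Any]] = {idx: [] for idx in range(len(aggs))}
    for (idx, value), _ in groupby(sorted_keys):
        distinct_values[idx].append(value)
    return [fn(distinct_values[idx]) for idx, (_, fn) in enumerate(aggs)]


def run_job(rows: Iterable[Row], aggs: List[DistinctAgg]) -> List[Any]:
    """One simulated job evaluates all distinct aggregates together."""
    return reduce_phase(shuffle(map_phase(rows, aggs)), aggs)


if __name__ == "__main__":
    rows = [{"col1": a, "col2": b} for a, b in [(1, 2), (11, 11), (21, 20), (5, 9), (15, 18)]]
    # select sum(distinct col1 % 10), count(distinct col2 % 9) from tbl
    aggs: List[DistinctAgg] = [
        (lambda r: r["col1"] % 10, sum),
        (lambda r: r["col2"] % 9, len),
    ]
    print(run_job(rows, aggs))  # [6, 2]
```

Because the aggregate index is the leading part of the sort key, the distinct values of different expressions never interfere with one another, which is why the same pass works whether the expressions use the same column (`col1 % 10`, `col1 % 9`) or different ones. Plain `select distinct col1, col2` has no such single-result property.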