[
https://issues.apache.org/jira/browse/HIVE-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711988#action_12711988
]
Min Zhou commented on HIVE-503:
-------------------------------
Oops, my mistake. 2 should be select distinct col1, distinct col2, which will
fail.
Okay, take 4 as an example:
mapper output:
<other_key + '1' + "col1%10", value>
<other_key + '2' + "col2%9", value>
after doing distinct, then in the reducer:
if (tag == '1') {
count1++
} else {
count2++
}
reducer output:
<count1 + delimiter + count2, ...>
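The scheme above can be sketched in Python as a single map/shuffle/reduce pass simulated in memory. This is only an illustration under assumptions, not Hive code: the function names, the tag values '1' and '2', and the sample rows are made up here, and other_key is left out because the example query has no group-by key.

```python
from collections import defaultdict

def mapper(row):
    """Emit one tagged key per distinct expression (tags '1' and '2' are assumed)."""
    col1, col2 = row
    yield ("1", col1 % 10), None   # feeds count(distinct col1 % 10)
    yield ("2", col2 % 9), None    # feeds count(distinct col2 % 9)

def shuffle(pairs):
    """Group values by key, as the sort/shuffle phase would; equal keys collapse."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(groups):
    """Each distinct key is visited once; its tag selects which counter to bump."""
    count1 = count2 = 0
    for tag, _expr_value in groups:
        if tag == "1":
            count1 += 1
        else:
            count2 += 1
    return count1, count2

def count_distinct_pair(rows):
    """Run the simulated job over (col1, col2) rows and return both counts."""
    pairs = [kv for row in rows for kv in mapper(row)]
    return reducer(shuffle(pairs))

rows = [(1, 1), (11, 10), (2, 2), (12, 3)]
result = count_distinct_pair(rows)
```

The sketch assumes a single reducer. With several reducers, each one would emit partial counts that still have to be summed afterwards.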
> improvement on distinct: distinguish distinct aggregate function from distinct
> ------------------------------------------------------------------------------
>
> Key: HIVE-503
> URL: https://issues.apache.org/jira/browse/HIVE-503
> Project: Hadoop Hive
> Issue Type: Improvement
> Reporter: Min Zhou
>
> h4.distinct
> # OK
> {code:sql}
> select
> distinct col
> from
> tbl
> {code}
> # FAILED
> {code:sql}
> select
> distinct col1,
> distinct col2
> from
> tbl
> {code}
> h4.distinct aggregate function
> # OK
> {code:sql}
> select
> count(distinct col % 10)
> from
> tbl
> {code}
> # OK
> {code:sql}
> select
> count(distinct col1 % 10),
> count(distinct col1 % 9)
> from
> tbl
> {code}
> # OK
> {code:sql}
> select
> count(distinct col1 % 10),
> count(distinct col2 % 9)
> from
> tbl
> {code}
> # OK
> {code:sql}
> select
> sum(distinct col1 % 10),
> count(distinct col2 % 9)
> from
> tbl
> {code}
> # OK
> {code:sql}
> select
> max(distinct substr(col1, 1, 10)),
> count(distinct col2 % 9)
> from
> tbl
> {code}
> The keyword "distinct" often produces more than one result row, so it is
> impossible to remove the duplicates of two different columns in a single
> mapreduce job, and the query fails.
> But a "distinct aggregate function", of the form
> aggregate_function(distinct ....), is the counterpart of an "all
> aggregate function"; it is essentially an aggregate function. Each
> aggregate function produces only one result, so a single mapreduce job
> can quite possibly handle two or more different aggregate expressions
> simultaneously.