[ 
https://issues.apache.org/jira/browse/HIVE-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711988#action_12711988
 ] 

Min Zhou commented on HIVE-503:
-------------------------------

oops, my mistake. 2 shoule be select  distinct col1, distinct col2, which will 
fail.

okay,  let 4 for example: 

mapper output: 
<other_key + '1' + "col1%10",  value>
<other_key + '2 + "col2%9",  value>

after doing distinct,  then redcuer:
if(1) {
  count1 ++
} else {
  count2 ++
}
reducer output: 
<count1 + delimiter + count2, ...>

> improvement on distinct: distinguish distinct aggregate function from distinct
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-503
>                 URL: https://issues.apache.org/jira/browse/HIVE-503
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Min Zhou
>
> h4.distinct
> # OK
> {code:sql}
> select 
>    distinct col
> from 
>   tbl
> {code}
> # FAILED
> {code:sql}
> select 
>    distinct  col1,
>    distinct  col2
> from 
>   tbl
> {code}
> h4.distinct aggregate function
> # OK
> {code:sql}
> select 
>    count(distinct col % 10)
> from 
>   tbl
> {code}
> # OK
> {code:sql}
> select 
>    count(distinct col1% 10)
>    count(distinct col1% 9)
> from 
>   tbl
> {code}
> # OK
> {code:sql}
> select 
>    count(distinct col1 % 10)
>    count(distinct col2 % 9)
> from 
>   tbl
> {code}
> # OK
> {code:sql}
> select 
>   sum(distinct col1 % 10),
>   count(distinct col2 % 9)
> from 
>   tbl
> {code}
> # OK
> {code:sql}
> select 
>   max(distinct substr(col1, 1, 10)),
>   count(distinct col2 % 9)
> from 
>   tbl
> {code}
> The keyword "distinct" ofen produce more than one results, so it's impossible 
> removing two different columns' duplicates in only one mapreduce job, so it 
> failed.
> But the term "distinct aggregate function" with a form like 
> aggregate_function(distinct ....),  is in connection with the term "all 
> aggregate function",  it essentially is an aggregate function. Only one 
> result each aggregate function will produce,  it's very possible one 
> mapreduce job could deal with two or more different aggregate expression 
> simultaneously.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to