[jira] Commented: (HIVE-503) improvement on distinct: distinguish distinct aggregate function from distinct

Zheng Shao (JIRA) Thu, 21 May 2009 23:09:11 -0700

    [ https://issues.apache.org/jira/browse/HIVE-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711961#action_12711961 ]


Zheng Shao commented on HIVE-503:
---------------------------------

Can you explain how to do it in one mapreduce job simultaneously?

Also, what does "distinct 1.OK and 2.Failed" mean? I don't think 2 will fail.

> improvement on distinct: distinguish distinct aggregate function from distinct
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-503
>                 URL: https://issues.apache.org/jira/browse/HIVE-503
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Min Zhou
>
> h4.distinct
> # OK
> {code:sql}
> select 
>    col
> from 
>   tbl
> {code}
> # FAILED
> {code:sql}
> select 
>    col1,
>    col2
> from 
>   tbl
> {code}
> h4.distinct aggregate function
> # OK
> {code:sql}
> select 
>    count(distinct col % 10)
> from 
>   tbl
> {code}
> # OK
> {code:sql}
> select 
>    count(distinct col1 % 10),
>    count(distinct col1 % 9)
> from 
>   tbl
> {code}
> # OK
> {code:sql}
> select 
>    count(distinct col1 % 10),
>    count(distinct col2 % 9)
> from 
>   tbl
> {code}
> # OK
> {code:sql}
> select 
>   sum(distinct col1 % 10),
>   count(distinct col2 % 9)
> from 
>   tbl
> {code}
> # OK
> {code:sql}
> select 
>   max(distinct substr(col1, 1, 10)),
>   count(distinct col2 % 9)
> from 
>   tbl
> {code}
> The keyword "distinct" often produces more than one result row, so it is 
> impossible to remove the duplicates of two different columns in a single 
> mapreduce job, and the query fails.
> The term "distinct aggregate function", with a form like 
> aggregate_function(distinct ....), is the counterpart of the term "all 
> aggregate function"; it is essentially an aggregate function. Each 
> aggregate function produces only one result, so it is quite possible for 
> one mapreduce job to handle two or more different aggregate expressions 
> simultaneously.
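
One way to picture the single-job claim above is a small simulation, sketched here in Python rather than Hive. It is not Hive's actual query plan. The sample rows, the `aggregates` list, and the `map_phase` / `reduce_phase` helpers are all hypothetical names made up for illustration. The idea is that the map side tags each expression value with the index of the aggregate it belongs to, and a single reduce side keeps one set per tag before applying each aggregate once.

```python
from collections import defaultdict

# Hypothetical input rows for table `tbl`; column names follow the examples above.
rows = [
    {"col1": 12, "col2": 9},
    {"col1": 22, "col2": 18},
    {"col1": 35, "col2": 4},
    {"col1": 42, "col2": 13},
]

# One entry per distinct aggregate: (expression, aggregate over the distinct values).
# Mirrors: select sum(distinct col1 % 10), count(distinct col2 % 9) from tbl
aggregates = [
    (lambda r: r["col1"] % 10, sum),
    (lambda r: r["col2"] % 9, len),
]


def map_phase(rows):
    """Emit (aggregate index, expression value) for every row and every aggregate."""
    for row in rows:
        for idx, (expr, _) in enumerate(aggregates):
            yield idx, expr(row)


def reduce_phase(pairs):
    """Deduplicate values per aggregate, then apply each aggregate function once."""
    distinct_values = defaultdict(set)
    for idx, value in pairs:
        distinct_values[idx].add(value)
    return [agg(distinct_values[idx]) for idx, (_, agg) in enumerate(aggregates)]


result = reduce_phase(map_phase(rows))
```

With the sample rows, `result` is `[7, 2]`: the distinct values of `col1 % 10` are {2, 5}, and the distinct values of `col2 % 9` are {0, 4}. The sketch assumes a single reducer that can hold every distinct value in memory, which a real job over large data could not take for granted.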

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
