[
https://issues.apache.org/jira/browse/HIVE-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Min Zhou updated HIVE-503:
--------------------------
Description:
distinct
# OK
{code:sql}
select
col
from
tbl
{code}
# FAILED
{code:sql}
select
col1,
col2
from
tbl
{code}
distinguish distinct aggregate function
# OK
{code:sql}
select
count(distinct col % 10)
from
tbl
{code}
# OK
{code:sql}
select
count(distinct col1 % 10),
count(distinct col1 % 9)
from
tbl
{code}
# OK
{code:sql}
select
count(distinct col1 % 10),
count(distinct col2 % 9)
from
tbl
{code}
# OK
{code:sql}
select
sum(distinct col1 % 10),
count(distinct col2 % 9)
from
tbl
{code}
# OK
{code:sql}
select
max(distinct substr(col1, 1, 10)),
count(distinct col2 % 9)
from
tbl
{code}
The keyword "distinct" often produces more than one result row, so it is
impossible to remove the duplicates of two different columns in a single
mapreduce job, which is why that query fails.
But a "distinct aggregate function", written in the form
aggregate_function(distinct ....), is the counterpart of an "all aggregate
function" and is essentially still an aggregate function. Each aggregate
function produces only one result, so a single mapreduce job can evaluate two
different aggregate expressions simultaneously.
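To make the argument concrete, here is a minimal sketch in plain Python of evaluating several distinct aggregates in one pass over the rows. It is not Hive's execution plan: the helper name distinct_aggregates, the sample rows, and the choice to keep one in-memory set per aggregate are all assumptions made for illustration, and NULL handling is ignored.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, int]
# Each spec pairs the expression inside "distinct (...)" with the
# aggregate applied to the de-duplicated values.
Spec = Tuple[Callable[[Row], int], Callable[[Iterable[int]], int]]


def distinct_aggregates(rows: Iterable[Row], specs: List[Spec]) -> List[int]:
    """Hypothetical helper: one pass over rows, one result per aggregate."""
    seen = [set() for _ in specs]  # one de-duplication set per aggregate
    for row in rows:
        for values, (expr, _) in zip(seen, specs):
            values.add(expr(row))
    # Every aggregate collapses its own set into a single value.
    return [agg(values) for values, (_, agg) in zip(seen, specs)]


# Made-up sample data standing in for tbl.
rows = [
    {"col1": 1, "col2": 10},
    {"col1": 11, "col2": 20},
    {"col1": 2, "col2": 10},
    {"col1": 2, "col2": 28},
]

result = distinct_aggregates(
    rows,
    [
        (lambda r: r["col1"] % 10, sum),  # sum(distinct col1 % 10)
        (lambda r: r["col2"] % 9, len),   # count(distinct col2 % 9)
    ],
)
print(result)  # [3, 2]
```

Because every aggregate yields exactly one value, the two de-duplication sets never have to be combined into result rows, which is the property the description relies on.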
> improvement on distinct: distinguish distinct aggregate function from distinct
> ------------------------------------------------------------------------------
>
> Key: HIVE-503
> URL: https://issues.apache.org/jira/browse/HIVE-503
> Project: Hadoop Hive
> Issue Type: Improvement
> Reporter: Min Zhou