[ https://issues.apache.org/jira/browse/HIVE-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-6057:
------------------------------

    Description: 
Currently, a bucketed sort-merge join (SMJ) cannot be used when joining subquery
results. It would be useful to be able to explicitly specify the bucketing of a
subquery's intermediate output so that a bucketed SMJ becomes possible.

For example, the following query will NOT use a bucketed SMJ, even though
gameends and dummymapping are both clustered and sorted by hashid into 128
buckets:
{code}
select *
from (select hashid, count(*) as c
      from gameends
      group by hashid
      distribute by hashid sort by hashid) e
join dummymapping m on e.hashid = m.hashid
{code}

Suggestion: implement an INTO n BUCKETS syntax for subqueries to enable a
bucketed SMJ:
{code}
select *
from (select hashid, count(*) as c
      from gameends
      group by hashid
      distribute by hashid sort by hashid INTO 128 BUCKETS) e
join dummymapping m on e.hashid = m.hashid
{code}
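Until a subquery-level INTO n BUCKETS clause exists, one possible workaround is to materialize the subquery into a table declared with the same bucketing and sort order as dummymapping, then join the two tables directly. The sketch below assumes Hive 0.12 era settings; the table name gameends_counts and the column types (STRING, BIGINT) are hypothetical, and hashid must have the same type as in dummymapping so that both sides hash rows into matching buckets.

{code}
-- Hypothetical staging table; bucket count and sort column mirror dummymapping.
CREATE TABLE gameends_counts (hashid STRING, c BIGINT)
CLUSTERED BY (hashid) SORTED BY (hashid) INTO 128 BUCKETS;

-- Make the INSERT honor the declared bucketing and sort order (Hive 0.x settings).
SET hive.enforce.bucketing=true;
SET hive.enforce.sorting=true;

INSERT OVERWRITE TABLE gameends_counts
SELECT hashid, count(*) AS c
FROM gameends
GROUP BY hashid;

-- Ask for a sort-merge bucket map join between the two bucketed, sorted tables.
SET hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
SET hive.optimize.bucketmapjoin=true;
SET hive.optimize.bucketmapjoin.sortedmerge=true;

SELECT /*+ MAPJOIN(m) */ *
FROM gameends_counts e
JOIN dummymapping m ON e.hashid = m.hashid;
{code}

The cost of this approach is an extra write of the intermediate result, which is exactly what the proposed syntax would avoid. Running EXPLAIN on the final SELECT is a quick way to check whether the planner actually chose a sort-merge bucket join.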

  was:
Currently, you cannot use bucketed SMJ when joining subquery results. It would 
make sense to be able to explicitly specify bucketing of the intermediate 
output from a subquery to enable bucketed SMJ.

For example, the following query will NOT use bucketed SMJ:
(gameends and dummymapping are clustered and sorted by hashid into 128 buckets)

select * from (select hashid,count(*) as c from gameends group by hashid 
distribute by hashid sort by hashid) e join dummymapping m on e.hashid=m.hashid

Suggestion: Implement an INTO n BUCKETS syntax for subqueries to enable 
bucketed SMJ:
select * from (select hashid,count(*) as c from gameends group by hashid 
distribute by hashid sort by hashid INTO 128 BUCKETS) e join dummymapping m on 
e.hashid=m.hashid


> Enable bucketed sorted merge joins of arbitrary subqueries
> ----------------------------------------------------------
>
>                 Key: HIVE-6057
>                 URL: https://issues.apache.org/jira/browse/HIVE-6057
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.12.0
>            Reporter: Jan-Erik Hedbom
>            Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
