[jira] [Commented] (HIVE-3426) union with same source should be optimized

Namit Jain (JIRA) Mon, 03 Sep 2012 22:47:12 -0700

    [ https://issues.apache.org/jira/browse/HIVE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447496#comment-13447496 ]


Namit Jain commented on HIVE-3426:
----------------------------------

Consider a query like:


insert overwrite table tst_output partition (ds='1')
select key, keyVal, agg
from
  (
    select key, '238' as keyVal, count(1) as agg from srcpart
    where ds = '2008-04-08' and hr = '11' and key = 238
    group by key

union all

    select key, '165' as keyVal, count(1) as agg from srcpart
    where ds = '2008-04-08' and hr = '11' and key = 165
    group by key

union all

    select key, '409' as keyVal, count(1) as agg from srcpart
    where ds = '2008-04-08' and hr = '12' and key = 409
    group by key

union all

    select key, '484' as keyVal, count(1) as agg from srcpart
    where ds = '2008-04-08' and hr = '12' and key = 484
    group by key
) subq;



Currently, it requires a separate map-reduce job for each sub-query.
Since all the sub-queries read the same base table, ideally it should be a
single map-reduce job.
At least, each of the partitions ds=2008-04-08/hr=11 and ds=2008-04-08/hr=12
should be scanned only once.
The query plan should look like a plan in which a single table scan feeds
several branches, one per sub-query.
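
As a rough illustration of what a single-scan plan would compute, the query
above can be rewritten by hand so that srcpart is read once and aggregated
once. This is only a sketch, not the plan the optimizer would generate. It
assumes the four (hr, key) pairs are distinct, as they are above, and that
cast(key as string) gives the same text as the literals '238', '165', '409'
and '484'.

```sql
-- Hand-written single-scan equivalent (illustration only, not optimizer output).
-- Assumption: each key is selected from exactly one hr partition, as in the
-- original query, so one group by over the combined filter gives the same rows.
insert overwrite table tst_output partition (ds='1')
select key, cast(key as string) as keyVal, agg
from
  (
    select key, count(1) as agg from srcpart
    where ds = '2008-04-08'
      and (   (hr = '11' and key in (238, 165))
           or (hr = '12' and key in (409, 484)))
    group by key
  ) subq;
```

A general optimization cannot rely on merging the filters like this, which is
why the plan needs separate branches over a shared table scan.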
                
> union with same source should be optimized
> ------------------------------------------
>
>                 Key: HIVE-3426
>                 URL: https://issues.apache.org/jira/browse/HIVE-3426
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
