[
https://issues.apache.org/jira/browse/HIVE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447496#comment-13447496
]
Namit Jain commented on HIVE-3426:
----------------------------------
Consider a query like:
insert overwrite table tst_output partition (ds='1')
select key, keyVal, agg
from
(
  select key, '238' as keyVal, count(1) as agg from srcpart
  where ds = '2008-04-08' and hr = '11' and key = 238
  group by key
  union all
  select key, '165' as keyVal, count(1) as agg from srcpart
  where ds = '2008-04-08' and hr = '11' and key = 165
  group by key
  union all
  select key, '409' as keyVal, count(1) as agg from srcpart
  where ds = '2008-04-08' and hr = '12' and key = 409
  group by key
  union all
  select key, '484' as keyVal, count(1) as agg from srcpart
  where ds = '2008-04-08' and hr = '12' and key = 484
  group by key
) subq;
This query requires a separate map-reduce job for each sub-query.
Since every sub-query reads the same base table, ideally the whole query
should run as a single map-reduce job.
At the very least, each of the partitions ds=2008-04-08/hr=11 and
ds=2008-04-08/hr=12 should be scanned only once.
The query plan should look like one in which a single table scan fans out
into different branches, one per sub-query.
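To illustrate the target, here is a hand-written query that I believe returns the same rows from a single scan of srcpart. It is only a sketch of the desired result, not what the optimizer produces today, and it relies on an assumption that happens to hold for this example: the keys in the four branches are distinct, so grouping by key alone keeps the branches apart.

```sql
-- Sketch only: a manual single-scan rewrite of the union above.
-- Assumes the branch keys (238, 165, 409, 484) are distinct, so each
-- group contains rows from exactly one of the original sub-queries.
insert overwrite table tst_output partition (ds='1')
select key,
       case when key = 238 then '238'
            when key = 165 then '165'
            when key = 409 then '409'
            when key = 484 then '484'
       end as keyVal,
       count(1) as agg
from srcpart
where ds = '2008-04-08'
  and ((hr = '11' and key in (238, 165))
    or (hr = '12' and key in (409, 484)))
group by key;
```

In general the branches of a union can overlap or compute different aggregates, so this kind of manual merge does not always apply. That is why the plan itself should share the table scan across branches.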
> union with same source should be optimized
> ------------------------------------------
>
> Key: HIVE-3426
> URL: https://issues.apache.org/jira/browse/HIVE-3426
> Project: Hive
> Issue Type: Improvement
> Components: Query Processor
> Reporter: Namit Jain
>