[ https://issues.apache.org/jira/browse/HIVE-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HIVE-2566:
------------------------------
Attachment: HIVE-2566.D405.1.patch
njain requested code review of "HIVE-2566 [jira] reduce the number map-reduce
jobs for union all".
Reviewers: JIRA
HIVE-2566: initial patch
A query like:
select s.key, s.value from (
  select key, value from src2 where key < 10
  union all
  select key, value from src3 where key < 10
  union all
  select key, value from src4 where key < 10
  union all
  select key, count(1) as value from src5 group by key
) s;
should run only the last sub-query,
'select key, count(1) as value from src5 group by key',
as a map-reduce job.
The union should then be a single map-only job that reads from the first 3
map-only subqueries
and from the output of that map-reduce job.
The current plan is very inefficient: it runs more map-reduce jobs than this.
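The intended plan can be sketched as a toy model. This is only an illustration of the job layout described above, not Hive's optimizer code: the names Branch, Job, and plan_union_all are hypothetical, and the needs_reduce flag is an assumed stand-in for "this sub-query has a group by (or another operator that needs a reduce phase)".

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Branch:
    """One sub-query of a UNION ALL (hypothetical model, not a Hive class)."""
    source: str         # table the sub-query reads from
    needs_reduce: bool  # assumed flag: True for e.g. a GROUP BY sub-query


@dataclass(frozen=True)
class Job:
    """One job in the sketched plan."""
    kind: str                # "map-reduce" or "map-only"
    inputs: Tuple[str, ...]  # tables or outputs of earlier jobs


def plan_union_all(branches: List[Branch]) -> List[Job]:
    """Lay out jobs the way the description proposes.

    Each branch that needs a reduce phase becomes its own map-reduce job.
    The union is then one map-only job that reads the map-only branches
    directly and the outputs of the map-reduce jobs.
    """
    jobs: List[Job] = []
    union_inputs: List[str] = []
    for branch in branches:
        if branch.needs_reduce:
            jobs.append(Job("map-reduce", (branch.source,)))
            union_inputs.append(f"output({branch.source})")
        else:
            union_inputs.append(branch.source)
    # A single map-only job performs the union over all inputs.
    jobs.append(Job("map-only", tuple(union_inputs)))
    return jobs


# The four sub-queries from the example query above.
branches = [
    Branch("src2", needs_reduce=False),
    Branch("src3", needs_reduce=False),
    Branch("src4", needs_reduce=False),
    Branch("src5", needs_reduce=True),  # group by key
]
plan = plan_union_all(branches)
for job in plan:
    print(job.kind, job.inputs)
```

For the example query this model yields two jobs: one map-reduce job over src5, followed by one map-only job that reads src2, src3, src4 and the output of the src5 job.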
TEST PLAN
EMPTY
REVISION DETAIL
https://reviews.facebook.net/D405
AFFECTED FILES
ql/src/test/results/clientpositive/union24.q.out
ql/src/test/queries/clientpositive/union24.q
ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRProcContext.java
ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java
ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
> reduce the number map-reduce jobs for union all
> -----------------------------------------------
>
> Key: HIVE-2566
> URL: https://issues.apache.org/jira/browse/HIVE-2566
> Project: Hive
> Issue Type: Improvement
> Reporter: Namit Jain
> Assignee: Namit Jain
> Attachments: HIVE-2566.D405.1.patch