[
https://issues.apache.org/jira/browse/HIVE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013938#comment-13013938
]
Amareshwari Sriramadasu commented on HIVE-2056:
-----------------------------------------------
For a query of the form,
"From table T
insert overwrite table test1 select col1, count(distinct colx) group by col1
insert overwrite table test2 select col2, count(distinct colx) group by col2;"
it is not possible to generate a single M/R job, because the input rows cannot
be partitioned by both col1 and col2 in a single stage: each aggregation needs
all rows of a group to reach the same reducer.
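A small in-memory sketch of why this fails (plain Python, not Hive code; the
sample rows, the toy partition function and the helper names are assumptions
made up for illustration). Rows are sprayed on col1, and we then check how many
reducers hold rows of each group:

```python
from collections import defaultdict

# Hypothetical sample rows of T as (col1, col2, colx) tuples.
ROWS = [
    ("a", "x", 1),
    ("a", "y", 2),
    ("b", "x", 1),
    ("b", "y", 3),
]


def partition(rows, key_index, num_reducers=2):
    """Assign each row to a reducer based on one column (assumed to be a string)."""
    reducers = defaultdict(list)
    for row in rows:
        # Sum of character codes instead of hash(), so the assignment is
        # the same on every run.
        reducer_id = sum(map(ord, row[key_index])) % num_reducers
        reducers[reducer_id].append(row)
    return reducers


def reducers_per_group(rows, partition_index, group_index):
    """For each group key, collect the reducers that hold at least one of its rows."""
    seen = defaultdict(set)
    for reducer_id, part in partition(rows, partition_index).items():
        for row in part:
            seen[row[group_index]].add(reducer_id)
    return dict(seen)
```

With these rows, `reducers_per_group(ROWS, 0, 0)` shows every col1 group on
exactly one reducer, while `reducers_per_group(ROWS, 0, 1)` shows each col2
group spread over both reducers, so no single reducer sees a complete col2
group to count distinct colx values over.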
If the groupby keys are such that one keyset is a subset of the other, i.e. of
the following form:
"From table T
insert overwrite table test1 select col1, count(distinct colx) group by col1
insert overwrite table test2 select col1, col2, count(distinct colx) group by
col1, col2;",
we can run it in a single M/R job by spraying rows over the common group-by
keyset (i.e. col1). I will implement this and see whether it reduces query
execution time.
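The idea can be sketched in memory as follows (plain Python, not the Hive
implementation; the function name, the toy partition function and the
assumption that col1 is a string are all hypothetical). Rows are sprayed on
col1 only, and each reducer then evaluates both group-bys on its own partition:

```python
from collections import defaultdict


def multi_groupby_single_stage(rows, num_reducers=2):
    """rows: iterable of (col1, col2, colx). Returns (test1, test2) as dicts."""
    # "Map" side: spray rows by the common group-by key col1 only.
    partitions = defaultdict(list)
    for col1, col2, colx in rows:
        reducer_id = sum(map(ord, col1)) % num_reducers
        partitions[reducer_id].append((col1, col2, colx))

    test1 = {}  # col1 -> count(distinct colx)
    test2 = {}  # (col1, col2) -> count(distinct colx)
    # "Reduce" side: each partition is processed independently.
    for part in partitions.values():
        by_col1 = defaultdict(set)
        by_col1_col2 = defaultdict(set)
        for col1, col2, colx in part:
            by_col1[col1].add(colx)
            by_col1_col2[(col1, col2)].add(colx)
        # All rows sharing a col1 value are in this partition, so both the
        # col1 groups and the (col1, col2) groups are complete here.
        test1.update({k: len(v) for k, v in by_col1.items()})
        test2.update({k: len(v) for k, v in by_col1_col2.items()})
    return test1, test2
```

Because every (col1, col2) group is contained in a single col1 group, spraying
on col1 alone keeps both kinds of group whole on one reducer, which is what
makes the single stage possible in the subset case.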
Thoughts?
> Generate single MR job for multi groupby query.
> -----------------------------------------------
>
> Key: HIVE-2056
> URL: https://issues.apache.org/jira/browse/HIVE-2056
> Project: Hive
> Issue Type: Improvement
> Reporter: Amareshwari Sriramadasu
> Assignee: Amareshwari Sriramadasu
>
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira