[jira] [Commented] (HIVE-2056) Generate single MR job for multi groupby query.

Amareshwari Sriramadasu (JIRA) Thu, 31 Mar 2011 05:05:49 -0700

[ https://issues.apache.org/jira/browse/HIVE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013938#comment-13013938 ]


Amareshwari Sriramadasu commented on HIVE-2056:
-----------------------------------------------

For a query of the form

  from T
  insert overwrite table test1 select col1, count(distinct colx) group by col1
  insert overwrite table test2 select col2, count(distinct colx) group by col2;

it is not possible to generate a single MR job. A single shuffle stage partitions each input row on one key, so partitioning on col1 can send rows with the same col2 value to different reducers (and vice versa), and the other group-by cannot be completed in that stage.
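To make the failure concrete, here is a toy Python simulation (not Hive code). The rows, the function names and the character-sum partitioner are all hypothetical; the partitioner stands in for Hadoop's hash partitioner only so that the result is reproducible. It partitions on col1 and then checks which reducers each "group by col2" key reaches.

```python
from collections import defaultdict

# Hypothetical rows of table T as (col1, col2, colx) tuples.
ROWS = [
    ("a", "p", 1),
    ("a", "q", 2),
    ("b", "p", 3),
    ("b", "q", 1),
]


def reducer_for(key, num_reducers=2):
    """Toy deterministic partitioner (assumption: stands in for hash partitioning)."""
    return sum(ord(ch) for part in key for ch in part) % num_reducers


def reducers_per_group(rows, partition_cols, group_cols):
    """For each group-by key, collect the set of reducers its rows are sent to
    when the shuffle partitions on partition_cols."""
    seen = defaultdict(set)
    for row in rows:
        pkey = tuple(row[i] for i in partition_cols)
        gkey = tuple(row[i] for i in group_cols)
        seen[gkey].add(reducer_for(pkey))
    return dict(seen)


# Partition on col1 (index 0), but evaluate the "group by col2" insert (index 1).
split = reducers_per_group(ROWS, partition_cols=[0], group_cols=[1])
print(split)  # {('p',): {0, 1}, ('q',): {0, 1}}
```

Every col2 group is spread over two reducers, so no single reducer sees all of the colx values it needs for count(distinct colx). Grouping on col1 itself would be fine with this partitioning, which is why the choice of partition key decides which insert can finish in the stage.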
If the group-by keys are such that one key set is a subset of the other, i.e. a query of the form

  from T
  insert overwrite table test1 select col1, count(distinct colx) group by col1
  insert overwrite table test2 select col1, col2, count(distinct colx) group by col1, col2;

we can run it in a single MR job by spraying rows over the common group-by key set (i.e. col1). All rows that share a (col1, col2) pair also share col1, so they reach the same reducer, which can then compute both aggregations. I will implement this and see whether it reduces query execution time.
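A minimal Python sketch of the proposed single-job plan follows, assuming a toy in-memory "shuffle". It is an illustration of partitioning on the common key, not the Hive implementation; the rows, `NUM_REDUCERS`, `reducer_for` and `single_job` are hypothetical. Rows are sprayed on col1 only, and each reducer computes both count(distinct colx) group by col1 and group by (col1, col2) from its local rows.

```python
from collections import defaultdict

# Hypothetical rows of table T as (col1, col2, colx) tuples.
ROWS = [
    ("a", "p", 1),
    ("a", "p", 1),
    ("a", "q", 2),
    ("b", "p", 3),
    ("b", "q", 1),
    ("b", "q", 4),
]

NUM_REDUCERS = 2


def reducer_for(col1, num_reducers=NUM_REDUCERS):
    """Toy deterministic partitioner on the common group-by key (col1)."""
    return sum(ord(ch) for ch in col1) % num_reducers


def single_job(rows):
    # Map + shuffle: spray every row on the common key set, col1 only.
    buckets = defaultdict(list)
    for row in rows:
        buckets[reducer_for(row[0])].append(row)

    test1 = {}  # group by col1       -> count(distinct colx)
    test2 = {}  # group by col1, col2 -> count(distinct colx)
    # Reduce: each col1 value lives in exactly one bucket, so both
    # aggregations are complete locally and no key is written twice.
    for bucket in buckets.values():
        d1 = defaultdict(set)
        d2 = defaultdict(set)
        for col1, col2, colx in bucket:
            d1[col1].add(colx)
            d2[(col1, col2)].add(colx)
        for key, values in d1.items():
            test1[key] = len(values)
        for key, values in d2.items():
            test2[key] = len(values)
    return test1, test2


test1, test2 = single_job(ROWS)
print(sorted(test1.items()))  # [('a', 2), ('b', 3)]
print(sorted(test2.items()))
```

The second print lists (('a', 'p'), 1), (('a', 'q'), 1), (('b', 'p'), 1) and (('b', 'q'), 2). The trade-off is parallelism: the finer (col1, col2) aggregation is limited to as many reducers as there are distinct col1 values, which may offset the saving from dropping a job.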

Thoughts? 



> Generate single MR job for multi groupby query.
> -----------------------------------------------
>
>                 Key: HIVE-2056
>                 URL: https://issues.apache.org/jira/browse/HIVE-2056
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>


