[ 
https://issues.apache.org/jira/browse/HIVE-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473788#comment-13473788
 ] 

Namit Jain commented on HIVE-3565:
----------------------------------

Consider a query like:

select B.y, count(1) from
A join B on A.x=B.x
group by B.y;

This requires 2 MR jobs. The first MR job performs the join, and the
second MR job performs the group by (note that the 2nd MR job has an
identity mapper). If the first MR job could write the output of the join to an
HBase table keyed by B.y, the 2nd MR job could be a map-only job that
simply scans the HBase table. This idea can be extended to joins as well.
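
To make the two plans concrete, here is a rough Python simulation. It is only
a sketch, not Hive or HBase code: the sample rows, the function names, and the
dict used as a stand-in for the HBase table are all assumptions made for
illustration. Since HBase row keys are unique, the sketch assumes a composite
key of (B.y, sequence number) so that every joined row is kept.

```python
from collections import defaultdict
from itertools import groupby

# Hypothetical sample data; table and column names follow the query above.
A = [{"x": 1}, {"x": 1}, {"x": 2}, {"x": 3}]
B = [{"x": 1, "y": "p"}, {"x": 2, "y": "q"}, {"x": 2, "y": "p"}]


def join_rows(a_rows, b_rows):
    """Job 1 logic: inner join on x, emitting B.y once per joined row."""
    b_by_x = defaultdict(list)
    for b in b_rows:
        b_by_x[b["x"]].append(b["y"])
    for a in a_rows:
        for y in b_by_x.get(a["x"], []):
            yield y


def plan_two_full_jobs(a_rows, b_rows):
    """Current plan: job 1 writes unsorted output, job 2 shuffles and reduces."""
    intermediate = list(join_rows(a_rows, b_rows))  # stand-in for the intermediate directory
    shuffled = defaultdict(list)  # job 2: identity mapper, then shuffle on B.y
    for y in intermediate:
        shuffled[y].append(1)
    return {y: sum(ones) for y, ones in shuffled.items()}  # job 2 reducer


def plan_keyed_store(a_rows, b_rows):
    """Proposed plan: job 1 writes to a store keyed by B.y, job 2 only scans."""
    store = {}  # stand-in for an HBase table (assumption, not the HBase API)
    for seq, y in enumerate(join_rows(a_rows, b_rows)):
        store[(y, seq)] = 1  # composite row key keeps duplicate B.y values
    scan = sorted(store)  # stand-in for a scan in row-key order
    # Rows with the same B.y are adjacent in the scan, so no shuffle is needed.
    return {y: sum(1 for _ in run) for y, run in groupby(scan, key=lambda k: k[0])}


print(plan_two_full_jobs(A, B))
print(plan_keyed_store(A, B))
```

Both functions return the same counts per B.y. The difference is where the
grouping happens: in the shuffle of the 2nd job, or in the key order of the
intermediate table.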
                
> use hbase tables for writing intermediate directories across map-reduce 
> boundaries
> ----------------------------------------------------------------------------------
>
>                 Key: HIVE-3565
>                 URL: https://issues.apache.org/jira/browse/HIVE-3565
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>


