[jira] [Commented] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

Gunther Hagleitner (JIRA) Mon, 12 Sep 2016 20:11:08 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15486080#comment-15486080
 ]


Gunther Hagleitner commented on HIVE-14731:
-------------------------------------------

[~pxiong] cross product is different at the data movement layer. In a shuffle 
join we split both sides into n partitions and create only the pairs (p1, p'1), 
..., (pn, p'n).

In both the partitioned and unpartitioned cases we chop the reduce output from 
both tables into n parts and then form pairs (p1,p'1) ... (p1,p'n) .... (pn, 
p'1) ... (pn, p'n): all combinations, not just those where the index matches.
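
To make the difference in pairing concrete, here is a minimal sketch that only 
enumerates which chunk indices get routed together. It is not Tez or Hive code; 
the function names shuffle_join_pairs and cross_product_pairs are made up for 
illustration, and a pair (i, j) stands for (pi, p'j).

```python
from itertools import product


def shuffle_join_pairs(n):
    """Shuffle join: chunk i of one side only meets chunk i of the other."""
    return [(i, i) for i in range(1, n + 1)]


def cross_product_pairs(n):
    """Cross product: every chunk of one side meets every chunk of the other."""
    return list(product(range(1, n + 1), repeat=2))


if __name__ == "__main__":
    n = 3
    print("shuffle join :", shuffle_join_pairs(n))
    print("cross product:", cross_product_pairs(n))
```

With n parts per side, the shuffle join forms n pairs while the cross product 
forms n * n, which is why the routing between the two vertices cannot be 
expressed as a plain index-matched shuffle.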

The difference between the two will be that in the partitioned case we will 
partition by join key and have the ability to filter out pairs before routing 
the data.

I think the cases are different enough to deserve a new edge type.

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> ----------------------------------------------------------------
>
>                 Key: HIVE-14731
>                 URL: https://issues.apache.org/jira/browse/HIVE-14731
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Zhiyuan Yang
>            Assignee: Zhiyuan Yang
>         Attachments: HIVE-14731.1.patch, HIVE-14731.2.patch
>
>
> Given cartesian product edge is available in Tez now (see TEZ-3230), let's 
> integrate it into Hive on Tez. This allows us to have more than one reducer 
> in cross product queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
