[ https://issues.apache.org/jira/browse/HIVE-13096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179951#comment-15179951 ]

Jesus Camacho Rodriguez commented on HIVE-13096:
------------------------------------------------

[~ashutoshc], I was finally able to take another look at this one.

The heuristic change affects which table is chosen for streaming, and it may 
also change the shape of the DAG, e.g., in the presence of a GB + Join.

For instance, consider {{bucket_map_join_tez1.q}}.

- Previously, the shape was:
{noformat}
Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (CUSTOM_SIMPLE_EDGE)
{noformat}
Reducer 2 contains a GB on the input from Map 1 (TS on table1), followed by a 
Join.
In this case, Map 3 (TS on table2) is broadcast to the Join, which is executed 
in Reducer 2.

- With the patch, the shape is:
{noformat}
Map 3 <- Reducer 2 (CUSTOM_EDGE)
Reducer 2 <- Map 1 (SIMPLE_EDGE)
{noformat}
Reducer 2 contains a GB on the input from Map 1 (TS on table1).
In this case, the output of the GB is broadcast to the Join, which is executed 
in Map 3.


> Cost to choose side table in MapJoin conversion based on cumulative cardinality
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-13096
>                 URL: https://issues.apache.org/jira/browse/HIVE-13096
>             Project: Hive
>          Issue Type: Bug
>          Components: Physical Optimizer
>    Affects Versions: 2.0.0, 2.1.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>         Attachments: HIVE-13096.01.patch, HIVE-13096.02.patch, 
> HIVE-13096.03.patch, HIVE-13096.patch
>
>
> HIVE-11954 changed the logic used to choose the side table in the MapJoin 
> conversion algorithm. The initial heuristic for the cost was based on the 
> number of heavyweight operators.
> This extends that work so that the heuristic is based on accumulated 
> cardinality. In the future, we should choose the side based on the total 
> latency for the input.
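As a rough illustration of the two cost functions described in the issue, the following Python sketch compares a heavyweight-operator count (the HIVE-11954 approach) with cumulative cardinality on a two-input join. Everything in it is hypothetical: the `Op` class, the `HEAVYWEIGHT` set, and the row counts are made up for illustration, and the sketch assumes the input with the higher cost is treated as the streamed (big) table. It is not Hive's implementation; the actual cost logic in the physical optimizer (e.g., how nested joins or missing statistics are handled) may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical set of "heavyweight" operator kinds, for illustration only.
HEAVYWEIGHT = {"GBY", "JOIN"}


@dataclass
class Op:
    """Hypothetical operator node: a kind, an estimated row count, and its inputs."""
    kind: str
    rows: int
    parents: List["Op"] = field(default_factory=list)


def heavyweight_count(op: Op) -> int:
    """Old-style cost: number of heavyweight operators in this input's subtree."""
    own = 1 if op.kind in HEAVYWEIGHT else 0
    return own + sum(heavyweight_count(p) for p in op.parents)


def cumulative_cardinality(op: Op) -> int:
    """New-style cost (simplified): sum of estimated row counts over the subtree."""
    return op.rows + sum(cumulative_cardinality(p) for p in op.parents)


def streamed_input(inputs: List[Op], cost: Callable[[Op], int]) -> int:
    """Index of the input assumed to be streamed: the one with the highest cost.
    Ties go to the earliest input, because max() keeps the first maximal element."""
    return max(range(len(inputs)), key=lambda i: cost(inputs[i]))


# Made-up row counts, loosely shaped like the GB + Join example in the comment.
ts1 = Op("TS", 500)          # TS on table1
gby = Op("GBY", 250, [ts1])  # GB on top of table1
ts2 = Op("TS", 1000)         # TS on table2
join_inputs = [gby, ts2]

print(streamed_input(join_inputs, heavyweight_count))       # 0: GB side streamed
print(streamed_input(join_inputs, cumulative_cardinality))  # 1: table2 streamed
```

With these invented numbers, counting heavyweight operators streams the GB side and broadcasts table2, while cumulative cardinality (750 vs. 1000 rows) flips the choice so that the GB output is broadcast instead, which is the kind of DAG change described in the comment.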



