[ 
https://issues.apache.org/jira/browse/TEZ-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated TEZ-3500:
-------------------------
    Summary: Fair routing support for multiple source vertices  (was: Support 
for multiple source vertices)

> Fair routing support for multiple source vertices
> -------------------------------------------------
>
>                 Key: TEZ-3500
>                 URL: https://issues.apache.org/jira/browse/TEZ-3500
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Ming Ma
>
> Under the fair_parallelism policy, multiple destination tasks can process 
> data from different source tasks of the same partition. The current 
> implementation supports this for only one source vertex.
> Support for multiple source vertices will enable skewed shuffle join as 
> mentioned in 
> https://issues.apache.org/jira/browse/TEZ-3209?focusedCommentId=15385449&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15385449.
>  Some rough ideas:
> * For a large partition whose volume comes mostly from one source vertex, 
> apply fair routing on that primary source vertex and have the other vertices 
> broadcast their output to the destination tasks processing that partition.
> * If more than one source vertex contributes a large volume to the partition, 
> we will need something like a cartesian product to join the sub-partition 
> data from the different vertices.
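
The two rough ideas above amount to a per-partition decision, which can be
sketched as follows. This is only an illustration and not Tez code:
plan_partition, max_bytes_per_task, and the 0.8 dominance cutoff are
hypothetical names and values chosen for the example, and the issue itself
does not define what counts as "mostly from one source vertex".

```python
from itertools import product
from math import ceil


def plan_partition(bytes_by_vertex, max_bytes_per_task, dominance=0.8):
    """Pick a routing plan for one partition (hypothetical helper, not Tez API).

    bytes_by_vertex maps a source vertex name to the bytes it produced for
    this partition. Both thresholds are illustrative assumptions.
    """
    total = sum(bytes_by_vertex.values())
    if total <= max_bytes_per_task:
        # Small partition: one destination task reads everything.
        return {"mode": "single", "tasks": 1}

    primary, primary_bytes = max(bytes_by_vertex.items(), key=lambda kv: kv[1])
    others = sorted(v for v in bytes_by_vertex if v != primary)

    if primary_bytes >= dominance * total:
        # First idea: split only the primary vertex's output across tasks
        # and send every other vertex's output to all of those tasks.
        return {
            "mode": "fair_plus_broadcast",
            "primary": primary,
            "broadcast": others,
            "tasks": ceil(primary_bytes / max_bytes_per_task),
        }

    # Second idea: split each vertex's output into sub-partitions and pair
    # every combination, so each destination task joins one slice per vertex.
    splits = {
        v: max(1, ceil(b / max_bytes_per_task))
        for v, b in sorted(bytes_by_vertex.items())
    }
    combos = list(product(*(range(n) for n in splits.values())))
    return {"mode": "cartesian", "splits": splits, "tasks": len(combos)}
```

With a 100-byte task limit, a partition with 900 bytes from vertex A and 50
bytes from vertex B takes the first path (A is split, B is broadcast), while
300 bytes from A and 250 bytes from B takes the cartesian path. In the
cartesian case a task reads one slice from each vertex, so its input can
exceed the per-task limit; a real implementation would have to account for
that.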



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
