[ https://issues.apache.org/jira/browse/TEZ-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ming Ma updated TEZ-3500:
-------------------------
    Summary: Fair routing support for multiple source vertices  (was: Support for multiple source vertices)

> Fair routing support for multiple source vertices
> -------------------------------------------------
>
>                 Key: TEZ-3500
>                 URL: https://issues.apache.org/jira/browse/TEZ-3500
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Ming Ma
>
> For fair_parallelism policy where multiple destination tasks process the data
> from different source tasks of the same partition, current implementation
> only supports one source vertex.
> Support for multiple source vertices will enable skewed shuffle join as
> mentioned in
> https://issues.apache.org/jira/browse/TEZ-3209?focusedCommentId=15385449&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15385449.
> Some rough ideas:
> * For a large partition, if the volume comes mostly from one source vertex,
> apply fair routing on that primary source vertex and have other vertices
> broadcast their output to those destination tasks processing that partition.
> * If the large partition volume is big from more than one source vertex, then
> we will need something like cartesian product to do the join of different
> sub-partition data from multiple vertices.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
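
The two rough ideas above amount to a per-partition decision, which could be sketched roughly as follows. This is only an illustration and not Tez code: the names `Strategy` and `choose_strategy`, the `large_partition_bytes` threshold, and the `dominance_ratio` cutoff for "mostly from one source vertex" are all assumptions made up for this sketch.

```python
from enum import Enum


class Strategy(Enum):
    # Hypothetical labels for the options discussed in the issue.
    NO_SPLIT = "no_split"
    FAIR_ROUTE_PRIMARY = "fair_route_primary_broadcast_others"
    CARTESIAN_PRODUCT = "cartesian_product"


def choose_strategy(volume_by_vertex, large_partition_bytes, dominance_ratio=0.8):
    """Pick a routing strategy for one partition.

    volume_by_vertex maps a source vertex name to the bytes it produced
    for this partition. Both thresholds are assumed tuning knobs.
    Returns (strategy, primary_vertex_or_None).
    """
    total = sum(volume_by_vertex.values())
    # Small (or empty) partitions need no special handling.
    if not volume_by_vertex or total < large_partition_bytes:
        return Strategy.NO_SPLIT, None

    # Find the source vertex contributing the most data.
    primary, primary_volume = max(volume_by_vertex.items(), key=lambda kv: kv[1])

    # Idea 1: one vertex dominates, so fair-route it and broadcast the rest.
    if primary_volume >= dominance_ratio * total:
        return Strategy.FAIR_ROUTE_PRIMARY, primary

    # Idea 2: several vertices are large, so sub-partitions from each side
    # would have to be combined, something like a cartesian product.
    return Strategy.CARTESIAN_PRODUCT, None
```

For example, `choose_strategy({"v1": 900, "v2": 50}, 100)` selects fair routing on `v1` with `v2` broadcast, while `{"v1": 500, "v2": 450}` falls through to the cartesian-product case because neither vertex reaches the assumed 80% share.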