[ https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14282355#comment-14282355 ]
Jeff Zhang commented on TEZ-391:
--------------------------------

Attached a patch for SharedEdge.

* Adds a new API in Edge to create a shared edge:
{code}
public Edge createSharedEdge(Vertex outputVertex)
{code}
* Currently it supports only One-to-One and Broadcast edges. ScatterGather requires the 2 downstream vertices to have the same parallelism, otherwise the shuffle will break. I made some changes to get ScatterGather working, but it still needs more work, especially for reducer auto-parallelism.
* Adds one example in tez-example to show the usage (SharedEdgeExample).

Although this patch works, after more thought I think using VertexGroup may be more natural and easier to understand: we would just make the 2 downstream vertices a vertex group and connect the upstream vertex to that group. VertexGroup is already used for shared output, so it is natural to make it support shared input as well. I will attach a new patch using VertexGroup later.

> SharedEdge - Support for passing same output from a vertex as input to two different vertices
> ---------------------------------------------------------------------------------------------
>
> Key: TEZ-391
> URL: https://issues.apache.org/jira/browse/TEZ-391
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Rohini Palaniswamy
> Assignee: Jeff Zhang
> Attachments: TEZ-391-WIP-1.patch
>
>
> We need this for a lot of use cases, for example when multi-query is turned off and for optimizing unions. Currently those are BROADCAST or ONE-ONE edges and we write the output multiple times.
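To illustrate the createSharedEdge idea from the comment above, here is a minimal, self-contained sketch. The Vertex, Edge and Movement types in it are hypothetical stand-ins, not the Tez classes, and the semantics are an assumption read from the comment: the shared edge connects the same upstream vertex to an additional downstream vertex and reuses the output that was already written, and ScatterGather is rejected because the patch does not support it yet.

```java
import java.util.Arrays;
import java.util.List;

public class SharedEdgeSketch {

    // Hypothetical stand-in for an edge's data movement type (not the Tez enum).
    enum Movement { ONE_TO_ONE, BROADCAST, SCATTER_GATHER }

    // Hypothetical stand-in for a DAG vertex (not the Tez Vertex class).
    static final class Vertex {
        final String name;

        Vertex(String name) {
            this.name = name;
        }
    }

    // Hypothetical stand-in for an edge (not the Tez Edge class).
    static final class Edge {
        final Vertex inputVertex;   // vertex that produces the data
        final Vertex outputVertex;  // vertex that consumes the data
        final Movement movement;
        final Edge owner;           // edge whose output is reused; null if this edge writes its own

        private Edge(Vertex inputVertex, Vertex outputVertex, Movement movement, Edge owner) {
            this.inputVertex = inputVertex;
            this.outputVertex = outputVertex;
            this.movement = movement;
            this.owner = owner;
        }

        static Edge create(Vertex inputVertex, Vertex outputVertex, Movement movement) {
            return new Edge(inputVertex, outputVertex, movement, null);
        }

        // Assumed semantics: a new edge from the same producer to another consumer,
        // reusing the output already written for this edge.
        Edge createSharedEdge(Vertex outputVertex) {
            if (movement == Movement.SCATTER_GATHER) {
                // Mirrors the limitation described in the comment.
                throw new UnsupportedOperationException(
                    "shared edges are sketched only for ONE_TO_ONE and BROADCAST");
            }
            Edge root = (owner == null) ? this : owner;
            return new Edge(inputVertex, outputVertex, movement, root);
        }

        boolean writesOwnOutput() {
            return owner == null;
        }
    }

    // Counts how many times the producer output would be written for these edges.
    static int outputCopies(List<Edge> edges) {
        int copies = 0;
        for (Edge e : edges) {
            if (e.writesOwnOutput()) {
                copies++;
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        Vertex producer = new Vertex("producer");
        Vertex consumerA = new Vertex("consumerA");
        Vertex consumerB = new Vertex("consumerB");

        Edge first = Edge.create(producer, consumerA, Movement.BROADCAST);
        Edge second = first.createSharedEdge(consumerB);

        System.out.println("output copies: " + outputCopies(Arrays.asList(first, second)));
    }
}
```

With two independent edges the producer output would be counted twice; with the shared edge it is counted once, which is the saving the issue description asks for.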