[jira] [Comment Edited] (TEZ-2104) A CrossProductEdge which produces synthetic cross-product parallelism

Gopal V (JIRA) Fri, 13 Feb 2015 18:49:00 -0800

    [ 
https://issues.apache.org/jira/browse/TEZ-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321182#comment-14321182
 ]


Gopal V edited comment on TEZ-2104 at 2/14/15 2:47 AM:
-------------------------------------------------------

The cross-product edge has special affinity scheduling optimizations for one of 
the edges (the one that moves the most data) to avoid re-merging the input 
streams during node/container co-located runs.

The built-in edge opens up a need for evolving affinity optimizations related 
to temporal locality in the scheduler.


was (Author: gopalv):
The cross-product edge has special affinity scheduling optimizations for one of 
the edges (the one which moves the most amount of data) to avoid re-merging the 
input streams during node/container co-located runs.

The built-in edge opens up a need to evolving affinity optimizations related to 
temporal locality in the scheduler.

> A CrossProductEdge which produces synthetic cross-product parallelism
> ---------------------------------------------------------------------
>
>                 Key: TEZ-2104
>                 URL: https://issues.apache.org/jira/browse/TEZ-2104
>             Project: Apache Tez
>          Issue Type: New Feature
>            Reporter: Gopal V
>              Labels: gsoc, gsoc2015, hadoop, hive, java, tez
>
> Instead of duplicating data so that the synthetic cross-product fits into 
> partitions, a special-purpose cross-product data movement edge can vastly 
> reduce the net IO.
> The Shuffle edge routes each partition's output to a single reducer, while 
> the cross-product edge routes it into a matrix of reducers without actually 
> duplicating the disk data.
> A partitioning scheme with 3 partitions on the lhs and rhs of a join 
> operation can be routed into 9 reducers by performing a cross-product similar 
> to 
> (1,2,3) x (a,b,c) = [(1,a), (1,b), (1,c), (2,a), (2,b) ...]
> This turns a single-task cross-product model into a distributed cross-product.
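
To make the 3 x 3 routing in the description concrete, here is a minimal sketch of one way a reducer index could be mapped back to the pair of source partitions it reads. This is only an illustration under assumptions: the class `CrossProductSketch` and the methods `sourcePartitions` and `pairs` are hypothetical names, not part of the Tez API, and the row-major numbering of reducers is an assumed convention that the issue does not specify.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Tez.
class CrossProductSketch {

    // Assumed row-major layout: reducer r reads lhs partition r / rhsCount
    // and rhs partition r % rhsCount. Nothing is copied; each reducer simply
    // fetches two existing partition outputs.
    static int[] sourcePartitions(int reducer, int rhsCount) {
        return new int[] { reducer / rhsCount, reducer % rhsCount };
    }

    // Lists the (lhs, rhs) partition pair assigned to every reducer.
    static List<String> pairs(String[] lhs, String[] rhs) {
        List<String> out = new ArrayList<>();
        int reducers = lhs.length * rhs.length; // 3 x 3 = 9 in the example
        for (int r = 0; r < reducers; r++) {
            int[] p = sourcePartitions(r, rhs.length);
            out.add("(" + lhs[p[0]] + "," + rhs[p[1]] + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        String[] lhs = { "1", "2", "3" };
        String[] rhs = { "a", "b", "c" };
        // prints [(1,a), (1,b), (1,c), (2,a), (2,b), (2,c), (3,a), (3,b), (3,c)]
        System.out.println(pairs(lhs, rhs));
    }
}
```

Under this assumed layout each lhs partition is read by 3 reducers and each rhs partition by 3 reducers, so the parallelism is 9 while the data on disk stays at 3 + 3 partition outputs.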



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
