[ https://issues.apache.org/jira/browse/PIG-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810578#comment-13810578 ]

Mark Wagner commented on PIG-3555:
----------------------------------

I think that's a good way to do it. One comment: Tez also runs combiners as
part of OnFileSortedOutput (like the traditional mapred combiners). I'd propose
we create a new "TezEdge" class to serve as a descriptor for edges, since this is
likely an area where we'll be doing a lot of optimization in the future with Tez
(streaming edges, shuffles without sorting, etc.), and it would be good to keep
that separate from TezOperator. Then every TezOperator can maintain knowledge of
its input and output TezEdges.
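A minimal Java sketch of the proposed edge descriptor, under stated assumptions: TezEdge's fields, the EdgeKind enum, and the connectTo helper are hypothetical illustrations, not existing Pig or Tez classes, and the combine plan is only a String placeholder for whatever serialized plan an edge would carry.

```java
import java.util.ArrayList;
import java.util.List;

public class TezEdgeSketch {
    // Hypothetical edge kinds echoing the comment; not a real Pig/Tez enum.
    enum EdgeKind { SORTED_SHUFFLE, UNSORTED_SHUFFLE, STREAMING }

    // Hypothetical descriptor for one edge between two TezOperators.
    static final class TezEdge {
        final TezOperator source;
        final TezOperator destination;
        EdgeKind kind = EdgeKind.SORTED_SHUFFLE; // default: sorted shuffle
        String combinePlan; // placeholder for a combine plan; null = no combiner

        TezEdge(TezOperator source, TezOperator destination) {
            this.source = source;
            this.destination = destination;
        }
    }

    // Hypothetical, stripped-down operator that only tracks its edges.
    static final class TezOperator {
        final String name;
        final List<TezEdge> inEdges = new ArrayList<>();
        final List<TezEdge> outEdges = new ArrayList<>();

        TezOperator(String name) { this.name = name; }

        // Creates an edge to 'to' and registers it on both endpoints.
        TezEdge connectTo(TezOperator to) {
            TezEdge edge = new TezEdge(this, to);
            outEdges.add(edge);
            to.inEdges.add(edge);
            return edge;
        }
    }

    public static void main(String[] args) {
        TezOperator load = new TezOperator("load");
        TezOperator group = new TezOperator("group");
        TezEdge edge = load.connectTo(group);
        edge.combinePlan = "COUNT.Intermed"; // illustrative placeholder
        System.out.println(group.inEdges.size() + " " + group.inEdges.get(0).source.name);
    }
}
```

Keeping edge-specific settings such as the edge kind and combine plan on the edge, rather than on the operator, means a vertex with several inputs naturally gets one descriptor per input.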

> Initial implementation of combiner optimization
> -----------------------------------------------
>
>                 Key: PIG-3555
>                 URL: https://issues.apache.org/jira/browse/PIG-3555
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>    Affects Versions: tez-branch
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: tez-branch
>
>
> To support algebraic UDFs and similar operations, a combiner is required. As a
> start, I am proposing the following initial implementation:
> * In Tez, the combiner runs as part of ShuffledMergedInput on each edge, so
> multiple combine plans (one per edge) need to be registered in the destination
> vertex. Each vertex maps to a TezOperator in the Tez plan, so an array of
> combine plans will be stored in the TezOperator that maps to the destination
> vertex.
> * To register combine plans in a TezOperator, we will run a CombinerOptimizer
> on the Tez plan after TezCompiler generates it but before TezDagBuilder
> converts it into a DAG.
> * Finally, TezDagBuilder will insert the combine plans into the payload of
> ShuffledMergedInput while constructing the destination vertex.
> This initial implementation will allow us to run algebraic UDFs. In the
> future, we can implement more optimizations for limit, order-by, etc., on top
> of this.
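The three proposed steps can be sketched in Java as below. Everything here is hypothetical: TezOperator is reduced to a few fields, runCombinerOptimizer stands in for the proposed CombinerOptimizer pass, and buildInputPayloads only imitates how TezDagBuilder might attach each combine plan to its ShuffledMergedInput payload (it does not call the Tez API). The "COUNT.Intermed" string is a placeholder echoing the Initial/Intermed/Final split of Pig's Algebraic interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerPipelineSketch {
    // Hypothetical, stripped-down TezOperator with only what this sketch needs.
    static final class TezOperator {
        final String name;
        final List<TezOperator> predecessors = new ArrayList<>();
        // combinePlans.get(i) belongs to the edge from predecessors.get(i).
        final List<String> combinePlans = new ArrayList<>();
        // Hypothetical: name of an algebraic UDF applied after grouping, or null.
        String algebraicUdf;

        TezOperator(String name) { this.name = name; }
    }

    // Stand-in for the proposed CombinerOptimizer: runs after compilation,
    // before DAG construction, and stores one combine plan per incoming edge.
    static void runCombinerOptimizer(List<TezOperator> plan) {
        for (TezOperator dest : plan) {
            dest.combinePlans.clear();
            if (dest.algebraicUdf == null) {
                continue; // nothing to combine for this vertex
            }
            for (int i = 0; i < dest.predecessors.size(); i++) {
                dest.combinePlans.add(dest.algebraicUdf + ".Intermed");
            }
        }
    }

    // Stand-in for TezDagBuilder: describes each input's payload, attaching
    // the combine plan registered for that edge, if any.
    static Map<String, String> buildInputPayloads(TezOperator dest) {
        Map<String, String> payloads = new TreeMap<>();
        for (int i = 0; i < dest.predecessors.size(); i++) {
            String payload = "ShuffledMergedInput";
            if (i < dest.combinePlans.size()) {
                payload += "[" + dest.combinePlans.get(i) + "]";
            }
            payloads.put(dest.predecessors.get(i).name, payload);
        }
        return payloads;
    }

    public static void main(String[] args) {
        TezOperator load1 = new TezOperator("load1");
        TezOperator load2 = new TezOperator("load2");
        TezOperator group = new TezOperator("group");
        group.predecessors.add(load1);
        group.predecessors.add(load2);
        group.algebraicUdf = "COUNT";

        runCombinerOptimizer(Arrays.asList(load1, load2, group));
        System.out.println(buildInputPayloads(group));
    }
}
```

Because the plans live in a list aligned with the destination's predecessors, the builder can pair each incoming edge with its own combine plan, which is the per-edge registration the proposal calls for.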



--
This message was sent by Atlassian JIRA
(v6.1#6144)