[ https://issues.apache.org/jira/browse/PIG-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918627#comment-13918627 ]

Rohini Palaniswamy commented on PIG-3775:
-----------------------------------------

[~last_samurai],
    This is for the Tez execution plan. The final vertex, which does the actual
order by, will have a sorted shuffle (think of it as the reducer in MR terms).
There is an intermediate vertex in the plan which partitions the data based on
the sample and sends it to the final vertex (think of it as the map in MR
terms). In Tez, if there are previous stages, the intermediate vertex gets its
input from them instead of reading from HDFS. Right now there is always a
previous stage for order by and skewed join, as we use the initial vertex to
read data from HDFS for both sampling and partitioning. The unsorted shuffle is
meant to be used between the initial and the intermediate vertex. Currently we
use a 1-1 edge there instead, because a sorted shuffle is very expensive.
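To make the three-vertex order-by plan concrete, here is a toy, in-memory Java model of it. It is a sketch under stated assumptions, not Pig or Tez code: the class and method names are hypothetical, the "sampler" simply keeps every second record of each split, range boundaries are taken at evenly spaced positions of the sorted sample, and Collections.sort stands in for the work the sorted shuffle does before the final vertex reads its input. Because initial task i feeds intermediate task i over the 1-1 edge, each intermediate task here handles exactly one split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class OrderByPlanSketch {

    // Initial vertex: each task reads one split. Here every `step`-th record
    // of each split is kept as the sample (a stand-in for a real sampler).
    static List<Integer> sample(List<List<Integer>> splits, int step) {
        List<Integer> s = new ArrayList<>();
        for (List<Integer> split : splits) {
            for (int i = 0; i < split.size(); i += step) {
                s.add(split.get(i));
            }
        }
        Collections.sort(s);
        return s;
    }

    // Pick (numPartitions - 1) range boundaries at evenly spaced positions
    // of the sorted sample. Assumes the sample is non-empty.
    static int[] boundaries(List<Integer> sortedSample, int numPartitions) {
        int[] b = new int[numPartitions - 1];
        for (int p = 1; p < numPartitions; p++) {
            b[p - 1] = sortedSample.get(p * sortedSample.size() / numPartitions);
        }
        return b;
    }

    // Index of the key range that `key` falls into.
    static int partitionOf(int key, int[] b) {
        int p = 0;
        while (p < b.length && key >= b[p]) {
            p++;
        }
        return p;
    }

    static List<Integer> orderBy(List<List<Integer>> splits, int numPartitions) {
        int[] b = boundaries(sample(splits, 2), numPartitions);

        // One input bucket per final-vertex task.
        List<List<Integer>> finalInputs = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            finalInputs.add(new ArrayList<>());
        }

        // Intermediate vertex: over the 1-1 edge, intermediate task i sees only
        // the output of initial task i (split i) and routes each record to the
        // final task that owns its key range.
        for (List<Integer> split : splits) {
            for (int key : split) {
                finalInputs.get(partitionOf(key, b)).add(key);
            }
        }

        // Final vertex: the sorted shuffle would hand each task its bucket in
        // sorted order; Collections.sort models that step.
        List<Integer> out = new ArrayList<>();
        for (List<Integer> bucket : finalInputs) {
            Collections.sort(bucket);
            out.addAll(bucket);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> splits = Arrays.asList(
                Arrays.asList(42, 7, 19, 3, 88),
                Arrays.asList(25, 61, 14, 50, 33));
        System.out.println(orderBy(splits, 3));
    }
}
```

Each final task owns a contiguous key range, so concatenating the sorted buckets in partition order gives a globally sorted result; with the input in main this prints [3, 7, 14, 19, 25, 33, 42, 50, 61, 88].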

> Use unsorted shuffle in Union, Orderby, Skewed Join to improve performance
> --------------------------------------------------------------------------
>
>                 Key: PIG-3775
>                 URL: https://issues.apache.org/jira/browse/PIG-3775
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>            Reporter: Rohini Palaniswamy
>              Labels: gsoc2014
>             Fix For: tez-branch
>
>
> When implementing Pig union, we need to gather data from two or more upstream
> vertices without sorting. Each vertex itself might consist of several tasks.
> The same can be done for the partitioner vertex in order by and skewed join,
> instead of a 1-1 edge, for some cases of parallelism.
> TEZ-661 has been created to add a custom output and input for that in Tez. It
> is currently not among the Tez team's priorities, but it is important for us
> as it will give good performance gains. We can write the custom input/output,
> contribute it to Tez and make the corresponding changes in Pig. Marking this
> as a candidate for GSOC 2014.
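The difference between the two edge types discussed here can be sketched the same way. The model below is hypothetical and only illustrative: a 1-1 edge hands producer task i's output to consumer task i, so it only works when both vertices run the same number of tasks, while an unsorted shuffle routes records from every task of every upstream vertex (for example both inputs of a union) to any consumer task without sorting them. Hash partitioning by String.hashCode is an assumption made for the example; it is not necessarily how the TEZ-661 input/output routes records.

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeRoutingSketch {

    // 1-1 edge: consumer task i reads only producer task i's output, so both
    // vertices must run with the same number of tasks.
    static List<List<String>> oneToOne(List<List<String>> producerOutputs, int numConsumers) {
        if (producerOutputs.size() != numConsumers) {
            throw new IllegalArgumentException("1-1 edge needs equal parallelism: "
                    + producerOutputs.size() + " producers vs " + numConsumers + " consumers");
        }
        List<List<String>> consumerInputs = new ArrayList<>();
        for (List<String> out : producerOutputs) {
            consumerInputs.add(new ArrayList<>(out));
        }
        return consumerInputs;
    }

    // Unsorted shuffle: records from every task of every upstream vertex are
    // hash-partitioned across numConsumers tasks and appended in arrival
    // order, skipping the sort step that a sorted shuffle performs.
    static List<List<String>> unsortedShuffle(List<List<List<String>>> upstreamVertices,
                                              int numConsumers) {
        List<List<String>> consumerInputs = new ArrayList<>();
        for (int c = 0; c < numConsumers; c++) {
            consumerInputs.add(new ArrayList<>());
        }
        for (List<List<String>> vertex : upstreamVertices) {
            for (List<String> taskOutput : vertex) {
                for (String rec : taskOutput) {
                    consumerInputs.get(Math.floorMod(rec.hashCode(), numConsumers)).add(rec);
                }
            }
        }
        return consumerInputs;
    }

    public static void main(String[] args) {
        // Two upstream vertices, e.g. both inputs of a union, with 2 and 1 tasks.
        List<List<List<String>>> upstream = List.of(
                List.of(List.of("c", "a"), List.of("d")),
                List.of(List.of("b", "e")));
        System.out.println(unsortedShuffle(upstream, 2));
    }
}
```

Records keep their arrival order within each consumer's input: single-character strings hash to their char code, so main prints [[d, b], [c, a, e]]. Unlike the 1-1 edge, the consumer count (2) is independent of the three upstream tasks.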



--
This message was sent by Atlassian JIRA
(v6.2#6252)
