-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32868/
-----------------------------------------------------------

Review request for pig and Daniel Dai.


Bugs: PIG-4495
    https://issues.apache.org/jira/browse/PIG-4495


Repository: pig


Description
-------

This patch is a work in progress (WIP). The code is almost complete; I am
still adding more tests and running the full suite. Posting now for early
comments.

This patch removes the need for TEZ-1190 (Allow multiple edges between two
vertexes).

Changes done:
   1) Self join/cross/cogroup
        - Multiple sub-plans of the split write to the same output.
          POShuffleTezLoad can now split the input into the correct bags
          based on the index in the key.
        - Cases like self replicated join are not allowed.
   2) Union
        - Multiple sub-plans of the split write to the same output and
          connect to the vertex group. If the union's only members are
          sub-plans of the split, no vertex group is created and the split
          is connected directly to the union's successors.
        - For cases like Union_16.pig in nightly.conf, which has multiple
          levels of union all from the same split, even the vertex group
          that was created is removed and all the split sub-plans write
          directly to the successor.
   3) Other optimizations
        - Previously, a union followed by a replicated join was not
          optimized (PIG-3856). Now, if the union is within the same split,
          we broadcast the replicated join once to the split operator.
   4) Refactored the code in UnionOptimizer into methods for readability.
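
To make the index-based splitting in 1) easier to picture, here is a minimal
sketch. It is not the POShuffleTezLoad code from the patch. IndexedBagSplitter,
IndexedValue and splitByIndex are hypothetical names, plain lists stand in for
Pig bags, and the index is stored on each record rather than in the key to keep
the example small.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexedBagSplitter {

    /** Hypothetical stand-in for a shuffled record tagged with its input index. */
    public static final class IndexedValue {
        final int index;     // which logical input (e.g. join side) the record belongs to
        final String value;  // payload; a real plan would carry a tuple here

        public IndexedValue(int index, String value) {
            this.index = index;
            this.value = value;
        }
    }

    /**
     * Splits records that arrived on a single shared input into one list
     * ("bag") per logical input, using the index carried by each record.
     */
    public static List<List<String>> splitByIndex(List<IndexedValue> values, int numInputs) {
        List<List<String>> bags = new ArrayList<>();
        for (int i = 0; i < numInputs; i++) {
            bags.add(new ArrayList<>());
        }
        for (IndexedValue v : values) {
            if (v.index < 0 || v.index >= numInputs) {
                throw new IllegalArgumentException("Unexpected index: " + v.index);
            }
            // Route the record to the bag of the input it was tagged with.
            bags.get(v.index).add(v.value);
        }
        return bags;
    }
}
```

In a self join, both sides would arrive over the same edge, and the index is
what lets the consumer rebuild one bag per side.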


Diffs
-----

  
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/TezOperator.java 1671263
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/TezPOPackageAnnotator.java 1671263
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/operator/POShuffleTezLoad.java 1671263
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/optimizer/MultiQueryOptimizerTez.java 1671263
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/optimizer/TezOperDependencyParallelismEstimator.java 1671263
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/optimizer/UnionOptimizer.java 1671263

Diff: https://reviews.apache.org/r/32868/diff/


Testing
-------

WIP. I will update this with the new tests in the next patch.


Thanks,

Rohini Palaniswamy
