-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15949/
-----------------------------------------------------------

(Updated Dec. 3, 2013, 4:31 p.m.)


Review request for pig, Alex Bain, Cheolsoo Park, Daniel Dai, and Mark Wagner.


Changes
-------

Fixed Order by and addressed review comments. 

Additional TODO:

  - Split followed by join (multiple levels of split) and order by need to be 
fixed. Now that splits are calculated on the client instead of in the AM, order 
by following split, group by, etc. is broken because the temporary files are 
not present when the DAG is built.
  - Found that order by desc is not working; output is always sorted in 
ascending order.
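
A check along the following lines could be used to confirm the order by desc behaviour once it is fixed. This is only a sketch: OrderCheck and isDescending are hypothetical names, not part of the Pig test suite, and it assumes the sorted column has already been read into a list.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical test helper; not part of the Pig code base.
final class OrderCheck {

    // Returns true if every element is >= its successor under natural ordering.
    static <T extends Comparable<T>> boolean isDescending(List<T> values) {
        Comparator<T> desc = Comparator.reverseOrder();
        for (int i = 1; i < values.size(); i++) {
            // Under the reversed comparator a correctly ordered pair compares <= 0.
            if (desc.compare(values.get(i - 1), values.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }
}
```

With the behaviour described above (output always ascending), such a check would fail for any input that has at least two distinct values.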


Bugs: PIG-3564 and PIG-3565
    https://issues.apache.org/jira/browse/PIG-3564
    https://issues.apache.org/jira/browse/PIG-3565


Repository: pig


Description (updated)
-------

 - POStore and POLocalRearrange are replaced by POStoreTez and 
POLocalRearrangeTez, which hold the name of the LogicalOutput. Output is 
written directly through them, and the output-related code has been removed 
from PigProcessor. In the combiner case, PigCombiner writes through the reduce 
Context, which is routed to the LogicalOutput (MRCombiner in Tez handles this).
 - This patch also contains the security-related fixes for PIG-3564. I did not 
separate them out because I was doing most of the e2e testing with them. Will 
use PIG-3564 to check in any incremental changes required after TEZ-606 is 
fixed.
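
The idea of an operator that carries the name of its output and writes to it directly can be sketched roughly as below. This is not the code in the patch: SketchWriter, NamedOutputOperator, attachOutputs and emit are hypothetical stand-ins, and the real Tez output and writer types are deliberately not used here.

```java
import java.util.Map;

// Stand-in for a key/value writer obtained from an output; hypothetical.
interface SketchWriter {
    void write(Object key, Object value);
}

// Hypothetical operator that remembers which named output it writes to.
final class NamedOutputOperator {
    private final String outputName;
    private SketchWriter writer;

    NamedOutputOperator(String outputName) {
        this.outputName = outputName;
    }

    // Resolve the writer by name once the task's outputs are known.
    void attachOutputs(Map<String, SketchWriter> outputsByName) {
        writer = outputsByName.get(outputName);
        if (writer == null) {
            throw new IllegalStateException("No output named " + outputName);
        }
    }

    // Write a record straight to the resolved output.
    void emit(Object key, Object value) {
        if (writer == null) {
            throw new IllegalStateException("attachOutputs was not called");
        }
        writer.write(key, value);
    }
}
```

In this sketch the processor only hands over the map of outputs; each operator does its own writing, which is the reason the processor no longer needs output-specific code.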

Still need to handle a few cases:
  - custom partitioner
  - secondary sort key
  - split followed by order by and join
  - memory management (in Pig or Tez?): was hitting OOM with multiple logical 
outputs because the sort on the split vertex was taking up three times the 
memory for 3 logical outputs (OOM in Tez DefaultSorter.java at kvbuffer = new 
byte[maxMemUsage];)
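
The memory problem in the last item comes down to simple arithmetic, sketched below. It assumes every logical output allocates its own sort buffer of the same size. SortBufferEstimate and its methods are hypothetical names, and the 100 MB buffer and 256 MB heap are made-up figures, not values taken from Tez or from the failing job.

```java
// Hypothetical illustration only; not Tez or Pig code.
final class SortBufferEstimate {

    // Total bytes allocated if every logical output gets its own sort buffer.
    static long totalSortBytes(int logicalOutputs, long bufferBytesPerOutput) {
        if (logicalOutputs < 0 || bufferBytesPerOutput < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        return Math.multiplyExact((long) logicalOutputs, bufferBytesPerOutput);
    }

    // True if the combined sort buffers would not fit in the given heap.
    static boolean exceedsHeap(int logicalOutputs, long bufferBytesPerOutput, long heapBytes) {
        return totalSortBytes(logicalOutputs, bufferBytesPerOutput) > heapBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long perOutput = 100 * mb; // assumed sort buffer size per output
        long heap = 256 * mb;      // assumed task heap size
        System.out.println(totalSortBytes(3, perOutput) / mb); // prints 300
        System.out.println(exceedsHeap(3, perOutput, heap));   // prints true
    }
}
```

With one output the same buffer fits comfortably in the assumed heap; with three it does not, which matches the failure at the kvbuffer allocation.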


Diffs (updated)
-----

  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/PigServer.java
 1546896 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigInputFormat.java
 1546896 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java
 1546896 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POLocalRearrange.java
 1546896 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POStore.java
 1546896 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/CombinerOptimizer.java
 1546896 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POLocalRearrangeTez.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POStoreTez.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java
 1546896 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/SecurityHelper.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java
 1546896 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDAG.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java
 1546896 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezJob.java
 1546896 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezJobControlCompiler.java
 1546896 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperator.java
 1546896 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezOutput.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezPOPackageAnnotator.java
 1546896 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezSessionManager.java
 1546896 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/tools/pigstats/tez/TezStats.java
 1546896 
  
http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC6.gld
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC7.gld
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC8.gld
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/tez/TestTezCompiler.java
 1546896 
  
http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/tez/TestTezJobControlCompiler.java
 1546896 

Diff: https://reviews.apache.org/r/15949/diff/


Testing
-------

- Manually tested SPLIT and store within a single vertex, SPLIT output to 
multiple vertices, and the case where there is a POSplit when grouping the same 
data on different keys.
- Yet to test different combiners on different edges, but it should mostly work.
- Having some problems getting e2e to run. Will update tez.conf with e2e 
tests in a separate JIRA later.


Thanks,

Rohini Palaniswamy
