-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16717/#review31403
-----------------------------------------------------------

Ship it!


I will commit it after running tests.

- Cheolsoo Park


On Jan. 8, 2014, 1:47 a.m., Alex Bain wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16717/
> -----------------------------------------------------------
> 
> (Updated Jan. 8, 2014, 1:47 a.m.)
> 
> 
> Review request for pig, Cheolsoo Park, Daniel Dai, Mark Wagner, and Rohini 
> Palaniswamy.
> 
> 
> Bugs: PIG-3562
>     https://issues.apache.org/jira/browse/PIG-3562
> 
> 
> Repository: pig-git
> 
> 
> Description
> -------
> 
> Implement DISTINCT combiner optimizations in Tez
> 
> 1. Use a combiner with normal uses of DISTINCT. In MR Pig, there are some 
> global variables and a special DistinctCombiner class that throws away the 
> duplicate tuples. We could hack this into Pig-on-Tez, but instead I just 
> reused the reduce plan as the combiner plan, which does the same thing 
> (through a POPackage->POProject->POForEach with the setDistinct property set 
> to true).
> 
> I'm a little bit concerned that this combiner plan could somehow be slower 
> than the special DistinctCombiner class, but I don't see how.
> 
> There is also a special CombinerPackager packager that I did NOT use for 
> this. I think that packager is really intended for use with the algebraic UDF 
> combiner optimizations only.
> 
> 2. I carefully verified that DISTINCT nested inside a FOREACH code block is 
> optimized by the CombinerOptimizer into an algebraic UDF version of DISTINCT. 
> I added TestTezCompiler and e2e tests for this. Cheolsoo already made all the 
> combiner changes for this to work correctly - I didn't make any code changes 
> here.
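
For illustration only (this is not code from the patch): item 1 relies on DISTINCT being safe to run twice, once per map output as a combiner and once more in the reducer. The sketch below shows that property with plain Java collections. The class name, the method names, and the use of strings in place of Pig tuples are all assumptions made for the example, and the shuffle is simplified to concatenation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical sketch: none of these names exist in Pig or Tez.
public class DistinctCombinerSketch {

    // Stand-in for the reduce plan: drop duplicates, keep first occurrences.
    static List<String> distinct(List<String> tuples) {
        return new ArrayList<>(new LinkedHashSet<>(tuples));
    }

    // Reuse the same distinct step as a combiner on each map output,
    // then run it again on the (simplified) shuffled data.
    static List<String> withCombiner(List<List<String>> mapOutputs) {
        List<String> shuffled = new ArrayList<>();
        for (List<String> part : mapOutputs) {
            shuffled.addAll(distinct(part)); // combiner pass
        }
        return distinct(shuffled);           // reducer pass
    }

    // Baseline: no combiner, the reducer sees every duplicate.
    static List<String> withoutCombiner(List<List<String>> mapOutputs) {
        List<String> shuffled = new ArrayList<>();
        for (List<String> part : mapOutputs) {
            shuffled.addAll(part);
        }
        return distinct(shuffled);
    }

    public static void main(String[] args) {
        List<List<String>> mapOutputs = Arrays.asList(
                Arrays.asList("a", "b", "a"),
                Arrays.asList("b", "c", "c"));
        // Both calls yield the same distinct tuples; the combiner only
        // reduces how many tuples cross the shuffle.
        System.out.println(withCombiner(mapOutputs));
        System.out.println(withoutCombiner(mapOutputs));
    }
}
```

In this toy model the combiner pass shrinks the shuffled data from six tuples to four without changing the final result, which is the effect the reused reduce plan is meant to have.
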
> 
> 
> Diffs
> -----
> 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java a7de3a7 
>   test/e2e/pig/tests/tez.conf 71cdcbc 
>   test/org/apache/pig/test/data/GoldenFiles/TEZC13.gld PRE-CREATION 
>   test/org/apache/pig/test/data/GoldenFiles/TEZC5.gld 35d9313 
>   test/org/apache/pig/tez/TestTezCompiler.java 2252531 
> 
> Diff: https://reviews.apache.org/r/16717/diff/
> 
> 
> Testing
> -------
> 
> Updated the golden file for the existing TestTezCompiler DISTINCT test to 
> include the combiner plan.
> Added a TestTezCompiler test and golden file for the DISTINCT algebraic UDF 
> combiner.
> Added an e2e test that runs DISTINCT with the algebraic UDF combiner.
> I am getting some test-e2e-tez failures in ORDER BY tests, but I also get 
> them on a clean Tez branch. My new e2e test passes.
> 
> 
> Thanks,
> 
> Alex Bain
> 
>
