-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40743/
-----------------------------------------------------------
(Updated Jan. 27, 2016, 9:25 a.m.)

Review request for pig, Mohit Sabharwal and Xuefu Zhang.

Changes
-------

Rebased and addressed review comments.

Bugs: PIG-4709
    https://issues.apache.org/jira/browse/PIG-4709

Repository: pig-git

Description
-------

Currently, Pig's GROUPBY operator is mapped to Spark's CoGroup. When the grouped data is consumed by subsequent algebraic operations, this is sub-optimal because it generates a lot of shuffle traffic. The Spark plan should instead use reduceBy where possible, so that a combiner is applied.

This patch introduces a combiner optimizer that does the following:

    // Checks for algebraic operations and, if they exist,
    // replaces the global rearrange (cogroup) with reduceBy as follows:
    // Input:
    //   foreach (using algebraicOp)
    //     -> packager
    //     -> globalRearrange
    //     -> localRearrange
    // Output:
    //   foreach (using algebraicOp.Final)
    //     -> reduceBy (uses algebraicOp.Intermediate)
    //     -> foreach (using algebraicOp.Initial)
    //     -> localRearrange

Diffs (updated)
-----

  src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java 4e7bf00
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/GlobalRearrangeConverter.java 5f74992
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LocalRearrangeConverter.java 9ce0492
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/PigSecondaryKeyComparatorSpark.java PRE-CREATION
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/ReduceByConverter.java PRE-CREATION
  src/org/apache/pig/backend/hadoop/executionengine/spark/operator/POReduceBySpark.java PRE-CREATION
  src/org/apache/pig/backend/hadoop/executionengine/spark/optimizer/CombinerOptimizer.java PRE-CREATION
  src/org/apache/pig/backend/hadoop/executionengine/util/CombinerOptimizerUtil.java 6b66ca1
  src/org/apache/pig/backend/hadoop/executionengine/util/SecondaryKeyOptimizerUtil.java 546d91e
  test/org/apache/pig/test/TestCombiner.java df44293

Diff:
https://reviews.apache.org/r/40743/diff/

Testing
-------

The patch unblocks one unit test in TestCombiner, and I added another unit test to the same class. I also did some manual testing.

Thanks,

Pallavi Rao
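The combiner rewrite in this request relies on an algebraic function being split into Initial, Intermediate and Final phases. Below is a minimal, self-contained Java sketch of that idea for COUNT. It is not the patch's code and uses no Pig or Spark APIs. It compares how many records cross a simulated shuffle under a CoGroup-style plan (ship every record, then aggregate each group) versus a reduceBy-style plan (combine partials per partition first). All names (CombinerSketch, initial, intermediate, finalValue, countWithGroup, countWithCombiner) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: COUNT split into Initial / Intermediate / Final phases,
 * in the spirit of Pig's algebraic UDFs. No Pig or Spark APIs are used.
 */
public class CombinerSketch {

    // Initial: applied to each input record on the map side; every record counts as 1.
    public static long initial(String record) {
        return 1L;
    }

    // Intermediate: merges two partial counts. It is associative and commutative,
    // so it can run map-side (as a combiner) and again after the shuffle.
    public static long intermediate(long a, long b) {
        return a + b;
    }

    // Final: turns the fully merged partial into the result (identity for COUNT).
    public static long finalValue(long partial) {
        return partial;
    }

    // Map-side combine for one partition: one (key, partial) pair per distinct key.
    public static Map<String, Long> combinePartition(List<String> partition) {
        Map<String, Long> partials = new HashMap<>();
        for (String key : partition) {
            partials.merge(key, initial(key), CombinerSketch::intermediate);
        }
        return partials;
    }

    // reduceBy-style plan: only the combined partials cross the simulated shuffle.
    public static Map<String, Long> countWithCombiner(List<List<String>> partitions, long[] shuffled) {
        Map<String, Long> merged = new HashMap<>();
        for (List<String> partition : partitions) {
            Map<String, Long> partials = combinePartition(partition);
            shuffled[0] += partials.size();
            partials.forEach((k, v) -> merged.merge(k, v, CombinerSketch::intermediate));
        }
        Map<String, Long> result = new HashMap<>();
        merged.forEach((k, v) -> result.put(k, finalValue(v)));
        return result;
    }

    // CoGroup-style plan: every record is shuffled, then each group's bag is counted.
    public static Map<String, Long> countWithGroup(List<List<String>> partitions, long[] shuffled) {
        Map<String, List<String>> groups = new HashMap<>();
        for (List<String> partition : partitions) {
            for (String key : partition) {
                shuffled[0]++;
                groups.computeIfAbsent(key, k -> new ArrayList<>()).add(key);
            }
        }
        Map<String, Long> result = new HashMap<>();
        groups.forEach((k, bag) -> result.put(k, (long) bag.size()));
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> partitions = List.of(
            List.of("a", "a", "b", "a"),
            List.of("b", "b", "a", "c"));
        long[] grouped = {0};
        long[] combined = {0};
        System.out.println(countWithGroup(partitions, grouped) + " shuffled=" + grouped[0]);
        System.out.println(countWithCombiner(partitions, combined) + " shuffled=" + combined[0]);
    }
}
```

With the two partitions in main, both plans give a=4, b=3, c=1. The group plan shuffles all 8 records, while the combiner plan shuffles 5 partials (one per distinct key per partition). The saving grows with the number of records per key, which is the motivation for using reduceBy.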