> On Jan. 16, 2016, 2 a.m., Mohit Sabharwal wrote: > > src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POForEach.java, > > line 139 > > <https://reviews.apache.org/r/40743/diff/3/?file=1156747#file1156747line139> > > > > Let's please not make any changes in core-Pig unless absolutely > > necessary. This will make merging harder. Any reason this is unavoidable ? > > > > Same for removal of getFlatStr() below.
Not absolutely necessary. It just causes duplication of code. I will copy-paste the method to the child-class (POReduceBySpark) and will revert changes to core. > On Jan. 16, 2016, 2 a.m., Mohit Sabharwal wrote: > > src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java, > > line 234 > > <https://reviews.apache.org/r/40743/diff/3/?file=1156749#file1156749line234> > > > > LOG.debug Just keeping it consistent with rest of the explains in the method. Explain outputs to System.out > On Jan. 16, 2016, 2 a.m., Mohit Sabharwal wrote: > > src/org/apache/pig/backend/hadoop/executionengine/spark/converter/GlobalRearrangeConverter.java, > > line 177 > > <https://reviews.apache.org/r/40743/diff/3/?file=1156750#file1156750line177> > > > > any particular reason you're moving this to it's own file ? Yes. This is being used by 2 classes now (GlobalRearrangeConverter and ReduceByConverter). Hence moved it up. > On Jan. 16, 2016, 2 a.m., Mohit Sabharwal wrote: > > src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java, > > line 253 > > <https://reviews.apache.org/r/40743/diff/3/?file=1156749#file1156749line253> > > > > Don't remember if this needs to be earlier. Probably safer to keep this > > right after the combiner optimization. Done. Moved it up. - Pallavi ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/40743/#review111113 ----------------------------------------------------------- On Dec. 18, 2015, 6:47 a.m., Pallavi Rao wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/40743/ > ----------------------------------------------------------- > > (Updated Dec. 18, 2015, 6:47 a.m.) > > > Review request for pig, Mohit Sabharwal and Xuefu Zhang. > > > Bugs: PIG-4709 > https://issues.apache.org/jira/browse/PIG-4709 > > > Repository: pig-git > > > Description > ------- > > Currently, the GROUPBY operator of PIG is mapped by Spark's CoGroup. When the > grouped data is consumed by subsequent operations to perform algebraic > operations, this is sub-optimal as there is lot of shuffle traffic. > The Spark Plan must be optimized to use reduceBy, where possible, so that a > combiner is used. > > Introduced a combiner optimizer that does the following: > // Checks for algebraic operations and if they exist. > // Replaces global rearrange (cogroup) with reduceBy as follows: > // Input: > // foreach (using algebraicOp) > // -> packager > // -> globalRearrange > // -> localRearrange > // Output: > // foreach (using algebraicOp.Final) > // -> reduceBy (uses algebraicOp.Intermediate) > // -> foreach (using algebraicOp.Initial) > // -> localRearrange > > > Diffs > ----- > > > src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POForEach.java > f8c1658 > > src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORollupHIIForEach.java > aca347d > src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java > a4dbadd > > src/org/apache/pig/backend/hadoop/executionengine/spark/converter/GlobalRearrangeConverter.java > 5f74992 > > src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LocalRearrangeConverter.java > 9ce0492 > > src/org/apache/pig/backend/hadoop/executionengine/spark/converter/PigSecondaryKeyComparatorSpark.java > PRE-CREATION > > src/org/apache/pig/backend/hadoop/executionengine/spark/converter/ReduceByConverter.java > PRE-CREATION > > src/org/apache/pig/backend/hadoop/executionengine/spark/operator/POReduceBySpark.java > PRE-CREATION > > src/org/apache/pig/backend/hadoop/executionengine/spark/optimizer/CombinerOptimizer.java > PRE-CREATION > > src/org/apache/pig/backend/hadoop/executionengine/util/CombinerOptimizerUtil.java > 6b66ca1 > > src/org/apache/pig/backend/hadoop/executionengine/util/SecondaryKeyOptimizerUtil.java > 546d91e > test/org/apache/pig/test/TestCombiner.java df44293 > > Diff: https://reviews.apache.org/r/40743/diff/ > > > Testing > ------- > > The patch unblocked one UT in TestCombiner. Added another UT in the same > class. Also did some manual testing. > > > Thanks, > > Pallavi Rao > >