> On March 31, 2014, 5:22 p.m., Cheolsoo Park wrote: > > Hi Rohini, you gave me an aha moment with this patch. It would be nice if > > TEZ-1003 could be fixed before Tez 0.4.0 release. > > > > The approach that you're taking looks great! I had few minor comments while > > reading through your patch.
Thanks :). Just took the idea from MultiQueryOptimizer Daniel did as could not manage that within TezCompiler itself. And it was painful debugging and getting it to work :). I might not have handled some stuff and broken somethings with this patch. Will try to run the full suite of e2e after TEZ-1003 is fixed. I don't think we have test cases for union followed by groupby, join, etc and think we need to add new e2e tests. Created a separate jira (PIG-3855) for that. Union followed by Limit did look too complex, but my brain was totally fried it could not process any more. I think we need to take a look at optimizing Limit as well. Created PIG-3854 for that. > On March 31, 2014, 5:22 p.m., Cheolsoo Park wrote: > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java, > > lines 578-598 > > <https://reviews.apache.org/r/19836/diff/1/?file=542223#file542223line578> > > > > This can be simplified as follows- > > > > if (tezOp.getVertexGroupStores() != null) { > > if > > (tezOp.getVertexGroupStores().containsKey(store.getOperatorKey())) { > > continue; > > } > > } > > vertex.addOutput(store.getOperatorKey().toString(), > > storeOutDescriptor, MROutputCommitter.class); Good suggestion. Had the key as the vertex group and value as store for some other reason. That is not used anymore. So will switch the key and value as it simplifies code. > On March 31, 2014, 5:22 p.m., Cheolsoo Park wrote: > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/optimizers/UnionOptimizer.java, > > line 174 > > <https://reviews.apache.org/r/19836/diff/1/?file=542237#file542237line174> > > > > Why do we need this conversion: List => ArrayList? Since we are iterating over it and also disconnecting it from union which removes it from the succs list we need to copy over to a new ArrayList to avoid ConcurrentModificationException. - Rohini ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/19836/#review39048 ----------------------------------------------------------- On March 31, 2014, 9:18 a.m., Rohini Palaniswamy wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/19836/ > ----------------------------------------------------------- > > (Updated March 31, 2014, 9:18 a.m.) > > > Review request for pig, Cheolsoo Park and Daniel Dai. > > > Bugs: PIG-3835 > https://issues.apache.org/jira/browse/PIG-3835 > > > Repository: pig > > > Description > ------- > > Changes done: > Changed POLocalRearrangeTez to POValueOutputTez > Wrote a UnionOptimizer > Got union followed by store working using vertexgroup. > Also implemented the case where union followed by group by or join only has 3 > vertices instead of 4 using vertexgroup. But that is not working because > ConcatenatedMergedKeyValuesInput is not working as expected. It does not > group values from the two input together. The values are grouped only within > each input. Will file a Tez bug on that. > Putting up the patch to get approach vetted. Still to run unit and e2e tests > and some minor cleanup pending. > > > Diffs > ----- > > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/PigConfiguration.java > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POLimit.java > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/Packager.java > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/MultiQueryOptimizerTez.java > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POFRJoinTez.java > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POIdentityInOutTez.java > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POLocalRearrangeTez.java > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POShuffleTezLoad.java > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POSimpleTezLoad.java > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POStoreTez.java > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POValueInputTez.java > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POValueOutputTez.java > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POVertexGroupInputTez.java > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/ReadScalarsTez.java > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezEdgeDescriptor.java > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezInput.java > PRE-CREATION > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLoad.java > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperPlan.java > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperator.java > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezOutput.java > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezPOPackageAnnotator.java > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezPrinter.java > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/operators/POCounterStatsTez.java > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/operators/POCounterTez.java > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/operators/PORankTez.java > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/operators/POShuffledValueInputTez.java > PRE-CREATION > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/optimizers/UnionOptimizer.java > PRE-CREATION > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/util/TezCompilerUtil.java > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/tools/pigstats/tez/TezStats.java > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC-Union-1.gld > PRE-CREATION > > http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC-Union-2.gld > PRE-CREATION > > http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC-Union-3.gld > PRE-CREATION > > http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC-Union-4.gld > PRE-CREATION > > http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC-Union-5.gld > PRE-CREATION > > http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC-Union-6.gld > PRE-CREATION > > http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC-Union-7.gld > PRE-CREATION > > http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC16.gld > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC17.gld > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC19.gld > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC21.gld > 1583081 > > http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/tez/TestTezCompiler.java > 1583081 > > Diff: https://reviews.apache.org/r/19836/diff/ > > > Testing > ------- > > > Thanks, > > Rohini Palaniswamy > >