----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16309/#review30848 -----------------------------------------------------------
Ship it! Looks good to me. I will commit it after running tests. Let me also fix my minor comments below when committing the patch as well as whitespaces. Thank you Alex! src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java <https://reviews.apache.org/r/16309/#comment59052> Let's use Boolean.valueOf() instead of String.equals() here. src/org/apache/pig/backend/hadoop/executionengine/tez/TezPOStreamVisitor.java <https://reviews.apache.org/r/16309/#comment59051> Apache header is missing. - Cheolsoo Park On Dec. 24, 2013, 1:34 a.m., Alex Bain wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/16309/ > ----------------------------------------------------------- > > (Updated Dec. 24, 2013, 1:34 a.m.) > > > Review request for pig, Cheolsoo Park, Daniel Dai, Mark Wagner, and Rohini > Palaniswamy. > > > Bugs: PIG-3629 > https://issues.apache.org/jira/browse/PIG-3629 > > > Repository: pig-git > > > Description > ------- > > Implement STREAM operator in Tez - > https://issues.apache.org/jira/browse/PIG-3629 > > In this patch, I do not add resources to pig-misc.jar, I just add them > individually. See my discussion post: > https://groups.google.com/forum/#!topic/pig-on-tez/8S80GMKhMaU > > Basic Changes: > -Run the PhyPlanSetter and EndOfAllInputSetter to set the parent plan and the > end-of-all input flags necessary for STREAM, just like in MR Pig. > -Add a map to hold plan-specific extra local resources in TezOperPlan.java. > These resources can either come from the user's directory (e.g. > SHIP('/home/abain/foo')) or from HDFS (e.g. CACHE('/user/abain/bar') in HDFS). > -Add the new class TezPOStreamVisitor that assembles all the plan-specific > local resources that get added in TezOperPlan.java. > > Resource Manager Changes: > -TezResourcManager resources were previously a map of java.net.URL -> Path in > HDFS. Previously, the URL's were all local files, e.g. > file://home/abain/pig-withouthHadoop.jar. However, the CACHE statement > requires that resources already present in HDFS be able to be added as local > resources. Unfortunately java.net.URL does not support hdfs:// URL's, so I > changed this data structure to be a YARN URL instead. I also added methods to > the ResourceManager to distinguish whether you are adding a local resource or > a resource already present in HDFS. > -CACHE also supports URL's with fragments at the end, which become a > "shortcut" to the name, e.g. CACHE(/input/big-data-name.gz#data.gz). I > changed the resource manager to look for a fragments and use that as the > resource name (if the fragment exist). This results in the symlink to the > resource being created with the fragment name, which is what we want. > > Race condition: > -I found a race condition that resulted from reusing the Result object in > POSimpleTezLoad. There are several possible solutions. After discussing in > the newsgroup, we decided to change POSimpleTezLoad for now. > -I also made a small cleanup to PhysicalOperator.java. > > > Diffs > ----- > > > src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/EndOfAllInputSetter.java > 37566ab > > src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java > 8487a0f > src/org/apache/pig/backend/hadoop/executionengine/tez/POSimpleTezLoad.java > d57aded > src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java > 0ee7256 > src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java > 191563d > src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java > df9fea6 > > src/org/apache/pig/backend/hadoop/executionengine/tez/TezJobControlCompiler.java > 135b933 > src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperPlan.java > 0cc8e17 > > src/org/apache/pig/backend/hadoop/executionengine/tez/TezPOStreamVisitor.java > PRE-CREATION > src/org/apache/pig/backend/hadoop/executionengine/tez/TezPlanContainer.java > 673fd70 > > src/org/apache/pig/backend/hadoop/executionengine/tez/TezResourceManager.java > 0fd7575 > src/org/apache/pig/backend/hadoop/streaming/HadoopExecutableManager.java > 862e637 > test/e2e/pig/tests/tez.conf 0e4ba4e > test/org/apache/pig/test/data/GoldenFiles/TEZC12.gld PRE-CREATION > test/org/apache/pig/tez/TestTezCompiler.java 8d5e5f2 > > Diff: https://reviews.apache.org/r/16309/diff/ > > > Testing > ------- > > Added a unit test to TestTezCompiler.java > Added a new unit test e2e test to tez.conf with session reuse enabled > Ported three other e2e tests from streaming.conf to tez.conf to increase > coverage > > ant test-tez passes > ant test-e2e-tez passes > Manually tested with a large subset of tests from streaming.conf (the ones > using features currently supported by Pig-on-Tez), they pass > > > Thanks, > > Alex Bain > >