-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16309/#review30848
-----------------------------------------------------------

Ship it!


Looks good to me. I will commit it after running tests.

Let me also fix my minor comments below when committing the patch as well as 
whitespaces. Thank you Alex!


src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java
<https://reviews.apache.org/r/16309/#comment59052>

    Let's use Boolean.valueOf() instead of String.equals() here.



src/org/apache/pig/backend/hadoop/executionengine/tez/TezPOStreamVisitor.java
<https://reviews.apache.org/r/16309/#comment59051>

    Apache header is missing.


- Cheolsoo Park


On Dec. 24, 2013, 1:34 a.m., Alex Bain wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16309/
> -----------------------------------------------------------
> 
> (Updated Dec. 24, 2013, 1:34 a.m.)
> 
> 
> Review request for pig, Cheolsoo Park, Daniel Dai, Mark Wagner, and Rohini 
> Palaniswamy.
> 
> 
> Bugs: PIG-3629
>     https://issues.apache.org/jira/browse/PIG-3629
> 
> 
> Repository: pig-git
> 
> 
> Description
> -------
> 
> Implement STREAM operator in Tez - 
> https://issues.apache.org/jira/browse/PIG-3629
> 
> In this patch, I do not add resources to pig-misc.jar, I just add them 
> individually. See my discussion post: 
> https://groups.google.com/forum/#!topic/pig-on-tez/8S80GMKhMaU
> 
> Basic Changes:
> -Run the PhyPlanSetter and EndOfAllInputSetter to set the parent plan and the 
> end-of-all input flags necessary for STREAM, just like in MR Pig.
> -Add a map to hold plan-specific extra local resources in TezOperPlan.java. 
> These resources can either come from the user's directory (e.g. 
> SHIP('/home/abain/foo')) or from HDFS (e.g. CACHE('/user/abain/bar') in HDFS).
> -Add the new class TezPOStreamVisitor that assembles all the plan-specific 
> local resources that get added in TezOperPlan.java.
> 
> Resource Manager Changes:
> -TezResourcManager resources were previously a map of java.net.URL -> Path in 
> HDFS. Previously, the URL's were all local files, e.g. 
> file://home/abain/pig-withouthHadoop.jar. However, the CACHE statement 
> requires that resources already present in HDFS be able to be added as local 
> resources. Unfortunately java.net.URL does not support hdfs:// URL's, so I 
> changed this data structure to be a YARN URL instead. I also added methods to 
> the ResourceManager to distinguish whether you are adding a local resource or 
> a resource already present in HDFS.
> -CACHE also supports URL's with fragments at the end, which become a 
> "shortcut" to the name, e.g. CACHE(/input/big-data-name.gz#data.gz). I 
> changed the resource manager to look for a fragments and use that as the 
> resource name (if the fragment exist). This results in the symlink to the 
> resource being created with the fragment name, which is what we want.
> 
> Race condition:
> -I found a race condition that resulted from reusing the Result object in 
> POSimpleTezLoad. There are several possible solutions. After discussing in 
> the newsgroup, we decided to change POSimpleTezLoad for now.
> -I also made a small cleanup to PhysicalOperator.java.
> 
> 
> Diffs
> -----
> 
>   
> src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/EndOfAllInputSetter.java
>  37566ab 
>   
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java
>  8487a0f 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/POSimpleTezLoad.java 
> d57aded 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java 
> 0ee7256 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 
> 191563d 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java 
> df9fea6 
>   
> src/org/apache/pig/backend/hadoop/executionengine/tez/TezJobControlCompiler.java
>  135b933 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperPlan.java 
> 0cc8e17 
>   
> src/org/apache/pig/backend/hadoop/executionengine/tez/TezPOStreamVisitor.java 
> PRE-CREATION 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezPlanContainer.java 
> 673fd70 
>   
> src/org/apache/pig/backend/hadoop/executionengine/tez/TezResourceManager.java 
> 0fd7575 
>   src/org/apache/pig/backend/hadoop/streaming/HadoopExecutableManager.java 
> 862e637 
>   test/e2e/pig/tests/tez.conf 0e4ba4e 
>   test/org/apache/pig/test/data/GoldenFiles/TEZC12.gld PRE-CREATION 
>   test/org/apache/pig/tez/TestTezCompiler.java 8d5e5f2 
> 
> Diff: https://reviews.apache.org/r/16309/diff/
> 
> 
> Testing
> -------
> 
> Added a unit test to TestTezCompiler.java
> Added a new unit test e2e test to tez.conf with session reuse enabled
> Ported three other e2e tests from streaming.conf to tez.conf to increase 
> coverage
> 
> ant test-tez passes
> ant test-e2e-tez passes
> Manually tested with a large subset of tests from streaming.conf (the ones 
> using features currently supported by Pig-on-Tez), they pass
> 
> 
> Thanks,
> 
> Alex Bain
> 
>

Reply via email to