-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55681/
-----------------------------------------------------------
(Updated Jan. 21, 2017, 12:35 a.m.)
Review request for pig, Daniel Dai and Adam Szita.
Changes
-------
Fixed the test failure the right way. PigGraceShuffleVertexManager should not
estimate parallelism for the bloom reducer at all. It should remain the same as
the number of bloom filters. Previously, it could have ended up increasing the
parallelism beyond the number of bloom filters based on the join input size.
Also moved the isIntermediateReducer() check calls down to avoid unnecessarily
traversing the plan.
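As a rough illustration of the rule described above (this is not the actual
PigGraceShuffleVertexManager code; the class, method, and parameter names are
hypothetical), a size-based estimate is skipped entirely for the bloom reducer
so that its parallelism stays pinned to the number of bloom filters:

```java
/** Hypothetical sketch; names and signature are assumptions, not Pig APIs. */
final class BloomParallelismSketch {

    /**
     * Returns the parallelism to use for a reducer vertex.
     *
     * @param isBloomReducer     true if the vertex builds the bloom filters
     * @param currentParallelism parallelism already set on the vertex; for the
     *                           bloom reducer this is the number of bloom filters
     * @param inputBytes         observed size of the join input
     * @param bytesPerReducer    target input size per reducer (assumed > 0)
     */
    static int chooseParallelism(boolean isBloomReducer, int currentParallelism,
                                 long inputBytes, long bytesPerReducer) {
        if (isBloomReducer) {
            // Never re-estimate: keep one reducer per bloom filter.
            return currentParallelism;
        }
        // Ordinary reducers: ceiling division of input size by the target size.
        long estimated = (inputBytes + bytesPerReducer - 1) / bytesPerReducer;
        return (int) Math.max(1L, Math.min(estimated, (long) Integer.MAX_VALUE));
    }
}
```

With 11 bloom filters and a 10 GB join input at 1 GB per reducer, the bloom
reducer stays at 11 while an ordinary reducer in this sketch would be estimated
at 10.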
Bugs: PIG-4963
https://issues.apache.org/jira/browse/PIG-4963
Repository: pig
Description
-------
This patch adds a new type of join called bloom. It supports creating multiple
bloom filters, partitioned by the hashcode of the key, for parallelism. Two new
operators and one Packager implementation are added.
POBuildBloomRearrangeTez - Builds the bloom filter for one of the relations
of the join on the map side, or writes out the join keys, depending on the strategy.
BloomPackager - Used in the reducer to create or combine bloom filters and
produce the final bloom filters.
POBloomFilterRearrangeTez - Applies the bloom filters to the other relations in
the join and filters out data.
More details in the documentation.
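For reviewers who want a concrete picture of "multiple bloom filters partitioned
by hashcode of key", here is a minimal, self-contained sketch. It is not the code
in this patch: the class name, the bit-mixing function, and the double-hashing
scheme are all assumptions made for illustration only.

```java
import java.util.BitSet;

/** Hypothetical sketch of hash-partitioned bloom filters; not Pig's implementation. */
final class PartitionedBloomSketch {
    private final BitSet[] filters;   // one independent bit vector per partition
    private final int bitsPerFilter;
    private final int numHashes;

    PartitionedBloomSketch(int numFilters, int bitsPerFilter, int numHashes) {
        this.filters = new BitSet[numFilters];
        for (int i = 0; i < numFilters; i++) {
            this.filters[i] = new BitSet(bitsPerFilter);
        }
        this.bitsPerFilter = bitsPerFilter;
        this.numHashes = numHashes;
    }

    /** Picks the filter for a key from its hashcode; floorMod keeps it non-negative. */
    int partitionOf(Object key) {
        return Math.floorMod(key.hashCode(), filters.length);
    }

    /** Simple integer bit mixer (an illustrative choice, not taken from the patch). */
    private static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85EBCA6B;
        h ^= h >>> 13;
        h *= 0xC2B2AE35;
        h ^= h >>> 16;
        return h;
    }

    /** i-th bit position for a key, derived by double hashing (h1 + i * h2). */
    private int bitIndex(Object key, int i) {
        int h1 = mix(key.hashCode());
        int h2 = mix(h1 ^ 0x9E3779B9) | 1;
        return Math.floorMod(h1 + i * h2, bitsPerFilter);
    }

    /** Build side: record a join key in the filter that owns its partition. */
    void add(Object key) {
        BitSet filter = filters[partitionOf(key)];
        for (int i = 0; i < numHashes; i++) {
            filter.set(bitIndex(key, i));
        }
    }

    /** Probe side: false means the key is definitely absent, so the row can be dropped. */
    boolean mightContain(Object key) {
        BitSet filter = filters[partitionOf(key)];
        for (int i = 0; i < numHashes; i++) {
            if (!filter.get(bitIndex(key, i))) {
                return false;
            }
        }
        return true;
    }
}
```

In this sketch the build side calls add() for each join key and the probe side
drops any row for which mightContain() returns false. False positives still pass
through and are eliminated by the join itself. Because each partition has its own
filter, the filters can be built independently of one another.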
Diffs (updated)
-----
http://svn.apache.org/repos/asf/pig/trunk/src/docs/src/documentation/content/xdocs/basic.xml
1779665
http://svn.apache.org/repos/asf/pig/trunk/src/docs/src/documentation/content/xdocs/perf.xml
1779665
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/PigConfiguration.java
1779665
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java
1779665
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigCombiner.java
1779665
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/EndOfAllInputSetter.java
1779665
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java
1779665
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/Packager.java
1779665
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java
1779665
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/TezCompiler.java
1779665
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/TezEdgeDescriptor.java
1779665
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/TezOperator.java
1779665
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/TezPOPackageAnnotator.java
1779665
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/operator/BloomPackager.java
PRE-CREATION
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/operator/POBloomFilterRearrangeTez.java
PRE-CREATION
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/operator/POBuildBloomRearrangeTez.java
PRE-CREATION
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/operator/POLocalRearrangeTez.java
1779665
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/operator/POShuffleTezLoad.java
1779665
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/optimizer/CombinerOptimizer.java
1779665
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/optimizer/ParallelismSetter.java
1779665
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/optimizer/SecondaryKeyOptimizerTez.java
1779665
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/optimizer/TezEstimatedParallelismClearer.java
1779665
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/optimizer/TezOperDependencyParallelismEstimator.java
1779665
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LOJoin.java
1779665
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java
1779665
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java
1779665
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/tools/pigstats/ScriptState.java
1779665
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/tools/pigstats/tez/TezScriptState.java
1779665
http://svn.apache.org/repos/asf/pig/trunk/test/e2e/pig/build.xml
1779665
http://svn.apache.org/repos/asf/pig/trunk/test/e2e/pig/tests/join.conf
PRE-CREATION
http://svn.apache.org/repos/asf/pig/trunk/test/e2e/pig/tests/multiquery.conf
1779665
http://svn.apache.org/repos/asf/pig/trunk/test/e2e/pig/tests/orc.conf
1779665
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestEmptyInputDir.java
1779665
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-1-KeyToReducer.gld
PRE-CREATION
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-1.gld
PRE-CREATION
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-2-KeyToReducer.gld
PRE-CREATION
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-2.gld
PRE-CREATION
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-3-KeyToReducer.gld
PRE-CREATION
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-3.gld
PRE-CREATION
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-4-KeyToReducer.gld
PRE-CREATION
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-4.gld
PRE-CREATION
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-5-KeyToReducer.gld
PRE-CREATION
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-5.gld
PRE-CREATION
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-6-KeyToReducer.gld
PRE-CREATION
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-6.gld
PRE-CREATION
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-7-KeyToReducer.gld
PRE-CREATION
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-7.gld
PRE-CREATION
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/tez/TestTezCompiler.java
1779665
Diff: https://reviews.apache.org/r/55681/diff/
Testing
-------
Unit and e2e tests added for many different scenarios.
Thanks,
Rohini Palaniswamy
