-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16860/
-----------------------------------------------------------

(Updated Jan. 17, 2014, 12:49 a.m.)


Review request for pig, Alex Bain, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.


Changes
-------

Per Rohini's request, I am uploading the final patch that I committed to the tez branch.


Bugs: PIG-3644
    https://issues.apache.org/jira/browse/PIG-3644


Repository: pig-git


Description
-------

Skewed join in Tez is implemented in 5 vertices:
Vertex 1) Sample/load skewed table => broadcast sampling input to vertex 2 and shuffle entire input to vertex 3.
Vertex 2) Sampling aggregation vertex => build distribution map and broadcast it to vertices 3 and 4.
Vertex 3) POLocalRearrangeTez for skewed table => partition skewed table using SkewedPartitioner and shuffle it to vertex 5.
Vertex 4) POPartitionRearrangeTez for streaming table => shuffle streaming table to vertex 5.
Vertex 5) Join inputs from vertices 3 and 4.

New classes for Tez:
- POPoissonSample) Sampling operator for skewed join.
- POPartitionRearrangeTez) Sub-class of POPartitionRearrange for Tez.
- SkewedPartitionerTez) Sub-class of SkewedPartitioner for Tez.

Note that there are a couple of places I can refactor. For example,
- POPoissonSample and PoissonSampleLoader
- POPartitionRearrangeTez and POLocalRearrangeTez
I will do it in follow-up jiras.

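For readers unfamiliar with how a distribution map drives the routing in vertices 3 and 4, here is a minimal sketch of the general skewed-join idea. It is not the code in this patch: the class SkewedJoinRoutingSketch, its methods, and the String key type are all hypothetical, and the sketch assumes the distribution map assigns each skewed key a contiguous (wrapping) range of reducers. Tuples from the skewed table are spread round-robin over that range, while tuples from the streaming table are copied to every reducer in the range so that each reducer can still complete the join.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not SkewedPartitionerTez or POPartitionRearrangeTez from the patch.
final class SkewedJoinRoutingSketch {

    // Reducer range for one skewed key: `count` consecutive reducers starting at `first`.
    private static final class Range {
        final int first;
        final int count;
        int cursor; // round-robin position within the range

        Range(int first, int count) {
            this.first = first;
            this.count = count;
        }
    }

    // Assumed shape of the distribution map: skewed key -> reducer range.
    private final Map<String, Range> distribution = new HashMap<>();
    private final int totalReducers;

    SkewedJoinRoutingSketch(int totalReducers) {
        if (totalReducers <= 0) {
            throw new IllegalArgumentException("totalReducers must be positive");
        }
        this.totalReducers = totalReducers;
    }

    void addSkewedKey(String key, int first, int count) {
        if (first < 0 || first >= totalReducers || count <= 0 || count > totalReducers) {
            throw new IllegalArgumentException("range does not fit the reducer count");
        }
        distribution.put(key, new Range(first, count));
    }

    // Skewed table side: pick one reducer, rotating through the key's range.
    int routeSkewedSide(String key) {
        Range r = distribution.get(key);
        if (r == null) {
            // Non-skewed keys fall back to plain hash partitioning.
            return Math.floorMod(key.hashCode(), totalReducers);
        }
        int reducer = (r.first + r.cursor) % totalReducers;
        r.cursor = (r.cursor + 1) % r.count;
        return reducer;
    }

    // Streaming table side: replicate the tuple to every reducer in the key's range.
    int[] routeStreamingSide(String key) {
        Range r = distribution.get(key);
        if (r == null) {
            return new int[] { Math.floorMod(key.hashCode(), totalReducers) };
        }
        int[] reducers = new int[r.count];
        for (int i = 0; i < r.count; i++) {
            reducers[i] = (r.first + i) % totalReducers;
        }
        return reducers;
    }
}
```

With 4 reducers and a key "hot" assigned the range starting at reducer 3 with 2 reducers, successive skewed-side tuples for "hot" alternate between reducers 3 and 0, and every streaming-side tuple for "hot" is sent to both.
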

Diffs (updated)
-----

  src/org/apache/pig/PigConfiguration.java ccf3635
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/SkewedPartitioner.java 4790abe
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPoissonSample.java PRE-CREATION
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POReservoirSample.java bcb339c
  src/org/apache/pig/backend/hadoop/executionengine/tez/POLocalRearrangeTez.java 585509d
  src/org/apache/pig/backend/hadoop/executionengine/tez/POPartitionRearrangeTez.java PRE-CREATION
  src/org/apache/pig/backend/hadoop/executionengine/tez/POShuffleTezLoad.java e9d8e64
  src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java e22c319
  src/org/apache/pig/backend/hadoop/executionengine/tez/SkewedPartitionerTez.java PRE-CREATION
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 632eae5
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java 53b255e
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperator.java 93e522f
  src/org/apache/pig/backend/hadoop/executionengine/tez/WeightedRangePartitionerTez.java 7bcc79e
  src/org/apache/pig/impl/builtin/PartitionSkewedKeys.java 7ce0e82
  src/org/apache/pig/impl/builtin/PoissonSampleLoader.java 5ce5b9e
  test/e2e/pig/tests/tez.conf ac254e5

Diff: https://reviews.apache.org/r/16860/diff/


Testing
-------

- Added e2e test cases for inner and outer skewed joins.
- Unit tests pass.
- e2e tests pass.


Thanks,

Cheolsoo Park