-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7126/
-----------------------------------------------------------
(Updated Sept. 18, 2012, 5:43 p.m.)
Review request for hive.
Changes
-------
Bug fix plus 3 test cases.
Description
-------
This optimizer exploits intra-query correlations and merges multiple correlated
MapReduce jobs into a single job. I opened a new request because I have been
working on hive-git.
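As a rough illustration of the idea in the description (not the code in this
patch), the toy Python model below runs a join followed by a group-by on the
same key, first as two jobs with two shuffles and then as one merged job with a
single shuffle. The function names, the `stats` counter, and the in-memory
"shuffle" are all hypothetical and exist only for this sketch; the actual
operators are the ones listed in the diff.

```python
from collections import defaultdict


def shuffle(records, stats):
    """Group (key, value) pairs by key; counts as one shuffle phase."""
    stats["shuffles"] += 1
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return groups


def tag_inputs(left, right):
    """Tag each row with its source table, as a reduce-side join would."""
    return [(k, ("L", v)) for k, v in left] + [(k, ("R", v)) for k, v in right]


def two_jobs(left, right, stats):
    """Unmerged plan: job 1 joins on key, job 2 groups the result by key."""
    joined = []
    for key, vals in shuffle(tag_inputs(left, right), stats).items():
        ls = [v for tag, v in vals if tag == "L"]
        rs = [v for tag, v in vals if tag == "R"]
        joined.extend((key, (l, r)) for l in ls for r in rs)
    # Second job re-shuffles on the same key just to count joined rows.
    return {k: len(v) for k, v in shuffle(joined, stats).items()}


def merged_job(left, right, stats):
    """Merged plan: both jobs share the key, so one shuffle is enough."""
    counts = {}
    for key, vals in shuffle(tag_inputs(left, right), stats).items():
        ls = [v for tag, v in vals if tag == "L"]
        rs = [v for tag, v in vals if tag == "R"]
        n = len(ls) * len(rs)  # number of joined rows for this key
        if n:
            counts[key] = n
    return counts
```

Both functions return the same per-key counts; only the number of shuffle
phases recorded in `stats` differs.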
This addresses bug HIVE-2206.
https://issues.apache.org/jira/browse/HIVE-2206
Diffs (updated)
-----
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2693663
ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java
PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java
PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java
PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java
PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 283d0b6
ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 8669051
ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 05a399d
ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 0c22141
ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 919a140
ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 1a40630
ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1469325
ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java
PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java
PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 6bc5fe4
ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java f292131
ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 8bacd3d
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 63e8ff2
ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java
PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java
PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java
PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java
PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 5f38bf2
ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 16eb125
ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 9a95efd
ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 142f040
ql/src/test/queries/clientpositive/correlationoptimizer1.q PRE-CREATION
ql/src/test/queries/clientpositive/correlationoptimizer2.q PRE-CREATION
ql/src/test/queries/clientpositive/correlationoptimizer3.q PRE-CREATION
ql/src/test/results/clientpositive/correlationoptimizer1.q.out PRE-CREATION
ql/src/test/results/clientpositive/correlationoptimizer2.q.out PRE-CREATION
ql/src/test/results/clientpositive/correlationoptimizer3.q.out PRE-CREATION
ql/src/test/results/compiler/plan/groupby1.q.xml 4382252
ql/src/test/results/compiler/plan/groupby2.q.xml eef669c
ql/src/test/results/compiler/plan/groupby3.q.xml 9743480
ql/src/test/results/compiler/plan/groupby5.q.xml 8e07860
Diff: https://reviews.apache.org/r/7126/diff/
Testing
-------
I could not run TestHBaseMinimrCliDriver, TestHBaseCliDriver,
TestHBaseNegativeCliDriver, testSynchronized in TestEmbeddedHiveMetaStore,
testSynchronized in TestRemoteHiveMetaStore, testSynchronized in
TestSetUGIOnBothClientServer, testSynchronized in TestSetUGIOnOnlyClient,
testSynchronized in TestSetUGIOnOnlyServer, and
testNegativeCliDriver_local_mapred_error_cache in TestNegativeCliDriver. This
patch should pass all other tests.
When the optimizer is enabled (it is currently disabled by default), several
test cases fail. One involves a query that the optimizer optimizes. One is not
suitable for this correlation optimizer. Two are due to potential bugs in
trunk. The other failures are parse cases (XML plans); they are caused by my
minor changes in SemanticAnalyzer, since several redundant operators will be
generated for the correlation optimizer. Overall, these failures are not very
relevant to the patch. Please see
https://issues.apache.org/jira/browse/HIVE-2206?focusedCommentId=13456171&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13456171
for details.
Thanks,
Yin Huai