subject:"\[jira\] \[Updated\] \(HIVE\-2206\) add a new optimizer for query correlation discovery and optimization"

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2014-07-14 Thread Lefty Leverenz (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-2206:
-

Labels:   (was: TODOC12)

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Fix For: 0.12.0

 Attachments: HIVE-2206.1.patch.txt, HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.3.patch.txt, 
 HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, 
 HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8-r1237253.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, HIVE-2206.D11097.1.patch, 
 HIVE-2206.D11097.10.patch, HIVE-2206.D11097.11.patch, 
 HIVE-2206.D11097.12.patch, HIVE-2206.D11097.13.patch, 
 HIVE-2206.D11097.14.patch, HIVE-2206.D11097.15.patch, 
 HIVE-2206.D11097.16.patch, HIVE-2206.D11097.17.patch, 
 HIVE-2206.D11097.18.patch, HIVE-2206.D11097.19.patch, 
 HIVE-2206.D11097.2.patch, HIVE-2206.D11097.20.patch, 
 HIVE-2206.D11097.21.patch, HIVE-2206.D11097.22.patch, 
 HIVE-2206.D11097.3.patch, HIVE-2206.D11097.4.patch, HIVE-2206.D11097.5.patch, 
 HIVE-2206.D11097.6.patch, HIVE-2206.D11097.7.patch, HIVE-2206.D11097.8.patch, 
 HIVE-2206.D11097.9.patch, HIVE-2206.patch, YSmartPatchForHive.patch, 
 testQueries.2.q


 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
 paper and slides of YSmart are linked at the bottom.
 Since Hive translates queries in a sentence by sentence fashion, for every 
 operation which may need to shuffle the data (e.g. join and aggregation 
 operations), Hive will generate a MapReduce job for that operation. However, 
 for those operations which may need to shuffle the data, they may involve 
 correlations explained below and thus can be executed in a single MR job.
 # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
 input relation sets are not disjoint;
 # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
 have not only input correlation, but also the same partition key;
 # Job Flow Correlation: An MR has job ﬂow correlation (JFC) with one of its 
 child nodes if it has the same partition key as that child node.
 The current implementation of correlation optimizer only detect correlations 
 among MR jobs for reduce-side join operators and reduce-side aggregation 
 operators (not map only aggregation). A query will be optimized if it 
 satisfies following conditions.
 # There exists a MR job for reduce-side join operator or reduce side 
 aggregation operator which have JFC with all of its parents MR jobs (TCs will 
 be also exploited if JFC exists);
 # All input tables of those correlated MR job are original input tables (not 
 intermediate tables generated by sub-queries); and 
 # No self join is involved in those correlated MR jobs.
 Correlation optimizer is implemented as a logical optimizer. The main reasons 
 are that it only needs to manipulate the query plan tree and it can leverage 
 the existing component on generating MR jobs.
 Current implementation can serve as a framework for correlation related 
 optimizations. I think that it is better than adding individual optimizers. 
 There are several work that can be done in future to improve this optimizer. 
 Here are three examples.
 # Support queries only involve TC;
 # Support queries in which input tables of correlated MR jobs involves 
 intermediate tables; and 
 # Optimize queries involving self join. 
 References:
 Paper and presentation of YSmart.
 Paper: 
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
 Slides: http://sdrv.ms/UpwJJc



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2014-07-13 Thread Lefty Leverenz (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-2206:
-

Labels: TODOC12  (was: )

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
  Labels: TODOC12
 Fix For: 0.12.0

 Attachments: HIVE-2206.1.patch.txt, HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.3.patch.txt, 
 HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, 
 HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8-r1237253.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, HIVE-2206.D11097.1.patch, 
 HIVE-2206.D11097.10.patch, HIVE-2206.D11097.11.patch, 
 HIVE-2206.D11097.12.patch, HIVE-2206.D11097.13.patch, 
 HIVE-2206.D11097.14.patch, HIVE-2206.D11097.15.patch, 
 HIVE-2206.D11097.16.patch, HIVE-2206.D11097.17.patch, 
 HIVE-2206.D11097.18.patch, HIVE-2206.D11097.19.patch, 
 HIVE-2206.D11097.2.patch, HIVE-2206.D11097.20.patch, 
 HIVE-2206.D11097.21.patch, HIVE-2206.D11097.22.patch, 
 HIVE-2206.D11097.3.patch, HIVE-2206.D11097.4.patch, HIVE-2206.D11097.5.patch, 
 HIVE-2206.D11097.6.patch, HIVE-2206.D11097.7.patch, HIVE-2206.D11097.8.patch, 
 HIVE-2206.D11097.9.patch, HIVE-2206.patch, YSmartPatchForHive.patch, 
 testQueries.2.q


 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
 paper and slides of YSmart are linked at the bottom.
 Since Hive translates queries in a sentence by sentence fashion, for every 
 operation which may need to shuffle the data (e.g. join and aggregation 
 operations), Hive will generate a MapReduce job for that operation. However, 
 for those operations which may need to shuffle the data, they may involve 
 correlations explained below and thus can be executed in a single MR job.
 # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
 input relation sets are not disjoint;
 # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
 have not only input correlation, but also the same partition key;
 # Job Flow Correlation: An MR has job ﬂow correlation (JFC) with one of its 
 child nodes if it has the same partition key as that child node.
 The current implementation of correlation optimizer only detect correlations 
 among MR jobs for reduce-side join operators and reduce-side aggregation 
 operators (not map only aggregation). A query will be optimized if it 
 satisfies following conditions.
 # There exists a MR job for reduce-side join operator or reduce side 
 aggregation operator which have JFC with all of its parents MR jobs (TCs will 
 be also exploited if JFC exists);
 # All input tables of those correlated MR job are original input tables (not 
 intermediate tables generated by sub-queries); and 
 # No self join is involved in those correlated MR jobs.
 Correlation optimizer is implemented as a logical optimizer. The main reasons 
 are that it only needs to manipulate the query plan tree and it can leverage 
 the existing component on generating MR jobs.
 Current implementation can serve as a framework for correlation related 
 optimizations. I think that it is better than adding individual optimizers. 
 There are several work that can be done in future to improve this optimizer. 
 Here are three examples.
 # Support queries only involve TC;
 # Support queries in which input tables of correlated MR jobs involves 
 intermediate tables; and 
 # Optimize queries involving self join. 
 References:
 Paper and presentation of YSmart.
 Paper: 
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
 Slides: http://sdrv.ms/UpwJJc



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-08-27 Thread Phabricator (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2206:
--

Attachment: HIVE-2206.D11097.21.patch

yhuai updated the revision HIVE-2206 [jira] add a new optimizer for query 
correlation discovery and optimization.

- Merge remote-tracking branch 'upstream/trunk' into trunk
- Merge remote-tracking branch 'upstream/trunk' into trunk
- Merge remote-tracking branch 'upstream/trunk' into trunk
- HIVE-5149 [jira] ReduceSinkDeDuplication can pick the wrong partitioning 
columns

Reviewers: ashutoshc, JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11097

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11097?vs=35907id=39087#toc

BRANCH
  HIVE-5149

ARCANIST PROJECT
  hive

AFFECTED FILES
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
  ql/src/test/results/clientpositive/groupby2_map_skew.q.out
  ql/src/test/results/clientpositive/groupby_cube1.q.out
  ql/src/test/results/clientpositive/groupby_rollup1.q.out
  ql/src/test/results/clientpositive/reduce_deduplicate_extended.q.out

To: JIRA, ashutoshc, yhuai
Cc: brock


 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Fix For: 0.12.0

 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt, 
 HIVE-2206.D11097.10.patch, HIVE-2206.D11097.11.patch, 
 HIVE-2206.D11097.12.patch, HIVE-2206.D11097.13.patch, 
 HIVE-2206.D11097.14.patch, HIVE-2206.D11097.15.patch, 
 HIVE-2206.D11097.16.patch, HIVE-2206.D11097.17.patch, 
 HIVE-2206.D11097.18.patch, HIVE-2206.D11097.19.patch, 
 HIVE-2206.D11097.1.patch, HIVE-2206.D11097.20.patch, 
 HIVE-2206.D11097.21.patch, HIVE-2206.D11097.2.patch, 
 HIVE-2206.D11097.3.patch, HIVE-2206.D11097.4.patch, HIVE-2206.D11097.5.patch, 
 HIVE-2206.D11097.6.patch, HIVE-2206.D11097.7.patch, HIVE-2206.D11097.8.patch, 
 HIVE-2206.D11097.9.patch, HIVE-2206.patch, testQueries.2.q, 
 YSmartPatchForHive.patch


 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
 paper and slides of YSmart are linked at the bottom.
 Since Hive translates queries in a sentence by sentence fashion, for every 
 operation which may need to shuffle the data (e.g. join and aggregation 
 operations), Hive will generate a MapReduce job for that operation. However, 
 for those operations which may need to shuffle the data, they may involve 
 correlations explained below and thus can be executed in a single MR job.
 # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
 input relation sets are not disjoint;
 # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
 have not only input correlation, but also the same partition key;
 # Job Flow Correlation: An MR has job ﬂow correlation (JFC) with one of its 
 child nodes if it has the same partition key as that child node.
 The current implementation of correlation optimizer only detect correlations 
 among MR jobs for reduce-side join operators and reduce-side aggregation 
 operators (not map only aggregation). A query will be optimized if it 
 satisfies following conditions.
 # There exists a MR job for reduce-side join operator or reduce side 
 aggregation operator which have JFC with all of its parents MR jobs (TCs will 
 be also exploited if JFC exists);
 # All input tables of those correlated MR job are original input tables (not 
 intermediate tables generated by sub-queries); and 
 # No self join is involved in those correlated MR jobs.
 Correlation optimizer is implemented as a logical optimizer. The main reasons 
 are that it only needs to manipulate the query plan tree and it can leverage 
 the existing component on generating MR jobs.
 Current implementation can serve as a framework for correlation related

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-08-27 Thread Phabricator (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2206:
--

Attachment: HIVE-2206.D11097.22.patch

yhuai updated the revision HIVE-2206 [jira] add a new optimizer for query 
correlation discovery and optimization.

  HIVE-5149

Reviewers: ashutoshc, JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11097

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11097?vs=39087id=39099#toc

BRANCH
  trunk

ARCANIST PROJECT
  hive

AFFECTED FILES
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
  ql/src/test/results/clientpositive/groupby2_map_skew.q.out
  ql/src/test/results/clientpositive/groupby_cube1.q.out
  ql/src/test/results/clientpositive/groupby_rollup1.q.out
  ql/src/test/results/clientpositive/reduce_deduplicate_extended.q.out

To: JIRA, ashutoshc, yhuai
Cc: brock


 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Fix For: 0.12.0

 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt, 
 HIVE-2206.D11097.10.patch, HIVE-2206.D11097.11.patch, 
 HIVE-2206.D11097.12.patch, HIVE-2206.D11097.13.patch, 
 HIVE-2206.D11097.14.patch, HIVE-2206.D11097.15.patch, 
 HIVE-2206.D11097.16.patch, HIVE-2206.D11097.17.patch, 
 HIVE-2206.D11097.18.patch, HIVE-2206.D11097.19.patch, 
 HIVE-2206.D11097.1.patch, HIVE-2206.D11097.20.patch, 
 HIVE-2206.D11097.21.patch, HIVE-2206.D11097.22.patch, 
 HIVE-2206.D11097.2.patch, HIVE-2206.D11097.3.patch, HIVE-2206.D11097.4.patch, 
 HIVE-2206.D11097.5.patch, HIVE-2206.D11097.6.patch, HIVE-2206.D11097.7.patch, 
 HIVE-2206.D11097.8.patch, HIVE-2206.D11097.9.patch, HIVE-2206.patch, 
 testQueries.2.q, YSmartPatchForHive.patch


 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
 paper and slides of YSmart are linked at the bottom.
 Since Hive translates queries in a sentence by sentence fashion, for every 
 operation which may need to shuffle the data (e.g. join and aggregation 
 operations), Hive will generate a MapReduce job for that operation. However, 
 for those operations which may need to shuffle the data, they may involve 
 correlations explained below and thus can be executed in a single MR job.
 # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
 input relation sets are not disjoint;
 # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
 have not only input correlation, but also the same partition key;
 # Job Flow Correlation: An MR has job ﬂow correlation (JFC) with one of its 
 child nodes if it has the same partition key as that child node.
 The current implementation of correlation optimizer only detect correlations 
 among MR jobs for reduce-side join operators and reduce-side aggregation 
 operators (not map only aggregation). A query will be optimized if it 
 satisfies following conditions.
 # There exists a MR job for reduce-side join operator or reduce side 
 aggregation operator which have JFC with all of its parents MR jobs (TCs will 
 be also exploited if JFC exists);
 # All input tables of those correlated MR job are original input tables (not 
 intermediate tables generated by sub-queries); and 
 # No self join is involved in those correlated MR jobs.
 Correlation optimizer is implemented as a logical optimizer. The main reasons 
 are that it only needs to manipulate the query plan tree and it can leverage 
 the existing component on generating MR jobs.
 Current implementation can serve as a framework for correlation related 
 optimizations. I think that it is better than adding individual optimizers. 
 There are several work that can be done in future to improve this optimizer. 
 Here are three examples.
 # Support queries only involve TC;
 # Support queries in

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-07-19 Thread Phabricator (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2206:
--

Attachment: HIVE-2206.D11097.20.patch

yhuai updated the revision HIVE-2206 [jira] add a new optimizer for query 
correlation discovery and optimization.

- HIVE-4877 [jira] In ExecReducer, remove tag from the row which will be 
passed to the first Operator at the Reduce-side

Reviewers: ashutoshc, JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11097

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11097?vs=35841id=35907#toc

BRANCH
  HIVE-4877

ARCANIST PROJECT
  hive

AFFECTED FILES
  data/files/kv1kv2.cogroup.txt
  ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java

To: JIRA, ashutoshc, yhuai
Cc: brock


 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Fix For: 0.12.0

 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt, 
 HIVE-2206.D11097.10.patch, HIVE-2206.D11097.11.patch, 
 HIVE-2206.D11097.12.patch, HIVE-2206.D11097.13.patch, 
 HIVE-2206.D11097.14.patch, HIVE-2206.D11097.15.patch, 
 HIVE-2206.D11097.16.patch, HIVE-2206.D11097.17.patch, 
 HIVE-2206.D11097.18.patch, HIVE-2206.D11097.19.patch, 
 HIVE-2206.D11097.1.patch, HIVE-2206.D11097.20.patch, 
 HIVE-2206.D11097.2.patch, HIVE-2206.D11097.3.patch, HIVE-2206.D11097.4.patch, 
 HIVE-2206.D11097.5.patch, HIVE-2206.D11097.6.patch, HIVE-2206.D11097.7.patch, 
 HIVE-2206.D11097.8.patch, HIVE-2206.D11097.9.patch, HIVE-2206.patch, 
 testQueries.2.q, YSmartPatchForHive.patch


 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
 paper and slides of YSmart are linked at the bottom.
 Since Hive translates queries in a sentence by sentence fashion, for every 
 operation which may need to shuffle the data (e.g. join and aggregation 
 operations), Hive will generate a MapReduce job for that operation. However, 
 for those operations which may need to shuffle the data, they may involve 
 correlations explained below and thus can be executed in a single MR job.
 # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
 input relation sets are not disjoint;
 # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
 have not only input correlation, but also the same partition key;
 # Job Flow Correlation: An MR has job ﬂow correlation (JFC) with one of its 
 child nodes if it has the same partition key as that child node.
 The current implementation of correlation optimizer only detect correlations 
 among MR jobs for reduce-side join operators and reduce-side aggregation 
 operators (not map only aggregation). A query will be optimized if it 
 satisfies following conditions.
 # There exists a MR job for reduce-side join operator or reduce side 
 aggregation operator which have JFC with all of its parents MR jobs (TCs will 
 be also exploited if JFC exists);
 # All input tables of those correlated MR job are original input tables (not 
 intermediate tables generated by sub-queries); and 
 # No self join is involved in those correlated MR jobs.
 Correlation optimizer is implemented as a logical optimizer. The main reasons 
 are that it only needs to manipulate the query plan tree and it can leverage 
 the existing component on generating MR jobs.
 Current implementation can serve as a framework for correlation related 
 optimizations. I think that it is better than adding individual optimizers. 
 There are several work that can be

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-07-18 Thread Ashutosh Chauhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-2206:
---

   Resolution: Fixed
Fix Version/s: 0.12.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Yin!

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Fix For: 0.12.0

 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt, 
 HIVE-2206.D11097.10.patch, HIVE-2206.D11097.11.patch, 
 HIVE-2206.D11097.12.patch, HIVE-2206.D11097.13.patch, 
 HIVE-2206.D11097.14.patch, HIVE-2206.D11097.15.patch, 
 HIVE-2206.D11097.16.patch, HIVE-2206.D11097.17.patch, 
 HIVE-2206.D11097.18.patch, HIVE-2206.D11097.19.patch, 
 HIVE-2206.D11097.1.patch, HIVE-2206.D11097.2.patch, HIVE-2206.D11097.3.patch, 
 HIVE-2206.D11097.4.patch, HIVE-2206.D11097.5.patch, HIVE-2206.D11097.6.patch, 
 HIVE-2206.D11097.7.patch, HIVE-2206.D11097.8.patch, HIVE-2206.D11097.9.patch, 
 HIVE-2206.patch, testQueries.2.q, YSmartPatchForHive.patch


 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
 paper and slides of YSmart are linked at the bottom.
 Since Hive translates queries in a sentence by sentence fashion, for every 
 operation which may need to shuffle the data (e.g. join and aggregation 
 operations), Hive will generate a MapReduce job for that operation. However, 
 for those operations which may need to shuffle the data, they may involve 
 correlations explained below and thus can be executed in a single MR job.
 # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
 input relation sets are not disjoint;
 # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
 have not only input correlation, but also the same partition key;
 # Job Flow Correlation: An MR has job ﬂow correlation (JFC) with one of its 
 child nodes if it has the same partition key as that child node.
 The current implementation of correlation optimizer only detect correlations 
 among MR jobs for reduce-side join operators and reduce-side aggregation 
 operators (not map only aggregation). A query will be optimized if it 
 satisfies following conditions.
 # There exists a MR job for reduce-side join operator or reduce side 
 aggregation operator which have JFC with all of its parents MR jobs (TCs will 
 be also exploited if JFC exists);
 # All input tables of those correlated MR job are original input tables (not 
 intermediate tables generated by sub-queries); and 
 # No self join is involved in those correlated MR jobs.
 Correlation optimizer is implemented as a logical optimizer. The main reasons 
 are that it only needs to manipulate the query plan tree and it can leverage 
 the existing component on generating MR jobs.
 Current implementation can serve as a framework for correlation related 
 optimizations. I think that it is better than adding individual optimizers. 
 There are several work that can be done in future to improve this optimizer. 
 Here are three examples.
 # Support queries only involve TC;
 # Support queries in which input tables of correlated MR jobs involves 
 intermediate tables; and 
 # Optimize queries involving self join. 
 References:
 Paper and presentation of YSmart.
 Paper: 
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
 Slides: http://sdrv.ms/UpwJJc

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-07-17 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Status: In Progress  (was: Patch Available)

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt, 
 HIVE-2206.D11097.10.patch, HIVE-2206.D11097.11.patch, 
 HIVE-2206.D11097.12.patch, HIVE-2206.D11097.13.patch, 
 HIVE-2206.D11097.14.patch, HIVE-2206.D11097.15.patch, 
 HIVE-2206.D11097.16.patch, HIVE-2206.D11097.17.patch, 
 HIVE-2206.D11097.18.patch, HIVE-2206.D11097.1.patch, 
 HIVE-2206.D11097.2.patch, HIVE-2206.D11097.3.patch, HIVE-2206.D11097.4.patch, 
 HIVE-2206.D11097.5.patch, HIVE-2206.D11097.6.patch, HIVE-2206.D11097.7.patch, 
 HIVE-2206.D11097.8.patch, HIVE-2206.D11097.9.patch, testQueries.2.q, 
 YSmartPatchForHive.patch


 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
 paper and slides of YSmart are linked at the bottom.
 Since Hive translates queries in a sentence by sentence fashion, for every 
 operation which may need to shuffle the data (e.g. join and aggregation 
 operations), Hive will generate a MapReduce job for that operation. However, 
 for those operations which may need to shuffle the data, they may involve 
 correlations explained below and thus can be executed in a single MR job.
 # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
 input relation sets are not disjoint;
 # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
 have not only input correlation, but also the same partition key;
 # Job Flow Correlation: An MR has job ﬂow correlation (JFC) with one of its 
 child nodes if it has the same partition key as that child node.
 The current implementation of correlation optimizer only detect correlations 
 among MR jobs for reduce-side join operators and reduce-side aggregation 
 operators (not map only aggregation). A query will be optimized if it 
 satisfies following conditions.
 # There exists a MR job for reduce-side join operator or reduce side 
 aggregation operator which have JFC with all of its parents MR jobs (TCs will 
 be also exploited if JFC exists);
 # All input tables of those correlated MR job are original input tables (not 
 intermediate tables generated by sub-queries); and 
 # No self join is involved in those correlated MR jobs.
 Correlation optimizer is implemented as a logical optimizer. The main reasons 
 are that it only needs to manipulate the query plan tree and it can leverage 
 the existing component on generating MR jobs.
 Current implementation can serve as a framework for correlation related 
 optimizations. I think that it is better than adding individual optimizers. 
 There are several work that can be done in future to improve this optimizer. 
 Here are three examples.
 # Support queries only involve TC;
 # Support queries in which input tables of correlated MR jobs involves 
 intermediate tables; and 
 # Optimize queries involving self join. 
 References:
 Paper and presentation of YSmart.
 Paper: 
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
 Slides: http://sdrv.ms/UpwJJc

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-07-17 Thread Phabricator (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2206:
--

Attachment: HIVE-2206.D11097.19.patch

yhuai updated the revision HIVE-2206 [jira] add a new optimizer for query 
correlation discovery and optimization.

- address Ashutosh's comments

Reviewers: ashutoshc, JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11097

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11097?vs=35721id=35841#toc

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/if/queryplan.thrift
  
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/AbstractCorrelationProcCtx.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationOptimizer.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/IntraQueryCorrelation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/DemuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/MuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/UnionDesc.java
  ql/src/test/queries/clientpositive/correlationoptimizer1.q
  ql/src/test/queries/clientpositive/correlationoptimizer10.q
  ql/src/test/queries/clientpositive/correlationoptimizer11.q
  ql/src/test/queries/clientpositive/correlationoptimizer12.q
  ql/src/test/queries/clientpositive/correlationoptimizer13.q
  ql/src/test/queries/clientpositive/correlationoptimizer14.q
  ql/src/test/queries/clientpositive/correlationoptimizer2.q
  ql/src/test/queries/clientpositive/correlationoptimizer3.q
  ql/src/test/queries/clientpositive/correlationoptimizer4.q
  ql/src/test/queries/clientpositive/correlationoptimizer5.q
  ql/src/test/queries/clientpositive/correlationoptimizer6.q
  ql/src/test/queries/clientpositive/correlationoptimizer7.q
  ql/src/test/queries/clientpositive/correlationoptimizer8.q
  ql/src/test/queries/clientpositive/correlationoptimizer9.q
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out
  ql/src/test/results/clientpositive/correlationoptimizer10.q.out
  ql/src/test/results/clientpositive/correlationoptimizer11.q.out
  ql/src/test/results/clientpositive/correlationoptimizer12.q.out
  ql/src/test/results/clientpositive/correlationoptimizer13.q.out
  ql/src/test/results/clientpositive/correlationoptimizer14.q.out
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out
  ql/src/test/results/clientpositive/correlationoptimizer6.q.out
  ql/src/test/results/clientpositive/correlationoptimizer7.q.out
  ql/src/test/results/clientpositive/correlationoptimizer8.q.out
  ql/src/test/results/clientpositive/correlationoptimizer9.q.out
  ql/src/test/results/compiler/plan/groupby2.q.xml
  ql/src/test/results/compiler/plan/groupby3.q.xml

To: JIRA, ashutoshc, yhuai
Cc: brock


 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt,

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-07-17 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Attachment: HIVE-2206.patch

try to trigger the precommit test.

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt, 
 HIVE-2206.D11097.10.patch, HIVE-2206.D11097.11.patch, 
 HIVE-2206.D11097.12.patch, HIVE-2206.D11097.13.patch, 
 HIVE-2206.D11097.14.patch, HIVE-2206.D11097.15.patch, 
 HIVE-2206.D11097.16.patch, HIVE-2206.D11097.17.patch, 
 HIVE-2206.D11097.18.patch, HIVE-2206.D11097.19.patch, 
 HIVE-2206.D11097.1.patch, HIVE-2206.D11097.2.patch, HIVE-2206.D11097.3.patch, 
 HIVE-2206.D11097.4.patch, HIVE-2206.D11097.5.patch, HIVE-2206.D11097.6.patch, 
 HIVE-2206.D11097.7.patch, HIVE-2206.D11097.8.patch, HIVE-2206.D11097.9.patch, 
 HIVE-2206.patch, testQueries.2.q, YSmartPatchForHive.patch


 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
 paper and slides of YSmart are linked at the bottom.
 Since Hive translates queries in a sentence by sentence fashion, for every 
 operation which may need to shuffle the data (e.g. join and aggregation 
 operations), Hive will generate a MapReduce job for that operation. However, 
 for those operations which may need to shuffle the data, they may involve 
 correlations explained below and thus can be executed in a single MR job.
 # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
 input relation sets are not disjoint;
 # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
 have not only input correlation, but also the same partition key;
 # Job Flow Correlation: An MR has job ﬂow correlation (JFC) with one of its 
 child nodes if it has the same partition key as that child node.
 The current implementation of correlation optimizer only detect correlations 
 among MR jobs for reduce-side join operators and reduce-side aggregation 
 operators (not map only aggregation). A query will be optimized if it 
 satisfies following conditions.
 # There exists a MR job for reduce-side join operator or reduce side 
 aggregation operator which have JFC with all of its parents MR jobs (TCs will 
 be also exploited if JFC exists);
 # All input tables of those correlated MR job are original input tables (not 
 intermediate tables generated by sub-queries); and 
 # No self join is involved in those correlated MR jobs.
 Correlation optimizer is implemented as a logical optimizer. The main reasons 
 are that it only needs to manipulate the query plan tree and it can leverage 
 the existing component on generating MR jobs.
 Current implementation can serve as a framework for correlation related 
 optimizations. I think that it is better than adding individual optimizers. 
 There are several work that can be done in future to improve this optimizer. 
 Here are three examples.
 # Support queries only involve TC;
 # Support queries in which input tables of correlated MR jobs involves 
 intermediate tables; and 
 # Optimize queries involving self join. 
 References:
 Paper and presentation of YSmart.
 Paper: 
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
 Slides: http://sdrv.ms/UpwJJc

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-07-14 Thread Phabricator (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2206:
--

Attachment: HIVE-2206.D11097.18.patch

yhuai updated the revision HIVE-2206 [jira] add a new optimizer for query 
correlation discovery and optimization.

- Merge remote-tracking branch 'upstream/trunk' into HIVE-2206-3671-20130711
- Left semi join should be handled in 
analyzeReduceSinkOperatorsOfJoinOperator. Also, use instanceof instead of using 
the operator's name to check the type of an Operator.

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11097

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11097?vs=35661id=35721#toc

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/if/queryplan.thrift
  
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/AbstractCorrelationProcCtx.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationOptimizer.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/IntraQueryCorrelation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/DemuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/MuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/UnionDesc.java
  ql/src/test/queries/clientpositive/correlationoptimizer1.q
  ql/src/test/queries/clientpositive/correlationoptimizer10.q
  ql/src/test/queries/clientpositive/correlationoptimizer11.q
  ql/src/test/queries/clientpositive/correlationoptimizer12.q
  ql/src/test/queries/clientpositive/correlationoptimizer13.q
  ql/src/test/queries/clientpositive/correlationoptimizer14.q
  ql/src/test/queries/clientpositive/correlationoptimizer2.q
  ql/src/test/queries/clientpositive/correlationoptimizer3.q
  ql/src/test/queries/clientpositive/correlationoptimizer4.q
  ql/src/test/queries/clientpositive/correlationoptimizer5.q
  ql/src/test/queries/clientpositive/correlationoptimizer6.q
  ql/src/test/queries/clientpositive/correlationoptimizer7.q
  ql/src/test/queries/clientpositive/correlationoptimizer8.q
  ql/src/test/queries/clientpositive/correlationoptimizer9.q
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out
  ql/src/test/results/clientpositive/correlationoptimizer10.q.out
  ql/src/test/results/clientpositive/correlationoptimizer11.q.out
  ql/src/test/results/clientpositive/correlationoptimizer12.q.out
  ql/src/test/results/clientpositive/correlationoptimizer13.q.out
  ql/src/test/results/clientpositive/correlationoptimizer14.q.out
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out
  ql/src/test/results/clientpositive/correlationoptimizer6.q.out
  ql/src/test/results/clientpositive/correlationoptimizer7.q.out
  ql/src/test/results/clientpositive/correlationoptimizer8.q.out
  ql/src/test/results/clientpositive/correlationoptimizer9.q.out
  ql/src/test/results/compiler/plan/groupby2.q.xml
  ql/src/test/results/compiler/plan/groupby3.q.xml

To: JIRA, yhuai
Cc: brock


 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-07-12 Thread Phabricator (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2206:
--

Attachment: HIVE-2206.D11097.17.patch

yhuai updated the revision HIVE-2206 [jira] add a new optimizer for query 
correlation discovery and optimization.

- Since hive already uses a single scan for a table with multiple aliases 
in a MR job, we can remove unnecessary code on merging TableScanOperators

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11097

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11097?vs=35487id=35661#toc

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/if/queryplan.thrift
  
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/AbstractCorrelationProcCtx.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationOptimizer.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/IntraQueryCorrelation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/DemuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/MuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/UnionDesc.java
  ql/src/test/queries/clientpositive/correlationoptimizer1.q
  ql/src/test/queries/clientpositive/correlationoptimizer10.q
  ql/src/test/queries/clientpositive/correlationoptimizer11.q
  ql/src/test/queries/clientpositive/correlationoptimizer12.q
  ql/src/test/queries/clientpositive/correlationoptimizer13.q
  ql/src/test/queries/clientpositive/correlationoptimizer14.q
  ql/src/test/queries/clientpositive/correlationoptimizer2.q
  ql/src/test/queries/clientpositive/correlationoptimizer3.q
  ql/src/test/queries/clientpositive/correlationoptimizer4.q
  ql/src/test/queries/clientpositive/correlationoptimizer5.q
  ql/src/test/queries/clientpositive/correlationoptimizer6.q
  ql/src/test/queries/clientpositive/correlationoptimizer7.q
  ql/src/test/queries/clientpositive/correlationoptimizer8.q
  ql/src/test/queries/clientpositive/correlationoptimizer9.q
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out
  ql/src/test/results/clientpositive/correlationoptimizer10.q.out
  ql/src/test/results/clientpositive/correlationoptimizer11.q.out
  ql/src/test/results/clientpositive/correlationoptimizer12.q.out
  ql/src/test/results/clientpositive/correlationoptimizer13.q.out
  ql/src/test/results/clientpositive/correlationoptimizer14.q.out
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out
  ql/src/test/results/clientpositive/correlationoptimizer6.q.out
  ql/src/test/results/clientpositive/correlationoptimizer7.q.out
  ql/src/test/results/clientpositive/correlationoptimizer8.q.out
  ql/src/test/results/clientpositive/correlationoptimizer9.q.out
  ql/src/test/results/compiler/plan/groupby2.q.xml
  ql/src/test/results/compiler/plan/groupby3.q.xml

To: JIRA, yhuai
Cc: brock


 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-07-08 Thread Phabricator (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2206:
--

Attachment: HIVE-2206.D11097.16.patch

yhuai updated the revision HIVE-2206 [jira] add a new optimizer for query 
correlation discovery and optimization.

- update test cases
- refactor code

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11097

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11097?vs=35283id=35487#toc

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/if/queryplan.thrift
  
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/AbstractCorrelationProcCtx.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationOptimizer.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/IntraQueryCorrelation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/DemuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/MuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/UnionDesc.java
  ql/src/test/queries/clientpositive/correlationoptimizer1.q
  ql/src/test/queries/clientpositive/correlationoptimizer10.q
  ql/src/test/queries/clientpositive/correlationoptimizer11.q
  ql/src/test/queries/clientpositive/correlationoptimizer12.q
  ql/src/test/queries/clientpositive/correlationoptimizer13.q
  ql/src/test/queries/clientpositive/correlationoptimizer14.q
  ql/src/test/queries/clientpositive/correlationoptimizer2.q
  ql/src/test/queries/clientpositive/correlationoptimizer3.q
  ql/src/test/queries/clientpositive/correlationoptimizer4.q
  ql/src/test/queries/clientpositive/correlationoptimizer5.q
  ql/src/test/queries/clientpositive/correlationoptimizer6.q
  ql/src/test/queries/clientpositive/correlationoptimizer7.q
  ql/src/test/queries/clientpositive/correlationoptimizer8.q
  ql/src/test/queries/clientpositive/correlationoptimizer9.q
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out
  ql/src/test/results/clientpositive/correlationoptimizer10.q.out
  ql/src/test/results/clientpositive/correlationoptimizer11.q.out
  ql/src/test/results/clientpositive/correlationoptimizer12.q.out
  ql/src/test/results/clientpositive/correlationoptimizer13.q.out
  ql/src/test/results/clientpositive/correlationoptimizer14.q.out
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out
  ql/src/test/results/clientpositive/correlationoptimizer6.q.out
  ql/src/test/results/clientpositive/correlationoptimizer7.q.out
  ql/src/test/results/clientpositive/correlationoptimizer8.q.out
  ql/src/test/results/clientpositive/correlationoptimizer9.q.out
  ql/src/test/results/compiler/plan/groupby2.q.xml
  ql/src/test/results/compiler/plan/groupby3.q.xml

To: JIRA, yhuai
Cc: brock


 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt,

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-07-02 Thread Phabricator (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2206:
--

Attachment: HIVE-2206.D11097.15.patch

yhuai updated the revision HIVE-2206 [jira] add a new optimizer for query 
correlation discovery and optimization.

- Correlation optimizer currently does not support PTF operator

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11097

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11097?vs=35265id=35283#toc

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/if/queryplan.thrift
  
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/AbstractCorrelationProcCtx.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationOptimizer.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/IntraQueryCorrelation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/DemuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/MuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/UnionDesc.java
  ql/src/test/queries/clientpositive/correlationoptimizer1.q
  ql/src/test/queries/clientpositive/correlationoptimizer10.q
  ql/src/test/queries/clientpositive/correlationoptimizer11.q
  ql/src/test/queries/clientpositive/correlationoptimizer12.q
  ql/src/test/queries/clientpositive/correlationoptimizer2.q
  ql/src/test/queries/clientpositive/correlationoptimizer3.q
  ql/src/test/queries/clientpositive/correlationoptimizer4.q
  ql/src/test/queries/clientpositive/correlationoptimizer5.q
  ql/src/test/queries/clientpositive/correlationoptimizer6.q
  ql/src/test/queries/clientpositive/correlationoptimizer7.q
  ql/src/test/queries/clientpositive/correlationoptimizer8.q
  ql/src/test/queries/clientpositive/correlationoptimizer9.q
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out
  ql/src/test/results/clientpositive/correlationoptimizer10.q.out
  ql/src/test/results/clientpositive/correlationoptimizer11.q.out
  ql/src/test/results/clientpositive/correlationoptimizer12.q.out
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out
  ql/src/test/results/clientpositive/correlationoptimizer6.q.out
  ql/src/test/results/clientpositive/correlationoptimizer7.q.out
  ql/src/test/results/clientpositive/correlationoptimizer8.q.out
  ql/src/test/results/clientpositive/correlationoptimizer9.q.out
  ql/src/test/results/compiler/plan/groupby2.q.xml
  ql/src/test/results/compiler/plan/groupby3.q.xml

To: JIRA, yhuai
Cc: brock


 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt,

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-07-01 Thread Phabricator (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2206:
--

Attachment: HIVE-2206.D11097.14.patch

yhuai updated the revision HIVE-2206 [jira] add a new optimizer for query 
correlation discovery and optimization.

- Merge remote-tracking branch 'upstream/trunk' into 
HIVE-2206-3671-Improvement
- update test cases
- refactoring.
- Merge remote-tracking branch 'upstream/trunk' into 
HIVE-2206-3671-Improvement

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11097

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11097?vs=35223id=35265#toc

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/if/queryplan.thrift
  
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/AbstractCorrelationProcCtx.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationOptimizer.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/IntraQueryCorrelation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/DemuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/MuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/UnionDesc.java
  ql/src/test/queries/clientpositive/correlationoptimizer1.q
  ql/src/test/queries/clientpositive/correlationoptimizer10.q
  ql/src/test/queries/clientpositive/correlationoptimizer11.q
  ql/src/test/queries/clientpositive/correlationoptimizer2.q
  ql/src/test/queries/clientpositive/correlationoptimizer3.q
  ql/src/test/queries/clientpositive/correlationoptimizer4.q
  ql/src/test/queries/clientpositive/correlationoptimizer5.q
  ql/src/test/queries/clientpositive/correlationoptimizer6.q
  ql/src/test/queries/clientpositive/correlationoptimizer7.q
  ql/src/test/queries/clientpositive/correlationoptimizer8.q
  ql/src/test/queries/clientpositive/correlationoptimizer9.q
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out
  ql/src/test/results/clientpositive/correlationoptimizer10.q.out
  ql/src/test/results/clientpositive/correlationoptimizer11.q.out
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out
  ql/src/test/results/clientpositive/correlationoptimizer6.q.out
  ql/src/test/results/clientpositive/correlationoptimizer7.q.out
  ql/src/test/results/clientpositive/correlationoptimizer8.q.out
  ql/src/test/results/clientpositive/correlationoptimizer9.q.out
  ql/src/test/results/compiler/plan/groupby2.q.xml
  ql/src/test/results/compiler/plan/groupby3.q.xml

To: JIRA, yhuai
Cc: brock


 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt,

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-06-28 Thread Phabricator (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2206:
--

Attachment: HIVE-2206.D11097.13.patch

yhuai updated the revision HIVE-2206 [jira] add a new optimizer for query 
correlation discovery and optimization.

  Handle partitioned tables

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11097

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11097?vs=35193id=35223#toc

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/if/queryplan.thrift
  
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/AbstractCorrelationProcCtx.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationOptimizer.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/IntraQueryCorrelation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/DemuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeConstantDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/MuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/UnionDesc.java
  ql/src/test/queries/clientpositive/correlationoptimizer1.q
  ql/src/test/queries/clientpositive/correlationoptimizer10.q
  ql/src/test/queries/clientpositive/correlationoptimizer11.q
  ql/src/test/queries/clientpositive/correlationoptimizer2.q
  ql/src/test/queries/clientpositive/correlationoptimizer3.q
  ql/src/test/queries/clientpositive/correlationoptimizer4.q
  ql/src/test/queries/clientpositive/correlationoptimizer5.q
  ql/src/test/queries/clientpositive/correlationoptimizer6.q
  ql/src/test/queries/clientpositive/correlationoptimizer7.q
  ql/src/test/queries/clientpositive/correlationoptimizer8.q
  ql/src/test/queries/clientpositive/correlationoptimizer9.q
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out
  ql/src/test/results/clientpositive/correlationoptimizer10.q.out
  ql/src/test/results/clientpositive/correlationoptimizer11.q.out
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out
  ql/src/test/results/clientpositive/correlationoptimizer6.q.out
  ql/src/test/results/clientpositive/correlationoptimizer7.q.out
  ql/src/test/results/clientpositive/correlationoptimizer8.q.out
  ql/src/test/results/clientpositive/correlationoptimizer9.q.out
  ql/src/test/results/compiler/plan/groupby2.q.xml
  ql/src/test/results/compiler/plan/groupby3.q.xml

To: JIRA, yhuai
Cc: brock


 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt,

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-06-26 Thread Phabricator (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2206:
--

Attachment: HIVE-2206.D11097.11.patch

yhuai updated the revision HIVE-2206 [jira] add a new optimizer for query 
correlation discovery and optimization.

  I tested all unit tests before the commit of HIVE-4496. all unit tests pass

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11097

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11097?vs=35055id=35181#toc

AFFECTED FILES
  build-common.xml
  data/files/leftsemijoin_mr_t1.txt
  data/files/leftsemijoin_mr_t2.txt
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java
  ql/src/test/queries/clientpositive/leftsemijoin_mr.q
  ql/src/test/results/clientpositive/leftsemijoin_mr.q.out

To: JIRA, yhuai
Cc: brock


 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt, 
 HIVE-2206.D11097.10.patch, HIVE-2206.D11097.11.patch, 
 HIVE-2206.D11097.1.patch, HIVE-2206.D11097.2.patch, HIVE-2206.D11097.3.patch, 
 HIVE-2206.D11097.4.patch, HIVE-2206.D11097.5.patch, HIVE-2206.D11097.6.patch, 
 HIVE-2206.D11097.7.patch, HIVE-2206.D11097.8.patch, HIVE-2206.D11097.9.patch, 
 testQueries.2.q, YSmartPatchForHive.patch


 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
 paper and slides of YSmart are linked at the bottom.
 Since Hive translates queries in a sentence by sentence fashion, for every 
 operation which may need to shuffle the data (e.g. join and aggregation 
 operations), Hive will generate a MapReduce job for that operation. However, 
 for those operations which may need to shuffle the data, they may involve 
 correlations explained below and thus can be executed in a single MR job.
 # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
 input relation sets are not disjoint;
 # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
 have not only input correlation, but also the same partition key;
 # Job Flow Correlation: An MR has job ﬂow correlation (JFC) with one of its 
 child nodes if it has the same partition key as that child node.
 The current implementation of correlation optimizer only detect correlations 
 among MR jobs for reduce-side join operators and reduce-side aggregation 
 operators (not map only aggregation). A query will be optimized if it 
 satisfies following conditions.
 # There exists a MR job for reduce-side join operator or reduce side 
 aggregation operator which have JFC with all of its parents MR jobs (TCs will 
 be also exploited if JFC exists);
 # All input tables of those correlated MR job are original input tables (not 
 intermediate tables generated by sub-queries); and 
 # No self join is involved in those correlated MR jobs.
 Correlation optimizer is implemented as a logical optimizer. The main reasons 
 are that it only needs to manipulate the query plan tree and it can leverage 
 the existing component on generating MR jobs.
 Current implementation can serve as a framework for correlation related 
 optimizations. I think that it is better than adding individual optimizers. 
 There are several work that can be done in future to improve this optimizer. 
 Here are three examples.
 # Support queries only involve TC;
 # Support queries in which input tables of correlated MR jobs involves 
 intermediate tables; and 
 # Optimize queries involving self join. 
 References:
 Paper and presentation of YSmart.
 Paper: 
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
 Slides: http://sdrv.ms/UpwJJc

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-06-26 Thread Phabricator (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2206:
--

Attachment: HIVE-2206.D11097.12.patch

yhuai updated the revision HIVE-2206 [jira] add a new optimizer for query 
correlation discovery and optimization.

  My last diff was for 4718...

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11097

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11097?vs=35181id=35193#toc

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/if/queryplan.thrift
  
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/AbstractCorrelationProcCtx.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationOptimizer.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/IntraQueryCorrelation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/DemuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/MuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/UnionDesc.java
  ql/src/test/queries/clientpositive/correlationoptimizer1.q
  ql/src/test/queries/clientpositive/correlationoptimizer10.q
  ql/src/test/queries/clientpositive/correlationoptimizer2.q
  ql/src/test/queries/clientpositive/correlationoptimizer3.q
  ql/src/test/queries/clientpositive/correlationoptimizer4.q
  ql/src/test/queries/clientpositive/correlationoptimizer5.q
  ql/src/test/queries/clientpositive/correlationoptimizer6.q
  ql/src/test/queries/clientpositive/correlationoptimizer7.q
  ql/src/test/queries/clientpositive/correlationoptimizer8.q
  ql/src/test/queries/clientpositive/correlationoptimizer9.q
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out
  ql/src/test/results/clientpositive/correlationoptimizer10.q.out
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out
  ql/src/test/results/clientpositive/correlationoptimizer6.q.out
  ql/src/test/results/clientpositive/correlationoptimizer7.q.out
  ql/src/test/results/clientpositive/correlationoptimizer8.q.out
  ql/src/test/results/clientpositive/correlationoptimizer9.q.out
  ql/src/test/results/compiler/plan/groupby2.q.xml
  ql/src/test/results/compiler/plan/groupby3.q.xml

To: JIRA, yhuai
Cc: brock


 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt,

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-06-21 Thread Phabricator (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2206:
--

Attachment: HIVE-2206.D11097.10.patch

yhuai updated the revision HIVE-2206 [jira] add a new optimizer for query 
correlation discovery and optimization.

- fix bugs related to (1) semi join and (2) multiple join and the 
DemuxOperator directly connects to a MuxOperator
- update

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11097

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11097?vs=34869id=35055#toc

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/if/queryplan.thrift
  
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/AbstractCorrelationProcCtx.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationOptimizer.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/IntraQueryCorrelation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/DemuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/MuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/UnionDesc.java
  ql/src/test/queries/clientpositive/correlationoptimizer1.q
  ql/src/test/queries/clientpositive/correlationoptimizer10.q
  ql/src/test/queries/clientpositive/correlationoptimizer2.q
  ql/src/test/queries/clientpositive/correlationoptimizer3.q
  ql/src/test/queries/clientpositive/correlationoptimizer4.q
  ql/src/test/queries/clientpositive/correlationoptimizer5.q
  ql/src/test/queries/clientpositive/correlationoptimizer6.q
  ql/src/test/queries/clientpositive/correlationoptimizer7.q
  ql/src/test/queries/clientpositive/correlationoptimizer8.q
  ql/src/test/queries/clientpositive/correlationoptimizer9.q
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out
  ql/src/test/results/clientpositive/correlationoptimizer10.q.out
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out
  ql/src/test/results/clientpositive/correlationoptimizer6.q.out
  ql/src/test/results/clientpositive/correlationoptimizer7.q.out
  ql/src/test/results/clientpositive/correlationoptimizer8.q.out
  ql/src/test/results/clientpositive/correlationoptimizer9.q.out
  ql/src/test/results/compiler/plan/groupby2.q.xml
  ql/src/test/results/compiler/plan/groupby3.q.xml

To: JIRA, yhuai
Cc: brock


 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt,

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-06-18 Thread Phabricator (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2206:
--

Attachment: HIVE-2206.D11097.9.patch

yhuai updated the revision HIVE-2206 [jira] add a new optimizer for query 
correlation discovery and optimization.

a few minor updates
- add new tests
- remove unused methods
- Merge remote-tracking branch 'upstream/trunk' into 
HIVE-2206-3671-Improvement
- add Apache license header
- evaluate if this operator is UnionOperator first

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11097

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11097?vs=34791id=34869#toc

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/if/queryplan.thrift
  
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/AbstractCorrelationProcCtx.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationOptimizer.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/IntraQueryCorrelation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/DemuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/MuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/UnionDesc.java
  ql/src/test/queries/clientpositive/correlationoptimizer1.q
  ql/src/test/queries/clientpositive/correlationoptimizer2.q
  ql/src/test/queries/clientpositive/correlationoptimizer3.q
  ql/src/test/queries/clientpositive/correlationoptimizer4.q
  ql/src/test/queries/clientpositive/correlationoptimizer5.q
  ql/src/test/queries/clientpositive/correlationoptimizer6.q
  ql/src/test/queries/clientpositive/correlationoptimizer7.q
  ql/src/test/queries/clientpositive/correlationoptimizer8.q
  ql/src/test/queries/clientpositive/correlationoptimizer9.q
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out
  ql/src/test/results/clientpositive/correlationoptimizer6.q.out
  ql/src/test/results/clientpositive/correlationoptimizer7.q.out
  ql/src/test/results/clientpositive/correlationoptimizer8.q.out
  ql/src/test/results/clientpositive/correlationoptimizer9.q.out
  ql/src/test/results/compiler/plan/groupby2.q.xml
  ql/src/test/results/compiler/plan/groupby3.q.xml

To: JIRA, yhuai
Cc: brock


 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt,

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-06-15 Thread Phabricator (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2206:
--

Attachment: HIVE-2206.D11097.8.patch

yhuai updated the revision HIVE-2206 [jira] add a new optimizer for query 
correlation discovery and optimization.

All tests pass
- new tests
- Fix a bug related to UnionOperator;
- wip
- more comments in tests
- Merge remote-tracking branch 'upstream/trunk' into 
HIVE-2206-3671-Improvement
- update comments
- Merge remote-tracking branch 'upstream/trunk' into 
HIVE-2206-3671-Improvement

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11097

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11097?vs=34653id=34791#toc

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/if/queryplan.thrift
  
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/AbstractCorrelationProcCtx.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationOptimizer.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/IntraQueryCorrelation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/DemuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/MuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/UnionDesc.java
  ql/src/test/queries/clientpositive/correlationoptimizer1.q
  ql/src/test/queries/clientpositive/correlationoptimizer2.q
  ql/src/test/queries/clientpositive/correlationoptimizer3.q
  ql/src/test/queries/clientpositive/correlationoptimizer4.q
  ql/src/test/queries/clientpositive/correlationoptimizer5.q
  ql/src/test/queries/clientpositive/correlationoptimizer6.q
  ql/src/test/queries/clientpositive/correlationoptimizer7.q
  ql/src/test/queries/clientpositive/correlationoptimizer8.q
  ql/src/test/queries/clientpositive/correlationoptimizer9.q
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out
  ql/src/test/results/clientpositive/correlationoptimizer6.q.out
  ql/src/test/results/clientpositive/correlationoptimizer7.q.out
  ql/src/test/results/clientpositive/correlationoptimizer8.q.out
  ql/src/test/results/clientpositive/correlationoptimizer9.q.out
  ql/src/test/results/compiler/plan/groupby2.q.xml
  ql/src/test/results/compiler/plan/groupby3.q.xml

To: JIRA, yhuai
Cc: brock


 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt,

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-06-12 Thread Phabricator (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2206:
--

Attachment: HIVE-2206.D11097.6.patch

yhuai updated the revision HIVE-2206 [jira] add a new optimizer for query 
correlation discovery and optimization.

- fix the mechanism of processing a key group
- bug fix
- refactoring code. make output of optimized plans deterministic. add new
- use spaces for indentation.

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11097

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11097?vs=34413id=34623#toc

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/if/queryplan.thrift
  
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/AbstractCorrelationProcCtx.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationOptimizer.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/IntraQueryCorrelation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/DemuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/MuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
  ql/src/test/queries/clientpositive/correlationoptimizer1.q
  ql/src/test/queries/clientpositive/correlationoptimizer2.q
  ql/src/test/queries/clientpositive/correlationoptimizer3.q
  ql/src/test/queries/clientpositive/correlationoptimizer4.q
  ql/src/test/queries/clientpositive/correlationoptimizer5.q
  ql/src/test/queries/clientpositive/correlationoptimizer6.q
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out
  ql/src/test/results/clientpositive/correlationoptimizer6.q.out
  ql/src/test/results/compiler/plan/groupby2.q.xml
  ql/src/test/results/compiler/plan/groupby3.q.xml

To: JIRA, yhuai
Cc: brock


 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt, 
 HIVE-2206.D11097.1.patch, HIVE-2206.D11097.2.patch, HIVE-2206.D11097.3.patch, 
 HIVE-2206.D11097.4.patch, HIVE-2206.D11097.5.patch, HIVE-2206.D11097.6.patch, 
 testQueries.2.q, YSmartPatchForHive.patch


 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
 paper and slides of YSmart are linked at the bottom.

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-06-12 Thread Phabricator (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2206:
--

Attachment: HIVE-2206.D11097.7.patch

yhuai updated the revision HIVE-2206 [jira] add a new optimizer for query 
correlation discovery and optimization.

- Merge remote-tracking branch 'upstream/trunk' into 
HIVE-2206-3671-Improvement
- make optimized plans deterministic
- add a new test and update results

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11097

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11097?vs=34623id=34653#toc

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/if/queryplan.thrift
  
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/AbstractCorrelationProcCtx.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationOptimizer.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/IntraQueryCorrelation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/DemuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/MuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
  ql/src/test/queries/clientpositive/correlationoptimizer1.q
  ql/src/test/queries/clientpositive/correlationoptimizer2.q
  ql/src/test/queries/clientpositive/correlationoptimizer3.q
  ql/src/test/queries/clientpositive/correlationoptimizer4.q
  ql/src/test/queries/clientpositive/correlationoptimizer5.q
  ql/src/test/queries/clientpositive/correlationoptimizer6.q
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out
  ql/src/test/results/clientpositive/correlationoptimizer6.q.out
  ql/src/test/results/compiler/plan/groupby2.q.xml
  ql/src/test/results/compiler/plan/groupby3.q.xml

To: JIRA, yhuai
Cc: brock


 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt, 
 HIVE-2206.D11097.1.patch, HIVE-2206.D11097.2.patch, HIVE-2206.D11097.3.patch, 
 HIVE-2206.D11097.4.patch, HIVE-2206.D11097.5.patch, HIVE-2206.D11097.6.patch, 
 HIVE-2206.D11097.7.patch, testQueries.2.q, YSmartPatchForHive.patch


 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
 paper and slides of YSmart are linked

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-06-07 Thread Phabricator (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2206:
--

Attachment: HIVE-2206.D11097.4.patch

yhuai updated the revision HIVE-2206 [jira] add a new optimizer for query 
correlation discovery and optimization.

  address brock's comments

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11097

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11097?vs=34383id=34401#toc

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/if/queryplan.thrift
  
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/AbstractCorrelationProcCtx.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationOptimizer.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/IntraQueryCorrelation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/DemuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/MuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
  ql/src/test/queries/clientpositive/correlationoptimizer1.q
  ql/src/test/queries/clientpositive/correlationoptimizer2.q
  ql/src/test/queries/clientpositive/correlationoptimizer3.q
  ql/src/test/queries/clientpositive/correlationoptimizer4.q
  ql/src/test/queries/clientpositive/correlationoptimizer5.q
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out
  ql/src/test/results/compiler/plan/groupby2.q.xml
  ql/src/test/results/compiler/plan/groupby3.q.xml

To: JIRA, yhuai
Cc: brock


 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt, 
 HIVE-2206.D11097.1.patch, HIVE-2206.D11097.2.patch, HIVE-2206.D11097.3.patch, 
 HIVE-2206.D11097.4.patch, testQueries.2.q, YSmartPatchForHive.patch


 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
 paper and slides of YSmart are linked at the bottom.
 Since Hive translates queries in a sentence by sentence fashion, for every 
 operation which may need to shuffle the data (e.g. join and aggregation 
 operations), Hive will generate a MapReduce job for that operation. However, 
 for those operations which may need to shuffle the data, they may involve 
 correlations explained below and thus can be executed in a single MR job.
 # Input Correlation: Multiple MR jobs have input correlation (IC) if their

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-06-07 Thread Phabricator (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2206:
--

Attachment: HIVE-2206.D11097.5.patch

yhuai updated the revision HIVE-2206 [jira] add a new optimizer for query 
correlation discovery and optimization.

- check isTraceEnabled before constructing a trace message

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11097

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11097?vs=34401id=34413#toc

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/if/queryplan.thrift
  
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/AbstractCorrelationProcCtx.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationOptimizer.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/IntraQueryCorrelation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/DemuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/MuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
  ql/src/test/queries/clientpositive/correlationoptimizer1.q
  ql/src/test/queries/clientpositive/correlationoptimizer2.q
  ql/src/test/queries/clientpositive/correlationoptimizer3.q
  ql/src/test/queries/clientpositive/correlationoptimizer4.q
  ql/src/test/queries/clientpositive/correlationoptimizer5.q
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out
  ql/src/test/results/compiler/plan/groupby2.q.xml
  ql/src/test/results/compiler/plan/groupby3.q.xml

To: JIRA, yhuai
Cc: brock


 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt, 
 HIVE-2206.D11097.1.patch, HIVE-2206.D11097.2.patch, HIVE-2206.D11097.3.patch, 
 HIVE-2206.D11097.4.patch, HIVE-2206.D11097.5.patch, testQueries.2.q, 
 YSmartPatchForHive.patch


 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
 paper and slides of YSmart are linked at the bottom.
 Since Hive translates queries in a sentence by sentence fashion, for every 
 operation which may need to shuffle the data (e.g. join and aggregation 
 operations), Hive will generate a MapReduce job for that operation. However, 
 for those operations which may need to shuffle the data, they may involve 
 correlations explained below and thus can be executed in a single MR job.
 # Input

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-06-06 Thread Phabricator (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2206:
--

Attachment: HIVE-2206.D11097.2.patch

yhuai updated the revision HIVE-2206 [jira] add a new optimizer for query 
correlation discovery and optimization.

- Merge remote-tracking branch 'upstream/trunk' into 
HIVE-2206-3671-Refactoring
- Merge branch 'HIVE-2206-3671-Refactoring' of 
https://github.com/yhuai/hive into HIVE-2206-3671-Refactoring
- end group is not called correctly
- reorganize testcases
- Merge remote-tracking branch 'upstream/trunk' into 
HIVE-2206-3671-Refactoring
- Bug fix. The logic of comparing if a table is not good to be a big table
- update results

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11097

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11097?vs=34293id=34329#toc

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/if/queryplan.thrift
  
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
  
ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java
  
ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationOptimizer.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/IntraQueryCorrelation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  
ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
  ql/src/test/queries/clientpositive/correlationoptimizer1.q
  ql/src/test/queries/clientpositive/correlationoptimizer2.q
  ql/src/test/queries/clientpositive/correlationoptimizer3.q
  ql/src/test/queries/clientpositive/correlationoptimizer4.q
  ql/src/test/queries/clientpositive/correlationoptimizer5.q
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out
  ql/src/test/results/compiler/plan/groupby2.q.xml
  ql/src/test/results/compiler/plan/groupby3.q.xml

To: JIRA, yhuai


 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt, 
 HIVE-2206.D11097.1.patch, HIVE-2206.D11097.2.patch, testQueries.2.q, 
 YSmartPatchForHive.patch


 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
 paper and slides of YSmart are linked at the bottom.
 Since Hive translates queries in a sentence by sentence fashion, for every

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-06-06 Thread Yin Huai (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Yin Huai updated HIVE-2206:
---

Status: Patch Available (was: In Progress)

HIVE-2206.D11097.3.patch is the latest patch. I have fixed bugs found in unit
tests when the optimizer is enabled by default. The patch is ready for review.

add a new optimizer for query correlation discovery and optimization

Key: HIVE-2206
URL: https://issues.apache.org/jira/browse/HIVE-2206
Project: Hive
Issue Type: New Feature
Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
Attachments: HIVE-2206.10-r1384442.patch.txt,
HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt,
HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt,
HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt,
HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt,
HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt,
HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt,
HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt,
HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt,
HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt,
HIVE-2206.D11097.1.patch, HIVE-2206.D11097.2.patch, HIVE-2206.D11097.3.patch,
testQueries.2.q, YSmartPatchForHive.patch

This issue proposes a new logical optimizer called Correlation Optimizer,
which is used to merge correlated MapReduce jobs (MR jobs) into a single MR
job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The
paper and slides of YSmart are linked at the bottom.
Since Hive translates queries in a sentence by sentence fashion, for every
operation which may need to shuffle the data (e.g. join and aggregation
operations), Hive will generate a MapReduce job for that operation. However,
for those operations which may need to shuffle the data, they may involve
correlations explained below and thus can be executed in a single MR job.
# Input Correlation: Multiple MR jobs have input correlation (IC) if their
input relation sets are not disjoint;
# Transit Correlation: Multiple MR jobs have transit correlation (TC) if they
have not only input correlation, but also the same partition key;
# Job Flow Correlation: An MR has job ﬂow correlation (JFC) with one of its
child nodes if it has the same partition key as that child node.
The current implementation of correlation optimizer only detect correlations
among MR jobs for reduce-side join operators and reduce-side aggregation
operators (not map only aggregation). A query will be optimized if it
satisfies following conditions.
# There exists a MR job for reduce-side join operator or reduce side
aggregation operator which have JFC with all of its parents MR jobs (TCs will
be also exploited if JFC exists);
# All input tables of those correlated MR job are original input tables (not
intermediate tables generated by sub-queries); and
# No self join is involved in those correlated MR jobs.
Correlation optimizer is implemented as a logical optimizer. The main reasons
are that it only needs to manipulate the query plan tree and it can leverage
the existing component on generating MR jobs.
Current implementation can serve as a framework for correlation related
optimizations. I think that it is better than adding individual optimizers.
There are several work that can be done in future to improve this optimizer.
Here are three examples.
# Support queries only involve TC;
# Support queries in which input tables of correlated MR jobs involves
intermediate tables; and
# Optimize queries involving self join.
References:
Paper and presentation of YSmart.
Paper:
http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
Slides: http://sdrv.ms/UpwJJc

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-06-06 Thread Phabricator (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2206:
--

Attachment: HIVE-2206.D11097.3.patch

yhuai updated the revision HIVE-2206 [jira] add a new optimizer for query 
correlation discovery and optimization.

  Have fixed bugs I found from unit tests when the optimizer is enabled by 
default.
  Also, refactored the code and updated test results.

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11097

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11097?vs=34329id=34383#toc

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/if/queryplan.thrift
  
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/AbstractCorrelationProcCtx.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationOptimizer.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/IntraQueryCorrelation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/DemuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/MuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
  ql/src/test/queries/clientpositive/correlationoptimizer1.q
  ql/src/test/queries/clientpositive/correlationoptimizer2.q
  ql/src/test/queries/clientpositive/correlationoptimizer3.q
  ql/src/test/queries/clientpositive/correlationoptimizer4.q
  ql/src/test/queries/clientpositive/correlationoptimizer5.q
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out
  ql/src/test/results/compiler/plan/groupby2.q.xml
  ql/src/test/results/compiler/plan/groupby3.q.xml

To: JIRA, yhuai


 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt, 
 HIVE-2206.D11097.1.patch, HIVE-2206.D11097.2.patch, HIVE-2206.D11097.3.patch, 
 testQueries.2.q, YSmartPatchForHive.patch


 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
 paper and slides of YSmart are linked at the bottom.
 Since Hive translates queries in a sentence by sentence fashion, for every 
 operation which may need to shuffle the data (e.g. join and aggregation 
 operations), Hive will generate a MapReduce job for that operation. However, 
 for those operations which may need to shuffle the data, they may involve 
 correlations explained below and thus can be executed in a single MR job.

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-06-04 Thread Phabricator (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2206:
--

Attachment: HIVE-2206.D11097.1.patch

yhuai requested code review of HIVE-2206 [jira] add a new optimizer for query 
correlation discovery and optimization.

Reviewers: JIRA

update test results

This issue proposes a new logical optimizer called Correlation Optimizer, which 
is used to merge correlated MapReduce jobs (MR jobs) into a single MR job. The 
idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The paper and 
slides of YSmart are linked at the bottom.

Since Hive translates queries in a sentence by sentence fashion, for every 
operation which may need to shuffle the data (e.g. join and aggregation 
operations), Hive will generate a MapReduce job for that operation. However, 
for those operations which may need to shuffle the data, they may involve 
correlations explained below and thus can be executed in a single MR job.

Input Correlation: Multiple MR jobs have input correlation (IC) if 
their input relation sets are not disjoint;
Transit Correlation: Multiple MR jobs have transit correlation (TC) if 
they have not only input correlation, but also the same partition key;
Job Flow Correlation: An MR has job ﬂow correlation (JFC) with one of 
its child nodes if it has the same partition key as that child node.

The current implementation of correlation optimizer only detect correlations 
among MR jobs for reduce-side join operators and reduce-side aggregation 
operators (not map only aggregation). A query will be optimized if it satisfies 
following conditions.

There exists a MR job for reduce-side join operator or reduce side 
aggregation operator which have JFC with all of its parents MR jobs (TCs will 
be also exploited if JFC exists);
All input tables of those correlated MR job are original input tables 
(not intermediate tables generated by sub-queries); and
No self join is involved in those correlated MR jobs.

Correlation optimizer is implemented as a logical optimizer. The main reasons 
are that it only needs to manipulate the query plan tree and it can leverage 
the existing component on generating MR jobs.

Current implementation can serve as a framework for correlation related 
optimizations. I think that it is better than adding individual optimizers.

There are several work that can be done in future to improve this optimizer. 
Here are three examples.

Support queries only involve TC;
Support queries in which input tables of correlated MR jobs involves 
intermediate tables; and
Optimize queries involving self join.

References:
Paper and presentation of YSmart.
Paper: 
http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
Slides: http://sdrv.ms/UpwJJc

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D11097

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/if/queryplan.thrift
  
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
  
ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java
  
ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationOptimizer.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/IntraQueryCorrelation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  
ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
  ql/src/test/queries/clientpositive/correlationoptimizer1.q
  ql/src/test/queries/clientpositive/correlationoptimizer2.q
  ql/src/test/queries/clientpositive/correlationoptimizer3.q
  ql/src/test/queries/clientpositive/correlationoptimizer4.q

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-06-04 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Attachment: HIVE-2206.21.patch.txt

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.21.patch.txt, 
 HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, 
 HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, 
 HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
 HIVE-2206.8-r1237253.patch.txt, HIVE-2206.D11097.1.patch, testQueries.2.q, 
 YSmartPatchForHive.patch


 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
 paper and slides of YSmart are linked at the bottom.
 Since Hive translates queries in a sentence by sentence fashion, for every 
 operation which may need to shuffle the data (e.g. join and aggregation 
 operations), Hive will generate a MapReduce job for that operation. However, 
 for those operations which may need to shuffle the data, they may involve 
 correlations explained below and thus can be executed in a single MR job.
 # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
 input relation sets are not disjoint;
 # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
 have not only input correlation, but also the same partition key;
 # Job Flow Correlation: An MR has job ﬂow correlation (JFC) with one of its 
 child nodes if it has the same partition key as that child node.
 The current implementation of correlation optimizer only detect correlations 
 among MR jobs for reduce-side join operators and reduce-side aggregation 
 operators (not map only aggregation). A query will be optimized if it 
 satisfies following conditions.
 # There exists a MR job for reduce-side join operator or reduce side 
 aggregation operator which have JFC with all of its parents MR jobs (TCs will 
 be also exploited if JFC exists);
 # All input tables of those correlated MR job are original input tables (not 
 intermediate tables generated by sub-queries); and 
 # No self join is involved in those correlated MR jobs.
 Correlation optimizer is implemented as a logical optimizer. The main reasons 
 are that it only needs to manipulate the query plan tree and it can leverage 
 the existing component on generating MR jobs.
 Current implementation can serve as a framework for correlation related 
 optimizations. I think that it is better than adding individual optimizers. 
 There are several work that can be done in future to improve this optimizer. 
 Here are three examples.
 # Support queries only involve TC;
 # Support queries in which input tables of correlated MR jobs involves 
 intermediate tables; and 
 # Optimize queries involving self join. 
 References:
 Paper and presentation of YSmart.
 Paper: 
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
 Slides: http://sdrv.ms/UpwJJc

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-06-04 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Attachment: (was: HIVE-2206.21.patch.txt)

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt, 
 HIVE-2206.D11097.1.patch, testQueries.2.q, YSmartPatchForHive.patch


 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
 paper and slides of YSmart are linked at the bottom.
 Since Hive translates queries in a sentence by sentence fashion, for every 
 operation which may need to shuffle the data (e.g. join and aggregation 
 operations), Hive will generate a MapReduce job for that operation. However, 
 for those operations which may need to shuffle the data, they may involve 
 correlations explained below and thus can be executed in a single MR job.
 # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
 input relation sets are not disjoint;
 # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
 have not only input correlation, but also the same partition key;
 # Job Flow Correlation: An MR has job ﬂow correlation (JFC) with one of its 
 child nodes if it has the same partition key as that child node.
 The current implementation of correlation optimizer only detect correlations 
 among MR jobs for reduce-side join operators and reduce-side aggregation 
 operators (not map only aggregation). A query will be optimized if it 
 satisfies following conditions.
 # There exists a MR job for reduce-side join operator or reduce side 
 aggregation operator which have JFC with all of its parents MR jobs (TCs will 
 be also exploited if JFC exists);
 # All input tables of those correlated MR job are original input tables (not 
 intermediate tables generated by sub-queries); and 
 # No self join is involved in those correlated MR jobs.
 Correlation optimizer is implemented as a logical optimizer. The main reasons 
 are that it only needs to manipulate the query plan tree and it can leverage 
 the existing component on generating MR jobs.
 Current implementation can serve as a framework for correlation related 
 optimizations. I think that it is better than adding individual optimizers. 
 There are several work that can be done in future to improve this optimizer. 
 Here are three examples.
 # Support queries only involve TC;
 # Support queries in which input tables of correlated MR jobs involves 
 intermediate tables; and 
 # Optimize queries involving self join. 
 References:
 Paper and presentation of YSmart.
 Paper: 
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
 Slides: http://sdrv.ms/UpwJJc

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-06-03 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Affects Version/s: (was: 0.10.0)
   0.12.0
   Status: In Progress  (was: Patch Available)

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt, 
 testQueries.2.q, YSmartPatchForHive.patch


 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
 paper and slides of YSmart are linked at the bottom.
 Since Hive translates queries in a sentence by sentence fashion, for every 
 operation which may need to shuffle the data (e.g. join and aggregation 
 operations), Hive will generate a MapReduce job for that operation. However, 
 for those operations which may need to shuffle the data, they may involve 
 correlations explained below and thus can be executed in a single MR job.
 # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
 input relation sets are not disjoint;
 # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
 have not only input correlation, but also the same partition key;
 # Job Flow Correlation: An MR has job ﬂow correlation (JFC) with one of its 
 child nodes if it has the same partition key as that child node.
 The current implementation of correlation optimizer only detect correlations 
 among MR jobs for reduce-side join operators and reduce-side aggregation 
 operators (not map only aggregation). A query will be optimized if it 
 satisfies following conditions.
 # There exists a MR job for reduce-side join operator or reduce side 
 aggregation operator which have JFC with all of its parents MR jobs (TCs will 
 be also exploited if JFC exists);
 # All input tables of those correlated MR job are original input tables (not 
 intermediate tables generated by sub-queries); and 
 # No self join is involved in those correlated MR jobs.
 Correlation optimizer is implemented as a logical optimizer. The main reasons 
 are that it only needs to manipulate the query plan tree and it can leverage 
 the existing component on generating MR jobs.
 Current implementation can serve as a framework for correlation related 
 optimizations. I think that it is better than adding individual optimizers. 
 There are several work that can be done in future to improve this optimizer. 
 Here are three examples.
 # Support queries only involve TC;
 # Support queries in which input tables of correlated MR jobs involves 
 intermediate tables; and 
 # Optimize queries involving self join. 
 References:
 Paper and presentation of YSmart.
 Paper: 
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
 Slides: http://sdrv.ms/UpwJJc

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-05-24 Thread Yin Huai (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Yin Huai updated HIVE-2206:
---

Description:
This issue proposes a new logical optimizer called Correlation Optimizer, which
is used to merge correlated MapReduce jobs (MR jobs) into a single MR job. The
idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The paper and
slides of YSmart are linked at the bottom.

Since Hive translates queries in a sentence by sentence fashion, for every
operation which may need to shuffle the data (e.g. join and aggregation
operations), Hive will generate a MapReduce job for that operation. However,
for those operations which may need to shuffle the data, they may involve
correlations explained below and thus can be executed in a single MR job.
# Input Correlation: Multiple MR jobs have input correlation (IC) if their
input relation sets are not disjoint;
# Transit Correlation: Multiple MR jobs have transit correlation (TC) if they
have not only input correlation, but also the same partition key;
# Job Flow Correlation: An MR has job ﬂow correlation (JFC) with one of its
child nodes if it has the same partition key as that child node.

The current implementation of correlation optimizer only detect correlations
among MR jobs for reduce-side join operators and reduce-side aggregation
operators (not map only aggregation). A query will be optimized if it satisfies
following conditions.
# There exists a MR job for reduce-side join operator or reduce side
aggregation operator which have JFC with all of its parents MR jobs (TCs will
be also exploited if JFC exists);
# All input tables of those correlated MR job are original input tables (not
intermediate tables generated by sub-queries); and
# No self join is involved in those correlated MR jobs.

Correlation optimizer is implemented as a logical optimizer. The main reasons
are that it only needs to manipulate the query plan tree and it can leverage
the existing component on generating MR jobs.

Current implementation can serve as a framework for correlation related
optimizations. I think that it is better than adding individual optimizers.

There are several work that can be done in future to improve this optimizer.
Here are three examples.
# Support queries only involve TC;
# Support queries in which input tables of correlated MR jobs involves
intermediate tables; and
# Optimize queries involving self join.

References:
Paper and presentation of YSmart.
Paper:
http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
Slides: http://sdrv.ms/UpwJJc

was:
This issue proposes a new logical optimizer called Correlation Optimizer, which
is used to merge correlated MapReduce jobs (MR jobs) into a single MR job. The
idea is based on YSmart (http://ysmart.cse.ohio-state.edu/).The paper and
slides of YSmart are linked at the bottom.

Current implementation can serve as a framework for correlation related
optimizations. I think that it is better than adding individual optimizers.

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-01-16 Thread Yin Huai (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Yin Huai updated HIVE-2206:
---

Attachment: HIVE-2206.20-r1434012.patch.txt

[~ashutoshc] The only thing I have not addressed is cloning the plan at the
beginning of optimization work. Seems that we only can clone a part of the
query plan tree. If we backup only a part of the query plan tree, I think it is
the same as the current patch (described below), since in this way, we still
need to link the backup part to other parts if correlation optimizer cannot
optimize the given query.

In my patch, the original plan tree will be changed to a version without
map-side aggregations, firstly. This version of the plan tree will be only
changed if there is any correlation detected and new merged Map-side plan and
reduce side dispatch have been successfully generated (this query can be
optimized). If optimizer cannot optimize this plan, it will change the plan
back to the original one.

Any suggestion on how to address this issue? Or, did I miss any thing which can
facilitate the cloning work? I think it will be great if we can clone a
ParseContext, so if a optimization is failed, we still have a copy of the
original plan.

add a new optimizer for query correlation discovery and optimization

Key: HIVE-2206
URL: https://issues.apache.org/jira/browse/HIVE-2206
Project: Hive
Issue Type: New Feature
Components: Query Processor
Affects Versions: 0.10.0
Reporter: He Yongqiang
Assignee: Yin Huai
Attachments: HIVE-2206.10-r1384442.patch.txt,
HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt,
HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt,
HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt,
HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt,
HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt,
HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt,
HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt,
HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt,
HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt,
testQueries.2.q, YSmartPatchForHive.patch

This issue proposes a new logical optimizer called Correlation Optimizer,
which is used to merge correlated MapReduce jobs (MR jobs) into a single MR
job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/).The
paper and slides of YSmart are linked at the bottom.
Since Hive translates queries in a sentence by sentence fashion, for every
operation which may need to shuffle the data (e.g. join and aggregation
operations), Hive will generate a MapReduce job for that operation. However,
for those operations which may need to shuffle the data, they may involve
correlations explained below and thus can be executed in a single MR job.
# Input Correlation: Multiple MR jobs have input correlation (IC) if their
input relation sets are not disjoint;
# Transit Correlation: Multiple MR jobs have transit correlation (TC) if they
have not only input correlation, but also the same partition key;
# Job Flow Correlation: An MR has job ﬂow correlation (JFC) with one of its
child nodes if it has the same partition key as that child node.
The current implementation of correlation optimizer only detect correlations
among MR jobs for reduce-side join operators and reduce-side aggregation
operators (not map only aggregation). A query will be optimized if it
satisfies following conditions.
# There exists a MR job for reduce-side join operator or reduce side
aggregation operator which have JFC with all of its parents MR jobs (TCs will
be also exploited if JFC exists);
# All input tables of those correlated MR job are original input tables (not
intermediate tables generated by sub-queries); and
# No self join is involved in those correlated MR jobs.
Correlation optimizer is implemented as a logical optimizer. The main reasons
are that it only needs to manipulate the query plan tree and it can leverage
the existing component on generating MR jobs.
Current implementation can serve as a framework for correlation related
optimizations. I think that it is better than adding individual optimizers.
There are several work that can be done in future to improve this optimizer.
Here are three examples.
# Support queries only involve TC;
# Support queries in which input tables of correlated MR jobs involves
intermediate tables; and
# Optimize queries involving self join.
References:
Paper and presentation of YSmart.
Paper:
http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
Slides: http://sdrv.ms/UpwJJc

--
This message is

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-01-16 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Status: Patch Available  (was: In Progress)

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt, 
 testQueries.2.q, YSmartPatchForHive.patch


 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/).The 
 paper and slides of YSmart are linked at the bottom.
 Since Hive translates queries in a sentence by sentence fashion, for every 
 operation which may need to shuffle the data (e.g. join and aggregation 
 operations), Hive will generate a MapReduce job for that operation. However, 
 for those operations which may need to shuffle the data, they may involve 
 correlations explained below and thus can be executed in a single MR job.
 # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
 input relation sets are not disjoint;
 # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
 have not only input correlation, but also the same partition key;
 # Job Flow Correlation: An MR has job ﬂow correlation (JFC) with one of its 
 child nodes if it has the same partition key as that child node.
 The current implementation of correlation optimizer only detect correlations 
 among MR jobs for reduce-side join operators and reduce-side aggregation 
 operators (not map only aggregation). A query will be optimized if it 
 satisfies following conditions.
 # There exists a MR job for reduce-side join operator or reduce side 
 aggregation operator which have JFC with all of its parents MR jobs (TCs will 
 be also exploited if JFC exists);
 # All input tables of those correlated MR job are original input tables (not 
 intermediate tables generated by sub-queries); and 
 # No self join is involved in those correlated MR jobs.
 Correlation optimizer is implemented as a logical optimizer. The main reasons 
 are that it only needs to manipulate the query plan tree and it can leverage 
 the existing component on generating MR jobs.
 Current implementation can serve as a framework for correlation related 
 optimizations. I think that it is better than adding individual optimizers. 
 There are several work that can be done in future to improve this optimizer. 
 Here are three examples.
 # Support queries only involve TC;
 # Support queries in which input tables of correlated MR jobs involves 
 intermediate tables; and 
 # Optimize queries involving self join. 
 References:
 Paper and presentation of YSmart.
 Paper: 
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
 Slides: http://sdrv.ms/UpwJJc

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-01-14 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Status: In Progress  (was: Patch Available)

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, 
 HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, 
 HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
 HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch


 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/).The 
 paper and slides of YSmart are linked at the bottom.
 Since Hive translates queries in a sentence by sentence fashion, for every 
 operation which may need to shuffle the data (e.g. join and aggregation 
 operations), Hive will generate a MapReduce job for that operation. However, 
 for those operations which may need to shuffle the data, they may involve 
 correlations explained below and thus can be executed in a single MR job.
 # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
 input relation sets are not disjoint;
 # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
 have not only input correlation, but also the same partition key;
 # Job Flow Correlation: An MR has job ﬂow correlation (JFC) with one of its 
 child nodes if it has the same partition key as that child node.
 The current implementation of correlation optimizer only detect correlations 
 among MR jobs for reduce-side join operators and reduce-side aggregation 
 operators (not map only aggregation). A query will be optimized if it 
 satisfies following conditions.
 # There exists a MR job for reduce-side join operator or reduce side 
 aggregation operator which have JFC with all of its parents MR jobs (TCs will 
 be also exploited if JFC exists);
 # All input tables of those correlated MR job are original input tables (not 
 intermediate tables generated by sub-queries); and 
 # No self join is involved in those correlated MR jobs.
 Correlation optimizer is implemented as a logical optimizer. The main reasons 
 are that it only needs to manipulate the query plan tree and it can leverage 
 the existing component on generating MR jobs.
 Current implementation can serve as a framework for correlation related 
 optimizations. I think that it is better than adding individual optimizers. 
 There are several work that can be done in future to improve this optimizer. 
 Here are three examples.
 # Support queries only involve TC;
 # Support queries in which input tables of correlated MR jobs involves 
 intermediate tables; and 
 # Optimize queries involving self join. 
 References:
 Paper and presentation of YSmart.
 Paper: 
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
 Slides: http://sdrv.ms/UpwJJc

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-11-19 Thread Yin Huai (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Yin Huai updated HIVE-2206:
---

Attachment: HIVE-2206.19-r1410581.patch.txt

I just integrate HIVE-3671 into this patch. At the beginning of correlation
optimizer, it will predict if a join operator will be converted by
CommonJoinResolver, if so, correlation optimizer will annotate this join
operator and in the future optimization, ignore this operator. The prediction
can only be made to those join operators the input tables of which are not
intermediate tables. The method of the prediction is ported from
CommonJoinResolver. Also, a test is added in correlationoptimizer1.q

[~namit]
Please take a look at this patch. Let me know if you have any comment.

add a new optimizer for query correlation discovery and optimization

Key: HIVE-2206
URL: https://issues.apache.org/jira/browse/HIVE-2206
Project: Hive
Issue Type: New Feature
Components: Query Processor
Affects Versions: 0.10.0
Reporter: He Yongqiang
Assignee: Yin Huai
Attachments: HIVE-2206.10-r1384442.patch.txt,
HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt,
HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt,
HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt,
HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt,
HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt,
HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt,
HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt,
HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt,
HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch

This issue proposes a new logical optimizer called Correlation Optimizer,
which is used to merge correlated MapReduce jobs (MR jobs) into a single MR
job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/).The
paper and slides of YSmart are linked at the bottom.
Since Hive translates queries in a sentence by sentence fashion, for every
operation which may need to shuffle the data (e.g. join and aggregation
operations), Hive will generate a MapReduce job for that operation. However,
for those operations which may need to shuffle the data, they may involve
correlations explained below and thus can be executed in a single MR job.
# Input Correlation: Multiple MR jobs have input correlation (IC) if their
input relation sets are not disjoint;
# Transit Correlation: Multiple MR jobs have transit correlation (TC) if they
have not only input correlation, but also the same partition key;
# Job Flow Correlation: An MR has job ﬂow correlation (JFC) with one of its
child nodes if it has the same partition key as that child node.
The current implementation of correlation optimizer only detect correlations
among MR jobs for reduce-side join operators and reduce-side aggregation
operators (not map only aggregation). A query will be optimized if it
satisfies following conditions.
# There exists a MR job for reduce-side join operator or reduce side
aggregation operator which have JFC with all of its parents MR jobs (TCs will
be also exploited if JFC exists);
# All input tables of those correlated MR job are original input tables (not
intermediate tables generated by sub-queries); and
# No self join is involved in those correlated MR jobs.
Correlation optimizer is implemented as a logical optimizer. The main reasons
are that it only needs to manipulate the query plan tree and it can leverage
the existing component on generating MR jobs.
Current implementation can serve as a framework for correlation related
optimizations. I think that it is better than adding individual optimizers.
There are several work that can be done in future to improve this optimizer.
Here are three examples.
# Support queries only involve TC;
# Support queries in which input tables of correlated MR jobs involves
intermediate tables; and
# Optimize queries involving self join.
References:
Paper and presentation of YSmart.
Paper:
http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
Slides: http://sdrv.ms/UpwJJc

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-11-12 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Attachment: HIVE-2206.18-r1407720.patch.txt

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, 
 HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, 
 HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
 HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch


 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/).The 
 paper and slides of YSmart are linked at the bottom.
 Since Hive translates queries in a sentence by sentence fashion, for every 
 operation which may need to shuffle the data (e.g. join and aggregation 
 operations), Hive will generate a MapReduce job for that operation. However, 
 for those operations which may need to shuffle the data, they may involve 
 correlations explained below and thus can be executed in a single MR job.
 # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
 input relation sets are not disjoint;
 # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
 have not only input correlation, but also the same partition key;
 # Job Flow Correlation: An MR has job ﬂow correlation (JFC) with one of its 
 child nodes if it has the same partition key as that child node.
 The current implementation of correlation optimizer only detect correlations 
 among MR jobs for reduce-side join operators and reduce-side aggregation 
 operators (not map only aggregation). A query will be optimized if it 
 satisfies following conditions.
 # There exists a MR job for reduce-side join operator or reduce side 
 aggregation operator which have JFC with all of its parents MR jobs (TCs will 
 be also exploited if JFC exists);
 # All input tables of those correlated MR job are original input tables (not 
 intermediate tables generated by sub-queries); and 
 # No self join is involved in those correlated MR jobs.
 Correlation optimizer is implemented as a logical optimizer. The main reasons 
 are that it only needs to manipulate the query plan tree and it can leverage 
 the existing component on generating MR jobs.
 Current implementation can serve as a framework for correlation related 
 optimizations. I think that it is better than adding individual optimizers. 
 There are several work that can be done in future to improve this optimizer. 
 Here are three examples.
 # Support queries only involve TC;
 # Support queries in which input tables of correlated MR jobs involves 
 intermediate tables; and 
 # Optimize queries involving self join. 
 References:
 Paper and presentation of YSmart.
 Paper: 
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
 Slides: http://sdrv.ms/UpwJJc

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-11-02 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Description: 
References:
Paper and presentation of YSmart.
Paper: 
http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
Presentation: http://sdrv.ms/UpwJJc

  was:
reference:

http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf


 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, 
 HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, 
 HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
 HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch


 References:
 Paper and presentation of YSmart.
 Paper: 
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
 Presentation: http://sdrv.ms/UpwJJc

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-11-02 Thread Yin Huai (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Yin Huai updated HIVE-2206:
---

Description:
This issue proposes a new optimizer called Correlation Optimizer, which is used
to merge correlated operations into a single MapReduce job. In current
implementation, correlation optimizer will only try to merge join and
aggregation operators. Will add more later...

The idea is based on YSmart, an SQL-to-MapReduce translator
(http://ysmart.cse.ohio-state.edu/). The paper and slides are also linked
below.

References:
Paper and presentation of YSmart.
Paper:
http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
Slides: http://sdrv.ms/UpwJJc

was:
References:
Paper and presentation of YSmart.
Paper:
http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
Presentation: http://sdrv.ms/UpwJJc

add a new optimizer for query correlation discovery and optimization

Key: HIVE-2206
URL: https://issues.apache.org/jira/browse/HIVE-2206
Project: Hive
Issue Type: New Feature
Components: Query Processor
Affects Versions: 0.10.0
Reporter: He Yongqiang
Assignee: Yin Huai
Attachments: HIVE-2206.10-r1384442.patch.txt,
HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt,
HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt,
HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt,
HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt,
HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt,
HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt,
HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch

This issue proposes a new optimizer called Correlation Optimizer, which is
used to merge correlated operations into a single MapReduce job. In current
implementation, correlation optimizer will only try to merge join and
aggregation operators. Will add more later...
The idea is based on YSmart, an SQL-to-MapReduce translator
(http://ysmart.cse.ohio-state.edu/). The paper and slides are also linked
below.
References:
Paper and presentation of YSmart.
Paper:
http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
Slides: http://sdrv.ms/UpwJJc

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-11-02 Thread Yin Huai (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Yin Huai updated HIVE-2206:
---

Description:
This issue proposes a new logical optimizer called Correlation Optimizer, which
is used to merge correlated MapReduce jobs (MR jobs) into a single MR job. The
idea is based on YSmart (http://ysmart.cse.ohio-state.edu/).The paper and
slides of YSmart are linked at the bottom.

Current implementation can serve as a framework for correlation related
optimizations. I think that it is better than adding individual optimizers.

References:
Paper and presentation of YSmart.
Paper:
http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
Slides: http://sdrv.ms/UpwJJc

was:
This issue proposes a new optimizer called Correlation Optimizer, which is used
to merge correlated operations into a single MapReduce job. In current
implementation, correlation optimizer will only try to merge join and
aggregation operators. Will add more later...

The idea is based on YSmart, an SQL-to-MapReduce translator
(http://ysmart.cse.ohio-state.edu/). The paper and slides are also linked
below.

References:
Paper and presentation of YSmart.
Paper:
http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
Slides: http://sdrv.ms/UpwJJc

add a new optimizer for query correlation discovery and optimization

Key: HIVE-2206
URL: https://issues.apache.org/jira/browse/HIVE-2206
Project: Hive
Issue Type: New Feature
Components: Query Processor
Affects Versions: 0.10.0
Reporter: He Yongqiang
Assignee: Yin Huai
Attachments: HIVE-2206.10-r1384442.patch.txt,
HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt,
HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt,
HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt,
HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt,
HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt,
HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt,
HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-11-02 Thread Yin Huai (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Yin Huai updated HIVE-2206:
---

Attachment: HIVE-2206.17-r1404933.patch.txt

update a new patch which can be applied to r1404933. Also added the description
of this issue.

However, I do not have the permission to add a page in wiki. Where should I
request the permission?

add a new optimizer for query correlation discovery and optimization

Key: HIVE-2206
URL: https://issues.apache.org/jira/browse/HIVE-2206
Project: Hive
Issue Type: New Feature
Components: Query Processor
Affects Versions: 0.10.0
Reporter: He Yongqiang
Assignee: Yin Huai
Attachments: HIVE-2206.10-r1384442.patch.txt,
HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt,
HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt,
HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt,
HIVE-2206.17-r1404933.patch.txt, HIVE-2206.1.patch.txt,
HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt,
HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt,
HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt,
HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch

This issue proposes a new logical optimizer called Correlation Optimizer,
which is used to merge correlated MapReduce jobs (MR jobs) into a single MR
job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/).The
paper and slides of YSmart are linked at the bottom.
Since Hive translates queries in a sentence by sentence fashion, for every
operation which may need to shuffle the data (e.g. join and aggregation
operations), Hive will generate a MapReduce job for that operation. However,
for those operations which may need to shuffle the data, they may involve
correlations explained below and thus can be executed in a single MR job.
# Input Correlation: Multiple MR jobs have input correlation (IC) if their
input relation sets are not disjoint;
# Transit Correlation: Multiple MR jobs have transit correlation (TC) if they
have not only input correlation, but also the same partition key;
# Job Flow Correlation: An MR has job ﬂow correlation (JFC) with one of its
child nodes if it has the same partition key as that child node.
The current implementation of correlation optimizer only detect correlations
among MR jobs for reduce-side join operators and reduce-side aggregation
operators (not map only aggregation). A query will be optimized if it
satisfies following conditions.
# There exists a MR job for reduce-side join operator or reduce side
aggregation operator which have JFC with all of its parents MR jobs (TCs will
be also exploited if JFC exists);
# All input tables of those correlated MR job are original input tables (not
intermediate tables generated by sub-queries); and
# No self join is involved in those correlated MR jobs.
Correlation optimizer is implemented as a logical optimizer. The main reasons
are that it only needs to manipulate the query plan tree and it can leverage
the existing component on generating MR jobs.
Current implementation can serve as a framework for correlation related
optimizations. I think that it is better than adding individual optimizers.
There are several work that can be done in future to improve this optimizer.
Here are three examples.
# Support queries only involve TC;
# Support queries in which input tables of correlated MR jobs involves
intermediate tables; and
# Optimize queries involving self join.
References:
Paper and presentation of YSmart.
Paper:
http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
Slides: http://sdrv.ms/UpwJJc

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-10-19 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Attachment: HIVE-2206.16-r1399936.patch.txt

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, 
 HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, 
 HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
 HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-10-02 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Attachment: HIVE-2206.15-r1392491.patch.txt

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, 
 HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, 
 HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
 HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-10-02 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Status: Patch Available  (was: Reopened)

@Namit:
You can review the latest patch. I removed the first phase and other 
unnecessary contents. 

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, 
 HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, 
 HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
 HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-09-30 Thread He Yongqiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-2206:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

I just committed. Thanks for the hard work, Yin Huai!

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, 
 HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, 
 HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
 HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-09-26 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Status: Patch Available  (was: Open)

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, 
 HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, 
 HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
 HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-09-26 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Attachment: HIVE-2206.14-r1389704.patch.txt

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, 
 HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, 
 HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
 HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-09-24 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Attachment: HIVE-2206.13-r1389072.patch.txt

two new tests + bug fix. This patch is ready to review. Diff r4 in 
https://reviews.apache.org/r/7126/ is the latest patch. 

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, 
 HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, 
 HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
 HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-09-24 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Status: Patch Available  (was: Open)

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, 
 HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, 
 HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
 HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-09-24 Thread Carl Steinbach (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2206:
-

Status: Open  (was: Patch Available)

@Yin: Please see my comments on reviewboard. Thanks.

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, 
 HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, 
 HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
 HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-09-20 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Component/s: Query Processor

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, 
 HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, 
 HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
 HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-09-20 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Affects Version/s: 0.10.0

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, 
 HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, 
 HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
 HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-09-18 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Attachment: HIVE-2206.12-r1386996.patch.txt

patch updated. bug fix+ 3 test cases

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, 
 HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, 
 HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
 HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-09-18 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Attachment: (was: HIVE-2206.12-r1386996.patch.txt)

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, 
 HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, 
 HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
 HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-09-18 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Attachment: HIVE-2206.12-r1386996.patch.txt

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, 
 HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, 
 HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
 HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-09-16 Thread Carl Steinbach (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2206:
-

Status: Open  (was: Patch Available)

Please explain what is preventing us from enabling this feature by default, 
e.g. in which cases is it expected not to work, and what are the failure 
scenarios? 

Based on the current test coverage (not much) I can't tell if it's actually 
possible to use this feature in its current state.

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, 
 HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, 
 HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
 HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-09-15 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Attachment: HIVE-2206.11-r1385084.patch.txt

new patch for trunk (revision 1385084). Disabled the optimizer by default and 
updated test results.

[~he yongqiang]:
can you help me test a few cases which the trunk on my machine cannot pass?
Those are TestHBaseMinimrCliDriver, TestHBaseCliDriver, 
TestHBaseNegativeCliDriver, testSynchronized in TestEmbeddedHiveMetaStore, 
testSynchronized in TestRemoteHiveMetaStore, testSynchronized in 
TestSetUGIOnBothClientServer, testSynchronized in TestSetUGIOnOnlyClient, 
testSynchronized in TestSetUGIOnOnlyServer, and 
testNegativeCliDriver_local_mapred_error_cache in TestNegativeCliDriver. 
Thanks!  

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, 
 HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, 
 HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
 HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-09-15 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Status: In Progress  (was: Patch Available)

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, 
 HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, 
 HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
 HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-09-15 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Status: Patch Available  (was: In Progress)

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, 
 HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, 
 HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
 HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-09-14 Thread Yin Huai (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Yin Huai updated HIVE-2206:
---

Attachment: HIVE-2206.10-r1384442.patch.txt

The patch is ported to the latest trunk (revision 1384442). I tested this patch
with an enabled CorrelationOptimizer (hive.optimize.correlation=true). During
the testing, I fixed several bugs and all tests should be ok except those I
explained below.

In case TestParse, there are 42 queries failed. Since I made several minor
changes in SemanticAnalyzer. Seems those results should be updated.

In TestCliDriver, auto_join26.q is failed since it is optimized by the
optimizer. Considering I will make the optimizer disabled by default, I will
not do any change regarding this query and its result.

In TestCliDriver, create_view.q and udaf_percentile_approx.q are two weird
queries. If hive.map.aggr=false, the original trunk will also fail. Seems bug
is involved in the trunk. I have sent an email to dev mailing list regarding
create_view.q. For udaf_percentile_approx.q, I have got time to look at it in
detail.

In TestCliDriver, join31.q is failed. For this case, the query should be
updated to have set hive.optimize.correlation=true. But, since the optimizer
is disabled by default, I will not update this query.

Also, I got some queries which trunk cannot pass. These are
cascade_dbdrop_hadoop20.q, hbase_binary_external_table_queries.q,
hbase_binary_map_queries.q, hbase_binary_storage_queries.q, hbase_joins.q,
hbase_ppd_key_range.q, hbase_pushdown.q, hbase_queries.q,
local_mapred_error_cache.q, and TestCase TestHBaseMinimrCliDriver.

I will run all tests again and will fix any bug related to the patch.

add a new optimizer for query correlation discovery and optimization

Key: HIVE-2206
URL: https://issues.apache.org/jira/browse/HIVE-2206
Project: Hive
Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Yin Huai
Attachments: HIVE-2206.10-r1384442.patch.txt, HIVE-2206.1.patch.txt,
HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt,
HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt,
HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt,
HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch

reference:
http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-01-29 Thread Yin Huai (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Attachment: HIVE-2206.8-r1237253.patch.txt

@Kevin,
I wrongly assumed that all output names of the ReduceSinkOperator has a 
structure of KEY/VALUE.internalName. I have solved this issue.

However, the current optimizer cannot handel the case that a table is directly 
connect to a post computation operator (in this case, table b directly connects 
to the operator join). I am planning to solve this issue after this patch. To 
walkaround, you can use ...
SET hive.optimize.reducededuplication=false;
SET hive.optimize.correlation=true;
SELECT * FROM (SELECT * FROM src DISTRIBUTE BY key SORT BY key) a JOIN (SELECT 
* FROM src DISTRIBUTE BY key SORT BY key) b ON a.key = b.key;. 
This query will be optimized and be executed in a single MapReduce job. 

Also, I have updated the patch and it is compatible with revision 1237253.

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
 HIVE-2206.8-r1237253.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
 YSmartPatchForHive.patch, testQueries.2.q


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2011-12-29 Thread Yin Huai (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Attachment: HIVE-2206.8.r1224646.patch.txt

diff in the review board has also been updated 
(https://reviews.apache.org/r/2001/). 

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, YSmartPatchForHive.patch, testQueries.1.q


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2011-12-29 Thread Yin Huai (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Attachment: testQueries.2.q

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, YSmartPatchForHive.patch, testQueries.2.q


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2011-12-29 Thread Yin Huai (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Attachment: (was: Queries)

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, YSmartPatchForHive.patch, testQueries.1.q


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2011-12-29 Thread Yin Huai (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Attachment: testQueries.2.q

testQueries.2.q have three testing queries

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, YSmartPatchForHive.patch, testQueries.2.q


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2011-12-29 Thread Yin Huai (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Attachment: (was: testQueries.2.q)

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, YSmartPatchForHive.patch, testQueries.2.q


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2011-12-29 Thread Yin Huai (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Attachment: (was: testQueries.q)

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, YSmartPatchForHive.patch, testQueries.1.q


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2011-12-04 Thread Yin Huai (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Status: Patch Available  (was: In Progress)

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, Queries, 
 YSmartPatchForHive.patch, testQueries.q


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2011-11-23 Thread Yin Huai (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Attachment: HIVE-2206.5-1.patch.txt

This is a working-in-progress patch. There are two issues to be addressed 
before next version of patch. Firstly, I will look at if I can remove the 
operator FakeReduceSinkOperator. Secondly, I will look at if correlation 
optimizer can be a one-phase optimizer instead of two-phase one. The current 
implementation will call the correlation optimizer twice (at the beginning and 
the end of optimization, respectively). 

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, Queries, YSmartPatchForHive.patch, testQueries.q


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2011-11-23 Thread Yin Huai (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Status: In Progress  (was: Patch Available)

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, Queries, YSmartPatchForHive.patch, testQueries.q


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2011-09-21 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Attachment: HIVE-2206.5.patch.txt

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5.patch.txt, Queries, 
 YSmartPatchForHive.patch, testQueries.q


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2011-09-21 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Release Note: This optimizer exploits the intra-query correlations and 
merge multiple correlated MapReduce jobs into one jobs.  (was: This optimizer 
exploits the intra-query correlations and merge multiple correlated MapReduce 
jobs into one jobs. The patch is generated based on hive-trunk with revision 
1171917. )
  Status: Patch Available  (was: Open)

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5.patch.txt, Queries, 
 YSmartPatchForHive.patch, testQueries.q


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2011-09-19 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Attachment: HIVE-2206.3.patch.txt

I used ant clean package test tar -logfile ant.log to test all cases again 
and all unknown errors are gone... There are four failures left (groupby1, 
groupby2, groupby3 and groupby5). These four failures are caused by changes I 
made in the method genGroupByPlanReduceSinkOperator in the class 
SemanticAnalyzer. So, current results should be updated. But I am not sure 
how to change the correct results. Does overwrite current results with the new 
results work? Or, is there anything I should do?

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, Queries, YSmartPatchForHive.patch, testQueries.q


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2011-09-19 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Release Note: This optimizer exploits the intra-query correlations and 
merge multiple correlated MapReduce jobs into one jobs. The patch is generated 
based on hive-trunk with revision 1171917. 
  Status: Patch Available  (was: In Progress)

In unit tests, there are four failures in TestParse (groupby1, groupby2, 
groupby3 and groupby5). These four failures are caused by changes I made in the 
method genGroupByPlanReduceSinkOperator in the class SemanticAnalyzer. 
Current results should be updated. But I am not sure how to change the correct 
results. Need some suggestions. Thanks.

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, Queries, YSmartPatchForHive.patch, testQueries.q


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2011-09-19 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Attachment: HIVE-2206.4.patch.txt

All of unit test cases are passed, except TestParse_groupby1, 
TestParse_groupby2, TestParse_groupby3 and TestParse_groupby5. Because I made 
some changes in method genGroupByPlanReduceSinkOperator of class 
SemanticAnalyzer, so results of these four cases should be updated. However, 
I found that when these four cases are tested individually, the results differ 
from the results when these four cases are tested with all other cases (when I 
used ant clean package test tar -logfile ant.log). Hive-trunk I checked out 
from svn also has this issue. Is it a bug or did I miss anything?

This patch does not contain updates on the results of cases TestParse_groupby1, 
TestParse_groupby2, TestParse_groupby3 and TestParse_groupby5. 

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, Queries, 
 YSmartPatchForHive.patch, testQueries.q


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2011-09-18 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Attachment: HIVE-2206.2.patch.txt

there are some failures in TestCliDriver. Some failures seems that the output 
misses some lines when using keyword explain. Other failures are related to 
queries with index, e.g. index_quth.q. When I tested query index_quth.q 
manually, there were two errors. One was java.lang.ClassNotFoundException: 
org.apache.derby.jdbc.EmbeddedDriver and another one was 
java.lang.NoClassDefFoundError: javaewah/EWAHCompressedBitmap. 

These errors seems irrelevance to changes I made, but there should be some 
thing wrong... 

@Yongqiang: Can you have a look at my patch and give me some suggestions on how 
to fix it? Thanks

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, Queries, 
 YSmartPatchForHive.patch, testQueries.q


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2011-09-17 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Attachment: testQueries.q
HIVE-2206.1.patch.txt

The patch is ready. Queries used in testing are included in the file of 
testQueries.q. 

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.1.patch.txt, Queries, 
 YSmartPatchForHive.patch, testQueries.q


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2011-06-08 Thread He Yongqiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-2206:
---

Attachment: YSmartPatchForHive.patch

a draft patch

will submit revised version later.

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
 Attachments: YSmartPatchForHive.patch


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2011-06-08 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Attachment: Queries

Two queries (TPC-H Q17 and TPC-H Q18) can be used for testing this optimizer. 
Q17 is the same with the query provided in 
https://issues.apache.org/jira/browse/HIVE-600, but to expose the correlation, 
Q18 is modified. With this optimizer, Q17 and Q18 needs 2 and 4 MapReduce jobs, 
respectively. Without this optimizer, these two queries need 4 and 8 MapReduce 
jobs, respectively.

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
 Attachments: Queries, YSmartPatchForHive.patch


 reference:
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

79 matches

Mail list logo