[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205654#comment-13205654
 ] 

jirapos...@reviews.apache.org commented on HIVE-2206:
-----------------------------------------------------



bq.  On 2012-02-10 17:38:09, Kevin Wilfong wrote:
bq.  > I've started reviewing this, here's my comments so far.  I'll continue 
to look over it.

I will update this patch soon. 


bq.  On 2012-02-10 17:38:09, Kevin Wilfong wrote:
bq.  > trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 453
bq.  > <https://reviews.apache.org/r/2001/diff/4/?file=71297#file71297line453>
bq.  >
bq.  >     Does this have to default to false, does anything break if it's true?
bq.  >     
bq.  >     Similarly, have you tried running the tests with this set to true?

I have not tried running the tests with this set to true. I will do it when I 
find a revision which can pass all unit tests (btw, any suggestion on which 
revision should I use?). In my opinion, since this optimizer is kind of 
complicated and it is still being developed, it will be safer to default it to 
false and let users to decide when to use it than default it to true.


bq.  On 2012-02-10 17:38:09, Kevin Wilfong wrote:
bq.  > 
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java,
 line 101
bq.  > <https://reviews.apache.org/r/2001/diff/4/?file=71299#file71299line101>
bq.  >
bq.  >     It's not clear to me why we need both setRowNumber and processOp.

Since a CorrelationCompositeOperator may have multiple parents, I used a buffer 
to store the output of parents of the CorrelationCompositeOperator (shown 
processOp method). The TableScanOperator will trigger the setRowNumber method 
and then CorrelationCompositeOperator will decide the operationPathTags of this 
row based on the contents in the buffer and then forward the row in its buffer 
to its child. So, setRowNumber in here is used to evaluate the 
operationPathTags of the row in the buffer before the 
CorrelationCompositeOperator gets the new row. 


bq.  On 2012-02-10 17:38:09, Kevin Wilfong wrote:
bq.  > 
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java,
 lines 150-177
bq.  > <https://reviews.apache.org/r/2001/diff/4/?file=71299#file71299line150>
bq.  >
bq.  >     Putting this code in a helper method would be better than having it 
both here and in setRowNumber.

I will do it. 


bq.  On 2012-02-10 17:38:09, Kevin Wilfong wrote:
bq.  > 
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java,
 line 274
bq.  > <https://reviews.apache.org/r/2001/diff/4/?file=71300#file71300line274>
bq.  >
bq.  >     Does this commented out code need to be kept?

This commented out code is not needed. I will delete it. 


bq.  On 2012-02-10 17:38:09, Kevin Wilfong wrote:
bq.  > trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java, line 1337
bq.  > <https://reviews.apache.org/r/2001/diff/4/?file=71303#file71303line1337>
bq.  >
bq.  >     I couldn't find a CorrelationFakeReduceSinkOperator class.

CorrelationLocalSimulativeReduceSinkOperator was named as 
CorrelationFakeReduceSinkOperator. I will use the right name in the comment. 


bq.  On 2012-02-10 17:38:09, Kevin Wilfong wrote:
bq.  > 
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java, line 
273
bq.  > <https://reviews.apache.org/r/2001/diff/4/?file=71305#file71305line273>
bq.  >
bq.  >     Tabs are bad, could you change them to spaces, at least in the new 
code your introducing.

I will change the format of my code. Thanks for letting me know.


bq.  On 2012-02-10 17:38:09, Kevin Wilfong wrote:
bq.  > 
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java,
 line 239
bq.  > <https://reviews.apache.org/r/2001/diff/4/?file=71308#file71308line239>
bq.  >
bq.  >     I take it from this line it's a requirement that in order for this 
correlation optimization to be attempted every reduce sink has to be followed 
only by children with a single child.
bq.  >     
bq.  >     Could this be relaxed?  Could the optimization simply not be applied 
if there is an operator between two ReduceSinks that has more than one child?
bq.  >     
bq.  >     Also, if there is a ReduceSink which is not followed by another 
ReduceSink, but is followed by an operator with more than one child, this 
prevents the optimization from being used, even though it shouldn't have an 
effect.
bq.  >     
bq.  >     Also, regarding checking if the size <=1, if the size <1 the next 
line will throw an exception.

Only "assert op.getChildOperators().size() > 0;" is needed at here. Thank you 
for letting me know. 


bq.  On 2012-02-10 17:38:09, Kevin Wilfong wrote:
bq.  > 
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java,
 line 335
bq.  > <https://reviews.apache.org/r/2001/diff/4/?file=71308#file71308line335>
bq.  >
bq.  >     findNextChildReduceSinkOperator can return null, do you need to 
check for this?

findNextChildReduceSinkOperator will not return null since its input will not 
be the last ReduceSinkOperator before the FileSinkOperator. For example, 
suppose that we have a plan tree like (some operators)->RS1->(some 
operators)->RS2->(some operators)->FS. The input of 
findNextChildReduceSinkOperator will not be RS2. I will add an assertion and a 
comment after this line. 


- Yin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2001/#review4912
-----------------------------------------------------------


On 2012-01-29 17:56:48, Yin Huai wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2001/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-01-29 17:56:48)
bq.  
bq.  
bq.  Review request for hive.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This optimizer exploits intra-query correlations and merges multiple 
correlated MapReduce jobs into one jobs.
bq.  
bq.  
bq.  This addresses bug HIVE-2206.
bq.      https://issues.apache.org/jira/browse/HIVE-2206
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1237326 
bq.    
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java 
PRE-CREATION 
bq.    
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java
 PRE-CREATION 
bq.    
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java
 PRE-CREATION 
bq.    
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java
 PRE-CREATION 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 
1237326 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 1237326 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 
1237326 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 
1237326 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 
1237326 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 
1237326 
bq.    
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java 
PRE-CREATION 
bq.    
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java
 PRE-CREATION 
bq.    
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
1237326 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 
1237326 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 
1237326 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
1237326 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java 
PRE-CREATION 
bq.    
trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java 
PRE-CREATION 
bq.    
trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java
 PRE-CREATION 
bq.    
trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java
 PRE-CREATION 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1237326 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 
1237326 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 
1237326 
bq.    trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 
1237326 
bq.    trunk/ql/src/test/results/compiler/plan/groupby1.q.xml 1237326 
bq.    trunk/ql/src/test/results/compiler/plan/groupby2.q.xml 1237326 
bq.    trunk/ql/src/test/results/compiler/plan/groupby3.q.xml 1237326 
bq.    trunk/ql/src/test/results/compiler/plan/groupby5.q.xml 1237326 
bq.  
bq.  Diff: https://reviews.apache.org/r/2001/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Yin
bq.  
bq.


                
> add a new optimizer for query correlation discovery and optimization
> --------------------------------------------------------------------
>
>                 Key: HIVE-2206
>                 URL: https://issues.apache.org/jira/browse/HIVE-2206
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
>            Assignee: Yin Huai
>         Attachments: HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, 
> HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
> HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
> HIVE-2206.8-r1237253.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
> YSmartPatchForHive.patch, testQueries.2.q
>
>
> reference:
> http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to