Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-09-10 Thread Prajakta Kalmegh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1194/
---

(Updated 2011-09-10 21:10:06.178279)


Review request for hive and John Sichi.


Changes
---

Added order-by to queries for test determinism.


Summary
---

This patch has defined a new AggregateIndexHandler which is used to optimize 
the query plan for groupby queries. 


This addresses bug HIVE-1694.
https://issues.apache.org/jira/browse/HIVE-1694


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 66ee0be 
  data/files/lineitem.txt PRE-CREATION 
  data/files/tbl.txt PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff 
  ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 
5053576 
  ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 
7a00c00 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java bec8787 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java
 dcdfb9e 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java
 699519b 
  ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION 
  ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/1194/diff


Testing
---


Thanks,

Prajakta



Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-09-08 Thread Prajakta Kalmegh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1194/
---

(Updated 2011-09-09 01:14:16.218940)


Review request for hive and John Sichi.


Summary
---

This patch has defined a new AggregateIndexHandler which is used to optimize 
the query plan for groupby queries. 


This addresses bug HIVE-1694.
https://issues.apache.org/jira/browse/HIVE-1694


Diffs (updated)
-

  ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION 
  ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java
 699519b 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java
 dcdfb9e 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
 PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 66ee0be 
  data/files/lineitem.txt PRE-CREATION 
  data/files/tbl.txt PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff 
  ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 
5053576 
  ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 
7a00c00 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java bec8787 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/1194/diff


Testing
---


Thanks,

Prajakta



Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-09-07 Thread Prajakta Kalmegh


 On 2011-08-05 21:20:21, John Sichi wrote:
  ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java, 
  line 172
  https://reviews.apache.org/r/1194/diff/2/?file=30443#file30443line172
 
  See recent changes in corresponding CompactIndexHandler code for 
  HIVEOPTINDEXFILTER; need the same here (or better, factor out common code 
  here and elsewhere).
  
  On a related note, you may be able to use the same technique instead of 
  isQueryInsertToTable; this would be preferable since it's nice to be able 
  to use the index rewrite in cases where it's a normal INSERT table with 
  index being used for GROUP BY on SELECT from some other table.
 

I have factored out the common code in all Index handler classes and placed it 
in IndexUtils file. 

I also removed the code for isQueryInsertToTable and am setting the 
HIVEOPTINDEXFILTER to false instead. 


 On 2011-08-05 21:20:21, John Sichi wrote:
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java,
   line 153
  https://reviews.apache.org/r/1194/diff/2/?file=30449#file30449line153
 
  Shouldn't this be the same as COUNT(*)?
 

Yes it is. I missed to change this part from the previous code.


- Prajakta


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1194/#review1303
---


On 2011-08-03 10:31:42, Prajakta Kalmegh wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/1194/
 ---
 
 (Updated 2011-08-03 10:31:42)
 
 
 Review request for hive and John Sichi.
 
 
 Summary
 ---
 
 This patch has defined a new AggregateIndexHandler which is used to optimize 
 the query plan for groupby queries. 
 
 
 This addresses bug HIVE-1694.
 https://issues.apache.org/jira/browse/HIVE-1694
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b46976f 
   ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff 
   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java a57f9cf 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 
 PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java
  8295687 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java
  699519b 
   ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION 
   ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/1194/diff
 
 
 Testing
 ---
 
 
 Thanks,
 
 Prajakta
 




Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-08-05 Thread John Sichi

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1194/#review1303
---



ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java
https://reviews.apache.org/r/1194/#comment2955

Can't you just look up AGGREGATES in the map?



ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java
https://reviews.apache.org/r/1194/#comment2953

Add a helper method to avoid duplicating the code in the else block below.




ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java
https://reviews.apache.org/r/1194/#comment2954

Can't you just look up AGGREGATES in the map?



ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java
https://reviews.apache.org/r/1194/#comment2956

See recent changes in corresponding CompactIndexHandler code for 
HIVEOPTINDEXFILTER; need the same here (or better, factor out common code here 
and elsewhere).

On a related note, you may be able to use the same technique instead of 
isQueryInsertToTable; this would be preferable since it's nice to be able to 
use the index rewrite in cases where it's a normal INSERT table with index 
being used for GROUP BY on SELECT from some other table.




ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java
https://reviews.apache.org/r/1194/#comment2957

@params here don't match actual params



ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
https://reviews.apache.org/r/1194/#comment2958

Shouldn't this be the same as COUNT(*)?




ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q
https://reviews.apache.org/r/1194/#comment2980

Besides EXPLAIN, you should include a few queries against a non-empty table 
verifying that you get the correct results both with and without the 
optimization applied.  Remember to include an ORDER BY for test determinism.




ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q
https://reviews.apache.org/r/1194/#comment2978

Isn't this set redundant?


- John


On 2011-08-03 10:31:42, Prajakta Kalmegh wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/1194/
 ---
 
 (Updated 2011-08-03 10:31:42)
 
 
 Review request for hive and John Sichi.
 
 
 Summary
 ---
 
 This patch has defined a new AggregateIndexHandler which is used to optimize 
 the query plan for groupby queries. 
 
 
 This addresses bug HIVE-1694.
 https://issues.apache.org/jira/browse/HIVE-1694
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b46976f 
   ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff 
   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java a57f9cf 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 
 PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java
  8295687 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java
  699519b 
   ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION 
   ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/1194/diff
 
 
 Testing
 ---
 
 
 Thanks,
 
 Prajakta
 




Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-08-01 Thread Prajakta Kalmegh


 On 2011-07-28 21:40:30, John Sichi wrote:
  ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java, 
  line 61
  https://reviews.apache.org/r/1194/diff/1/?file=27052#file27052line61
 
  Please run ant checkstyle and fix all the formatting discrepancies it 
  reports for your new files.
 

Done! The code is still having checkstyle formatting errors only for places 
where we have used LinkedHashMap, HashMap and ArrayList. The error states 
Declaring variables, return values or parameters of type 'HashMap' is not 
allowed.


 On 2011-07-28 21:40:30, John Sichi wrote:
  ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java, 
  line 184
  https://reviews.apache.org/r/1194/diff/1/?file=27052#file27052line184
 
  Don't you need to reuse the compact implementation here so that the 
  index can be used for WHERE (not just GROUP BY)?
 

The AggregateIndexHandler now extends from CompactIndexHandler instead of 
TableBasedIndexHandler. We override only analyzeIndexDefinition(...) and 
getIndexBuilderMapRedTask(...) methods from CompactIndexHandler.


 On 2011-07-28 21:40:30, John Sichi wrote:
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 700
  https://reviews.apache.org/r/1194/diff/1/?file=27054#file27054line700
 
  This method is redundant now.

Removed. Sorry to have missed that.


 On 2011-07-28 21:40:30, John Sichi wrote:
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java,
   line 252
  https://reviews.apache.org/r/1194/diff/1/?file=27056#file27056line252
 
  I can't think of a case where it would be worse.

Ok.


 On 2011-07-28 21:40:30, John Sichi wrote:
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java,
   line 164
  https://reviews.apache.org/r/1194/diff/1/?file=27057#file27057line164
 
  Actually group-by is now preserved in all cases.

Forgot to change a few comments after fixing the bug. Done!


 On 2011-07-28 21:40:30, John Sichi wrote:
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java,
   line 66
  https://reviews.apache.org/r/1194/diff/1/?file=27058#file27058line66
 
  Please use HTML bullet syntax for Javadoc (otherwise it all gets run 
  together into one line when rendered).
 

Done!


 On 2011-07-28 21:40:30, John Sichi wrote:
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java,
   line 93
  https://reviews.apache.org/r/1194/diff/1/?file=27060#file27060line93
 
  Shouldn't this be BIGINT?
  
  Also, I think you're supposed to use a TypeInfoFactory for this purpose.

Yes. Changed it to bigint. Also changed it in AggregateIndexHandler where I had 
declared the type to be int.


 On 2011-07-28 21:40:30, John Sichi wrote:
  ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java, line 
  603
  https://reviews.apache.org/r/1194/diff/1/?file=27062#file27062line603
 
  Not sure why this new constructor is needed...after using it, all you 
  do is get the table out of it.

The only other constructor option for tableSpec needs the ASTNode as one of its 
parameters. Since we need to construct a new tableSpec using only the index 
table name, and we do not have a ASTNode for this, I need this constructor. If 
you have any other way in mind, please let me know. That would be helpful.


 On 2011-07-28 21:40:30, John Sichi wrote:
  ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q, line 27
  https://reviews.apache.org/r/1194/diff/1/?file=27063#file27063line27
 
  This should *not* be using the index, since the index is built on 
  count(l_shipdate), and l_shipdate may contain nulls, whereas the query is 
  referencing count(1), which is insensitive to nulls.

Yes true. :(
I have now changed the count(1) queries with count(l_shipdate) in 
ql_rewrite_gbtoidx.q file. Also, verified that the count(1) queries are not 
using the index.


 On 2011-07-28 21:40:30, John Sichi wrote:
  ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q, line 61
  https://reviews.apache.org/r/1194/diff/1/?file=27063#file27063line61
 
  Need additional tests to verify all the cases where the optimization 
  should *not* be used:
  
  * when configuration disables it
  * when index partitions do not cover table partitions (I still don't 
  see the code for this case)
  * ... all the other conditions checked for in the code ...
 

Added new queries to verify that optimization is not used in case of:
* when configuration disables it
* ... all the other conditions checked for in the code ...

About * when index partitions do not cover table partitions , still pending 
(also the code for it). I will upload the new patch once this is done.


- Prajakta


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1194/#review1212

Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-08-01 Thread John Sichi


 On 2011-07-28 21:40:30, John Sichi wrote:
  ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java, 
  line 61
  https://reviews.apache.org/r/1194/diff/1/?file=27052#file27052line61
 
  Please run ant checkstyle and fix all the formatting discrepancies it 
  reports for your new files.
 
 
 Prajakta Kalmegh wrote:
 Done! The code is still having checkstyle formatting errors only for 
 places where we have used LinkedHashMap, HashMap and ArrayList. The error 
 states Declaring variables, return values or parameters of type 'HashMap' is 
 not allowed.

Best practice is to only use interfaces (Map/List) except at the point of 
instantiation where you select a concrete class.  Hive violates this in a 
number of places, and sometimes that forces you to violate it in new code too; 
but otherwise, please follow this one.


 On 2011-07-28 21:40:30, John Sichi wrote:
  ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java, line 
  603
  https://reviews.apache.org/r/1194/diff/1/?file=27062#file27062line603
 
  Not sure why this new constructor is needed...after using it, all you 
  do is get the table out of it.
 
 Prajakta Kalmegh wrote:
 The only other constructor option for tableSpec needs the ASTNode as one 
 of its parameters. Since we need to construct a new tableSpec using only the 
 index table name, and we do not have a ASTNode for this, I need this 
 constructor. If you have any other way in mind, please let me know. That 
 would be helpful.

I'm asking why you even need to construct a new tableSpec instance.  All you do 
with it is reference ts.tableHandle.  And to create that tableHandle, you can 
just do db.getTable(tableName).  So I don't see the purpose of the tableSpec 
instance.


- John


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1194/#review1212
---


On 2011-07-26 14:44:01, Prajakta Kalmegh wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/1194/
 ---
 
 (Updated 2011-07-26 14:44:01)
 
 
 Review request for hive and John Sichi.
 
 
 Summary
 ---
 
 This patch has defined a new AggregateIndexHandler which is used to optimize 
 the query plan for groupby queries. 
 
 
 This addresses bug HIVE-1694.
 https://issues.apache.org/jira/browse/HIVE-1694
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b46976f 
   ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff 
   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 2ca63b3 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 
 PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 
 77a6dc6 
   ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION 
   ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/1194/diff
 
 
 Testing
 ---
 
 
 Thanks,
 
 Prajakta
 




Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-07-28 Thread John Sichi

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1194/#review1212
---



ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java
https://reviews.apache.org/r/1194/#comment2711

Please run ant checkstyle and fix all the formatting discrepancies it 
reports for your new files.




ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java
https://reviews.apache.org/r/1194/#comment2695

Don't you need to reuse the compact implementation here so that the index 
can be used for WHERE (not just GROUP BY)?




ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
https://reviews.apache.org/r/1194/#comment2696

This method is redundant now.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java
https://reviews.apache.org/r/1194/#comment2698

I can't think of a case where it would be worse.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
https://reviews.apache.org/r/1194/#comment2699

Actually group-by is now preserved in all cases.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java
https://reviews.apache.org/r/1194/#comment2700

Please use HTML bullet syntax for Javadoc (otherwise it all gets run 
together into one line when rendered).




ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java
https://reviews.apache.org/r/1194/#comment2701

indentation



ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
https://reviews.apache.org/r/1194/#comment2703

Shouldn't this be BIGINT?

Also, I think you're supposed to use a TypeInfoFactory for this purpose.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
https://reviews.apache.org/r/1194/#comment2702

indentation



ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
https://reviews.apache.org/r/1194/#comment2704

typo:  Repace



ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java
https://reviews.apache.org/r/1194/#comment2707

Not sure why this new constructor is needed...after using it, all you do is 
get the table out of it.



ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q
https://reviews.apache.org/r/1194/#comment2709

This should *not* be using the index, since the index is built on 
count(l_shipdate), and l_shipdate may contain nulls, whereas the query is 
referencing count(1), which is insensitive to nulls.



ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q
https://reviews.apache.org/r/1194/#comment2710

Need additional tests to verify all the cases where the optimization should 
*not* be used:

* when configuration disables it
* when index partitions do not cover table partitions (I still don't see 
the code for this case)
* ... all the other conditions checked for in the code ...



- John


On 2011-07-26 14:44:01, Prajakta Kalmegh wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/1194/
 ---
 
 (Updated 2011-07-26 14:44:01)
 
 
 Review request for hive and John Sichi.
 
 
 Summary
 ---
 
 This patch has defined a new AggregateIndexHandler which is used to optimize 
 the query plan for groupby queries. 
 
 
 This addresses bug HIVE-1694.
 https://issues.apache.org/jira/browse/HIVE-1694
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b46976f 
   ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff 
   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 2ca63b3 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 
 PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 
 77a6dc6 
   ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION 
   ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION 
 
 

Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-07-26 Thread Prajakta Kalmegh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1194/
---

Review request for hive and John Sichi.


Summary
---

This patch has defined a new AggregateIndexHandler which is used to optimize 
the query plan for groupby queries. 


This addresses bug HIVE-1694.
https://issues.apache.org/jira/browse/HIVE-1694


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b46976f 
  ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 2ca63b3 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 77a6dc6 
  ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION 
  ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/1194/diff


Testing
---


Thanks,

Prajakta



Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-03-17 Thread John Sichi

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/505/#review339
---



ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java
https://reviews.apache.org/r/505/#comment684

Suggestion is to make this configurable (via IDXPROPERTIES) to save space 
when column is known NOT NULL.  (Also later to allow for specification of other 
aggregates.)



ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteParseContextGenerator.java
https://reviews.apache.org/r/505/#comment686

Indentation is messed up here.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteParseContextGenerator.java
https://reviews.apache.org/r/505/#comment685

Please eliminate all TODO's, and don't use printStackTrace.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteParseContextGenerator.java
https://reviews.apache.org/r/505/#comment687

Instead of downcasting over and over, you should probably be doing it just 
once in the calling method (and asserting that you got the right type since 
otherwise generateOperatorTree is not going to have the desired effect.




ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java
https://reviews.apache.org/r/505/#comment688

Hive naming convention for variables is camelCase, not under_score.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java
https://reviews.apache.org/r/505/#comment690

I see query_has_distinct being written but never read.  Why do we need it?  
I don't think we want to be relying on the parse tree at all.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java
https://reviews.apache.org/r/505/#comment689

Don't swallow exceptions like this.


- John


On 2011-03-13 20:00:28, Prajakta Kalmegh wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/505/
 ---
 
 (Updated 2011-03-13 20:00:28)
 
 
 Review request for hive.
 
 
 Summary
 ---
 
 New Review starting from patch 3.
 
 
 This addresses bug HIVE-1694.
 https://issues.apache.org/jira/browse/HIVE-1694
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 46739b7 
   ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 308d985 
   ql/src/java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 916b235 
   ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 50db44c 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteParseContextGenerator.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 
 PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteIndexSubqueryCtx.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteIndexSubqueryProcFactory.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteRemoveGroupbyCtx.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteRemoveGroupbyProcFactory.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 
 04f560f 
   ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION 
   ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/505/diff
 
 
 Testing
 ---
 
 
 Thanks,
 
 Prajakta
 




Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-03-14 Thread Prajakta Kalmegh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/505/
---

Review request for hive.


Summary
---

New Review starting from patch 3.


This addresses bug HIVE-1694.
https://issues.apache.org/jira/browse/HIVE-1694


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 46739b7 
  ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 308d985 
  ql/src/java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 916b235 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 50db44c 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteParseContextGenerator.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteIndexSubqueryCtx.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteIndexSubqueryProcFactory.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteRemoveGroupbyCtx.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteRemoveGroupbyProcFactory.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 04f560f 
  ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION 
  ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/505/diff


Testing
---


Thanks,

Prajakta



Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-02-28 Thread John Sichi


 On 2011-02-27 15:49:28, John Sichi wrote:
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteGBUsingIndex.java,
   line 19
  https://reviews.apache.org/r/392/diff/1/?file=10598#file10598line19
 
  For HIVE-1644, the HMC team is creating a new package 
  org.apache.hadoop.hive.ql.optimizer.index.  I think any optimizer code 
  related to indexing should go in there.
 
 
 Prajakta Kalmegh wrote:
 I understand we just need to place our code in the 
 'org.apache.hadoop.hive.ql.optimizer.index' package. Right?

Correct.  But for generic optimizer code (e.g. RewriteParseContextGenerator, 
which is unrelated to indexing) can stay where it is.


 On 2011-02-27 15:49:28, John Sichi wrote:
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteGBUsingIndex.java,
   line 51
  https://reviews.apache.org/r/392/diff/1/?file=10598#file10598line51
 
  This code is currently tied to the compact index representation.
  
  We mentioned earlier that we'll need a new index representation 
  (summary) instead in order to implement the counts correctly (we should 
  leave the compact representation as is).
  
  So:
  
  * until the summary representation is added, we can't enable this
  
  * in general, it would be good to find a way to make this pluggable; 
  for example, the bitmap index representation can also be utilized by 
  counting the bits, but the rewrite expression would be slightly different
 
 
 Prajakta Kalmegh wrote:
 We have a couple of designs ready for the new index handler. I will post 
 them on JIRA. 
 
 When you say pluggable does it mean that we should be able to change 
 the rewrite expression depending on which index handler is used? For example, 
 we will get different rewrite expressions if we use either the new index 
 handler or the bitmap index handler. Our optimization code should work with 
 any rewrite expression. Is this understanding correct?

Correct.  If you look at what's going on in the latest HIVE-1644 patch, they 
are extending the HiveIndexHandler interface with an optional method for 
dealing with a WHERE clause.  You can do the same for aggregation.  It may take 
a few iterations to get it generic enough, but it helps that we already have 
the bitmap work in progress so that's two cases to generalize from.


 On 2011-02-27 15:49:28, John Sichi wrote:
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyProcFactory.java,
   line 291
  https://reviews.apache.org/r/392/diff/1/?file=10597#file10597line291
 
  Need to abstract out this dependency on _c convention.
 
 Prajakta Kalmegh wrote:
 Can you suggest a way using which we can get rid of this dependency?

Actually, isn't just checking internalToAlias.get != null already good enough?  
If so we can just eliminate the _c check.


- John


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/392/#review217
---


On 2011-02-03 16:45:15, John Sichi wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/392/
 ---
 
 (Updated 2011-02-03 16:45:15)
 
 
 Review request for hive.
 
 
 Summary
 ---
 
 Preliminary review.
 
 
 This addresses bug HIVE-1694.
 https://issues.apache.org/jira/browse/HIVE-1694
 
 
 Diffs
 -
 
   http://svn.apache.org/repos/asf/hive/trunk/build.xml 1067048 
   
 http://svn.apache.org/repos/asf/hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  1067048 
   
 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
  1067048 
   
 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java
  1067048 
   
 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  1067048 
   
 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyCtx.java
  PRE-CREATION 
   
 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyProcFactory.java
  PRE-CREATION 
   
 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteGBUsingIndex.java
  PRE-CREATION 
   
 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteIndexSubqueryCtx.java
  PRE-CREATION 
   
 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteIndexSubqueryProcFactory.java
  PRE-CREATION 
   
 

Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-02-27 Thread John Sichi

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/392/#review217
---



http://svn.apache.org/repos/asf/hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
https://reviews.apache.org/r/392/#comment432

I suggest hive.optimize.index.groupby for this property.




http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
https://reviews.apache.org/r/392/#comment433

Typo:  Unknown



http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyCtx.java
https://reviews.apache.org/r/392/#comment435

I don't understand why these state variables are maintained as conf 
variables rather than just data members of a class.  Could you explain why that 
is necessary?




http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyCtx.java
https://reviews.apache.org/r/392/#comment434

Why is the exception being ignored here?



http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyProcFactory.java
https://reviews.apache.org/r/392/#comment437

Need Apache headers on all new files.




http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyProcFactory.java
https://reviews.apache.org/r/392/#comment436

Use expr instead of engfd



http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyProcFactory.java
https://reviews.apache.org/r/392/#comment440

Typo:  groub-by




http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyProcFactory.java
https://reviews.apache.org/r/392/#comment506

What about nested function invocations?



http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyProcFactory.java
https://reviews.apache.org/r/392/#comment442

Eliminate commented-out code



http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyProcFactory.java
https://reviews.apache.org/r/392/#comment443

Need to abstract out this dependency on _c convention.



http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteGBUsingIndex.java
https://reviews.apache.org/r/392/#comment507

For HIVE-1644, the HMC team is creating a new package 
org.apache.hadoop.hive.ql.optimizer.index.  I think any optimizer code related 
to indexing should go in there.




http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteGBUsingIndex.java
https://reviews.apache.org/r/392/#comment444

This code is currently tied to the compact index representation.

We mentioned earlier that we'll need a new index representation (summary) 
instead in order to implement the counts correctly (we should leave the compact 
representation as is).

So:

* until the summary representation is added, we can't enable this

* in general, it would be good to find a way to make this pluggable; for 
example, the bitmap index representation can also be utilized by counting the 
bits, but the rewrite expression would be slightly different




http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteGBUsingIndex.java
https://reviews.apache.org/r/392/#comment445

TODO?



http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteGBUsingIndex.java
https://reviews.apache.org/r/392/#comment446

Might want to use a finer level than LOG.info



http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteGBUsingIndex.java
https://reviews.apache.org/r/392/#comment447

Just a sanity check to avoid huge payloads coming back from thrift.



http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteGBUsingIndex.java
https://reviews.apache.org/r/392/#comment448

Hmm, no, I think we should fail hard here.  If the underlying problem is 
fatal (e.g. the metastore went down), we should not try to hide it.


- John


On 2011-02-03 16:45:15, John Sichi wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/392/
 ---
 
 (Updated 2011-02-03 16:45:15)
 
 
 Review request for hive.
 
 
 Summary
 ---
 
 Preliminary review.
 
 
 This addresses bug HIVE-1694.
 https://issues.apache.org/jira/browse/HIVE-1694
 
 
 Diffs
 -
 
   http://svn.apache.org/repos/asf/hive/trunk/build.xml 1067048 
   

Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-02-27 Thread Prajakta Kalmegh


 On 2011-02-27 15:49:28, John Sichi wrote:
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyCtx.java,
   line 121
  https://reviews.apache.org/r/392/diff/1/?file=10596#file10596line121
 
  I don't understand why these state variables are maintained as conf 
  variables rather than just data members of a class.  Could you explain why 
  that is necessary?
 

The intent was to have these variables available as part of the current 
configuration. I think that was unnecessary. We will change it to data members 
of RewriteCanApplyCtx class.


 On 2011-02-27 15:49:28, John Sichi wrote:
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyProcFactory.java,
   line 187
  https://reviews.apache.org/r/392/diff/1/?file=10597#file10597line187
 
  What about nested function invocations?

Yes we missed that. We will upload a new patch with this change.


 On 2011-02-27 15:49:28, John Sichi wrote:
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyProcFactory.java,
   line 291
  https://reviews.apache.org/r/392/diff/1/?file=10597#file10597line291
 
  Need to abstract out this dependency on _c convention.

Can you suggest a way using which we can get rid of this dependency?


 On 2011-02-27 15:49:28, John Sichi wrote:
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteGBUsingIndex.java,
   line 19
  https://reviews.apache.org/r/392/diff/1/?file=10598#file10598line19
 
  For HIVE-1644, the HMC team is creating a new package 
  org.apache.hadoop.hive.ql.optimizer.index.  I think any optimizer code 
  related to indexing should go in there.
 

I understand we just need to place our code in the 
'org.apache.hadoop.hive.ql.optimizer.index' package. Right?


 On 2011-02-27 15:49:28, John Sichi wrote:
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteGBUsingIndex.java,
   line 51
  https://reviews.apache.org/r/392/diff/1/?file=10598#file10598line51
 
  This code is currently tied to the compact index representation.
  
  We mentioned earlier that we'll need a new index representation 
  (summary) instead in order to implement the counts correctly (we should 
  leave the compact representation as is).
  
  So:
  
  * until the summary representation is added, we can't enable this
  
  * in general, it would be good to find a way to make this pluggable; 
  for example, the bitmap index representation can also be utilized by 
  counting the bits, but the rewrite expression would be slightly different
 

We have a couple of designs ready for the new index handler. I will post them 
on JIRA. 

When you say pluggable does it mean that we should be able to change the 
rewrite expression depending on which index handler is used? For example, we 
will get different rewrite expressions if we use either the new index handler 
or the bitmap index handler. Our optimization code should work with any rewrite 
expression. Is this understanding correct?


- Prajakta


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/392/#review217
---


On 2011-02-03 16:45:15, John Sichi wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/392/
 ---
 
 (Updated 2011-02-03 16:45:15)
 
 
 Review request for hive.
 
 
 Summary
 ---
 
 Preliminary review.
 
 
 This addresses bug HIVE-1694.
 https://issues.apache.org/jira/browse/HIVE-1694
 
 
 Diffs
 -
 
   http://svn.apache.org/repos/asf/hive/trunk/build.xml 1067048 
   
 http://svn.apache.org/repos/asf/hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  1067048 
   
 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
  1067048 
   
 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java
  1067048 
   
 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  1067048 
   
 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyCtx.java
  PRE-CREATION 
   
 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyProcFactory.java
  PRE-CREATION 
   
 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteGBUsingIndex.java
  PRE-CREATION 
   
 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteIndexSubqueryCtx.java
  

Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-02-03 Thread John Sichi

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/392/
---

Review request for hive.


Summary
---

Preliminary review.


This addresses bug HIVE-1694.
https://issues.apache.org/jira/browse/HIVE-1694


Diffs
-

  http://svn.apache.org/repos/asf/hive/trunk/build.xml 1067048 
  
http://svn.apache.org/repos/asf/hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
 1067048 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 1067048 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java
 1067048 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
 1067048 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyCtx.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyProcFactory.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteGBUsingIndex.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteIndexSubqueryCtx.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteIndexSubqueryProcFactory.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteParseContextGenerator.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteRemoveGroupbyCtx.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteRemoveGroupbyProcFactory.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java
 1067048 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
 1067048 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java
 1067048 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/fatal.q
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out
 PRE-CREATION 

Diff: https://reviews.apache.org/r/392/diff


Testing
---


Thanks,

John