subject:"Review Request\: HIVE\-1694\: Accelerate GROUP BY execution using indexes"

Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-09-10 Thread Prajakta Kalmegh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1194/
---

(Updated 2011-09-10 21:10:06.178279)


Review request for hive and John Sichi.


Changes
---

Added order-by to queries for test determinism.


Summary
---

This patch has defined a new AggregateIndexHandler which is used to optimize 
the query plan for groupby queries. 


This addresses bug HIVE-1694.
https://issues.apache.org/jira/browse/HIVE-1694


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 66ee0be 
  data/files/lineitem.txt PRE-CREATION 
  data/files/tbl.txt PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff 
  ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 
5053576 
  ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 
7a00c00 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java bec8787 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java
 dcdfb9e 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java
 699519b 
  ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION 
  ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/1194/diff


Testing
---


Thanks,

Prajakta

Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-09-08 Thread Prajakta Kalmegh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1194/
---

(Updated 2011-09-09 01:14:16.218940)


Review request for hive and John Sichi.


Summary
---

This patch has defined a new AggregateIndexHandler which is used to optimize 
the query plan for groupby queries. 


This addresses bug HIVE-1694.
https://issues.apache.org/jira/browse/HIVE-1694


Diffs (updated)
-

  ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION 
  ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java
 699519b 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java
 dcdfb9e 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
 PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 66ee0be 
  data/files/lineitem.txt PRE-CREATION 
  data/files/tbl.txt PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff 
  ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 
5053576 
  ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 
7a00c00 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java bec8787 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/1194/diff


Testing
---


Thanks,

Prajakta

Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-09-07 Thread Prajakta Kalmegh



 On 2011-08-05 21:20:21, John Sichi wrote:
  ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java, 
  line 172
  https://reviews.apache.org/r/1194/diff/2/?file=30443#file30443line172
 
  See recent changes in corresponding CompactIndexHandler code for 
  HIVEOPTINDEXFILTER; need the same here (or better, factor out common code 
  here and elsewhere).
  
  On a related note, you may be able to use the same technique instead of 
  isQueryInsertToTable; this would be preferable since it's nice to be able 
  to use the index rewrite in cases where it's a normal INSERT table with 
  index being used for GROUP BY on SELECT from some other table.
 

I have factored out the common code in all Index handler classes and placed it 
in IndexUtils file. 

I also removed the code for isQueryInsertToTable and am setting the 
HIVEOPTINDEXFILTER to false instead. 


 On 2011-08-05 21:20:21, John Sichi wrote:
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java,
   line 153
  https://reviews.apache.org/r/1194/diff/2/?file=30449#file30449line153
 
  Shouldn't this be the same as COUNT(*)?
 

Yes it is. I missed to change this part from the previous code.


- Prajakta


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1194/#review1303
---


On 2011-08-03 10:31:42, Prajakta Kalmegh wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/1194/
 ---
 
 (Updated 2011-08-03 10:31:42)
 
 
 Review request for hive and John Sichi.
 
 
 Summary
 ---
 
 This patch has defined a new AggregateIndexHandler which is used to optimize 
 the query plan for groupby queries. 
 
 
 This addresses bug HIVE-1694.
 https://issues.apache.org/jira/browse/HIVE-1694
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b46976f 
   ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff 
   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java a57f9cf 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 
 PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java
  8295687 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java
  699519b 
   ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION 
   ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/1194/diff
 
 
 Testing
 ---
 
 
 Thanks,
 
 Prajakta

Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-08-05 Thread John Sichi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1194/#review1303
---



ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java
https://reviews.apache.org/r/1194/#comment2955

Can't you just look up AGGREGATES in the map?



ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java
https://reviews.apache.org/r/1194/#comment2953

Add a helper method to avoid duplicating the code in the else block below.




ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java
https://reviews.apache.org/r/1194/#comment2954

Can't you just look up AGGREGATES in the map?



ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java
https://reviews.apache.org/r/1194/#comment2956

See recent changes in corresponding CompactIndexHandler code for 
HIVEOPTINDEXFILTER; need the same here (or better, factor out common code here 
and elsewhere).

On a related note, you may be able to use the same technique instead of 
isQueryInsertToTable; this would be preferable since it's nice to be able to 
use the index rewrite in cases where it's a normal INSERT table with index 
being used for GROUP BY on SELECT from some other table.




ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java
https://reviews.apache.org/r/1194/#comment2957

@params here don't match actual params



ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
https://reviews.apache.org/r/1194/#comment2958

Shouldn't this be the same as COUNT(*)?




ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q
https://reviews.apache.org/r/1194/#comment2980

Besides EXPLAIN, you should include a few queries against a non-empty table 
verifying that you get the correct results both with and without the 
optimization applied.  Remember to include an ORDER BY for test determinism.




ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q
https://reviews.apache.org/r/1194/#comment2978

Isn't this set redundant?


- John


On 2011-08-03 10:31:42, Prajakta Kalmegh wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/1194/
 ---
 
 (Updated 2011-08-03 10:31:42)
 
 
 Review request for hive and John Sichi.
 
 
 Summary
 ---
 
 This patch has defined a new AggregateIndexHandler which is used to optimize 
 the query plan for groupby queries. 
 
 
 This addresses bug HIVE-1694.
 https://issues.apache.org/jira/browse/HIVE-1694
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b46976f 
   ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff 
   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java a57f9cf 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 
 PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java
  8295687 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java
  699519b 
   ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION 
   ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/1194/diff
 
 
 Testing
 ---
 
 
 Thanks,
 
 Prajakta

Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-08-01 Thread Prajakta Kalmegh

On 2011-07-28 21:40:30, John Sichi wrote:
ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java,
line 61
https://reviews.apache.org/r/1194/diff/1/?file=27052#file27052line61

Please run ant checkstyle and fix all the formatting discrepancies it
reports for your new files.

Done! The code is still having checkstyle formatting errors only for places
where we have used LinkedHashMap, HashMap and ArrayList. The error states
Declaring variables, return values or parameters of type 'HashMap' is not
allowed.

On 2011-07-28 21:40:30, John Sichi wrote:
ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java,
line 184
https://reviews.apache.org/r/1194/diff/1/?file=27052#file27052line184

Don't you need to reuse the compact implementation here so that the
index can be used for WHERE (not just GROUP BY)?

The AggregateIndexHandler now extends from CompactIndexHandler instead of
TableBasedIndexHandler. We override only analyzeIndexDefinition(...) and
getIndexBuilderMapRedTask(...) methods from CompactIndexHandler.

On 2011-07-28 21:40:30, John Sichi wrote:
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 700
https://reviews.apache.org/r/1194/diff/1/?file=27054#file27054line700

This method is redundant now.

Removed. Sorry to have missed that.

On 2011-07-28 21:40:30, John Sichi wrote:
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java,
line 252
https://reviews.apache.org/r/1194/diff/1/?file=27056#file27056line252

I can't think of a case where it would be worse.

Ok.

On 2011-07-28 21:40:30, John Sichi wrote:
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java,
line 164
https://reviews.apache.org/r/1194/diff/1/?file=27057#file27057line164

Actually group-by is now preserved in all cases.

Forgot to change a few comments after fixing the bug. Done!

On 2011-07-28 21:40:30, John Sichi wrote:
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java,
line 66
https://reviews.apache.org/r/1194/diff/1/?file=27058#file27058line66

Please use HTML bullet syntax for Javadoc (otherwise it all gets run
together into one line when rendered).

Done!

On 2011-07-28 21:40:30, John Sichi wrote:
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java,
line 93
https://reviews.apache.org/r/1194/diff/1/?file=27060#file27060line93

Shouldn't this be BIGINT?

Also, I think you're supposed to use a TypeInfoFactory for this purpose.

Yes. Changed it to bigint. Also changed it in AggregateIndexHandler where I had
declared the type to be int.

On 2011-07-28 21:40:30, John Sichi wrote:
ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java, line
603
https://reviews.apache.org/r/1194/diff/1/?file=27062#file27062line603

Not sure why this new constructor is needed...after using it, all you
do is get the table out of it.

The only other constructor option for tableSpec needs the ASTNode as one of its
parameters. Since we need to construct a new tableSpec using only the index
table name, and we do not have a ASTNode for this, I need this constructor. If
you have any other way in mind, please let me know. That would be helpful.

On 2011-07-28 21:40:30, John Sichi wrote:
ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q, line 27
https://reviews.apache.org/r/1194/diff/1/?file=27063#file27063line27

This should *not* be using the index, since the index is built on
count(l_shipdate), and l_shipdate may contain nulls, whereas the query is
referencing count(1), which is insensitive to nulls.

Yes true. :(
I have now changed the count(1) queries with count(l_shipdate) in
ql_rewrite_gbtoidx.q file. Also, verified that the count(1) queries are not
using the index.

On 2011-07-28 21:40:30, John Sichi wrote:
ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q, line 61
https://reviews.apache.org/r/1194/diff/1/?file=27063#file27063line61

Need additional tests to verify all the cases where the optimization
should *not* be used:

* when configuration disables it
* when index partitions do not cover table partitions (I still don't
see the code for this case)
* ... all the other conditions checked for in the code ...

Added new queries to verify that optimization is not used in case of:
* when configuration disables it
* ... all the other conditions checked for in the code ...

About * when index partitions do not cover table partitions , still pending
(also the code for it). I will upload the new patch once this is done.

- Prajakta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1194/#review1212

Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-08-01 Thread John Sichi

On 2011-07-28 21:40:30, John Sichi wrote:
ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java,
line 61
https://reviews.apache.org/r/1194/diff/1/?file=27052#file27052line61

Please run ant checkstyle and fix all the formatting discrepancies it
reports for your new files.

Prajakta Kalmegh wrote:
Done! The code is still having checkstyle formatting errors only for
places where we have used LinkedHashMap, HashMap and ArrayList. The error
states Declaring variables, return values or parameters of type 'HashMap' is
not allowed.

Best practice is to only use interfaces (Map/List) except at the point of
instantiation where you select a concrete class. Hive violates this in a
number of places, and sometimes that forces you to violate it in new code too;
but otherwise, please follow this one.

On 2011-07-28 21:40:30, John Sichi wrote:
ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java, line
603
https://reviews.apache.org/r/1194/diff/1/?file=27062#file27062line603

Not sure why this new constructor is needed...after using it, all you
do is get the table out of it.

Prajakta Kalmegh wrote:
The only other constructor option for tableSpec needs the ASTNode as one
of its parameters. Since we need to construct a new tableSpec using only the
index table name, and we do not have a ASTNode for this, I need this
constructor. If you have any other way in mind, please let me know. That
would be helpful.

I'm asking why you even need to construct a new tableSpec instance. All you do
with it is reference ts.tableHandle. And to create that tableHandle, you can
just do db.getTable(tableName). So I don't see the purpose of the tableSpec
instance.

- John

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1194/#review1212
---

On 2011-07-26 14:44:01, Prajakta Kalmegh wrote:

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1194/
---

(Updated 2011-07-26 14:44:01)

Review request for hive and John Sichi.

Summary
---

This patch has defined a new AggregateIndexHandler which is used to optimize
the query plan for groupby queries.

This addresses bug HIVE-1694.
https://issues.apache.org/jira/browse/HIVE-1694

Diffs
-

common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b46976f
ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java
PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 2ca63b3
ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a

ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java
PRE-CREATION

ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
PRE-CREATION

ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java
PRE-CREATION

ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java
PRE-CREATION

ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
PRE-CREATION

ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java
PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java
77a6dc6
ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION
ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/1194/diff

Testing
---

Thanks,

Prajakta

Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-07-28 Thread John Sichi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1194/#review1212
---



ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java
https://reviews.apache.org/r/1194/#comment2711

Please run ant checkstyle and fix all the formatting discrepancies it 
reports for your new files.




ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java
https://reviews.apache.org/r/1194/#comment2695

Don't you need to reuse the compact implementation here so that the index 
can be used for WHERE (not just GROUP BY)?




ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
https://reviews.apache.org/r/1194/#comment2696

This method is redundant now.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java
https://reviews.apache.org/r/1194/#comment2698

I can't think of a case where it would be worse.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
https://reviews.apache.org/r/1194/#comment2699

Actually group-by is now preserved in all cases.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java
https://reviews.apache.org/r/1194/#comment2700

Please use HTML bullet syntax for Javadoc (otherwise it all gets run 
together into one line when rendered).




ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java
https://reviews.apache.org/r/1194/#comment2701

indentation



ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
https://reviews.apache.org/r/1194/#comment2703

Shouldn't this be BIGINT?

Also, I think you're supposed to use a TypeInfoFactory for this purpose.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
https://reviews.apache.org/r/1194/#comment2702

indentation



ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
https://reviews.apache.org/r/1194/#comment2704

typo:  Repace



ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java
https://reviews.apache.org/r/1194/#comment2707

Not sure why this new constructor is needed...after using it, all you do is 
get the table out of it.



ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q
https://reviews.apache.org/r/1194/#comment2709

This should *not* be using the index, since the index is built on 
count(l_shipdate), and l_shipdate may contain nulls, whereas the query is 
referencing count(1), which is insensitive to nulls.



ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q
https://reviews.apache.org/r/1194/#comment2710

Need additional tests to verify all the cases where the optimization should 
*not* be used:

* when configuration disables it
* when index partitions do not cover table partitions (I still don't see 
the code for this case)
* ... all the other conditions checked for in the code ...



- John


On 2011-07-26 14:44:01, Prajakta Kalmegh wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/1194/
 ---
 
 (Updated 2011-07-26 14:44:01)
 
 
 Review request for hive and John Sichi.
 
 
 Summary
 ---
 
 This patch has defined a new AggregateIndexHandler which is used to optimize 
 the query plan for groupby queries. 
 
 
 This addresses bug HIVE-1694.
 https://issues.apache.org/jira/browse/HIVE-1694
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b46976f 
   ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff 
   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 2ca63b3 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 
 PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 
 77a6dc6 
   ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION 
   ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION

Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-07-26 Thread Prajakta Kalmegh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1194/
---

Review request for hive and John Sichi.


Summary
---

This patch has defined a new AggregateIndexHandler which is used to optimize 
the query plan for groupby queries. 


This addresses bug HIVE-1694.
https://issues.apache.org/jira/browse/HIVE-1694


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b46976f 
  ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 2ca63b3 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 77a6dc6 
  ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION 
  ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/1194/diff


Testing
---


Thanks,

Prajakta

Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-03-17 Thread John Sichi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/505/#review339
---



ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java
https://reviews.apache.org/r/505/#comment684

Suggestion is to make this configurable (via IDXPROPERTIES) to save space 
when column is known NOT NULL.  (Also later to allow for specification of other 
aggregates.)



ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteParseContextGenerator.java
https://reviews.apache.org/r/505/#comment686

Indentation is messed up here.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteParseContextGenerator.java
https://reviews.apache.org/r/505/#comment685

Please eliminate all TODO's, and don't use printStackTrace.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteParseContextGenerator.java
https://reviews.apache.org/r/505/#comment687

Instead of downcasting over and over, you should probably be doing it just 
once in the calling method (and asserting that you got the right type since 
otherwise generateOperatorTree is not going to have the desired effect.




ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java
https://reviews.apache.org/r/505/#comment688

Hive naming convention for variables is camelCase, not under_score.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java
https://reviews.apache.org/r/505/#comment690

I see query_has_distinct being written but never read.  Why do we need it?  
I don't think we want to be relying on the parse tree at all.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java
https://reviews.apache.org/r/505/#comment689

Don't swallow exceptions like this.


- John


On 2011-03-13 20:00:28, Prajakta Kalmegh wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/505/
 ---
 
 (Updated 2011-03-13 20:00:28)
 
 
 Review request for hive.
 
 
 Summary
 ---
 
 New Review starting from patch 3.
 
 
 This addresses bug HIVE-1694.
 https://issues.apache.org/jira/browse/HIVE-1694
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 46739b7 
   ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 308d985 
   ql/src/java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 916b235 
   ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 50db44c 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteParseContextGenerator.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 
 PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteIndexSubqueryCtx.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteIndexSubqueryProcFactory.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteRemoveGroupbyCtx.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteRemoveGroupbyProcFactory.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 
 04f560f 
   ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION 
   ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/505/diff
 
 
 Testing
 ---
 
 
 Thanks,
 
 Prajakta

Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-03-14 Thread Prajakta Kalmegh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/505/
---

Review request for hive.


Summary
---

New Review starting from patch 3.


This addresses bug HIVE-1694.
https://issues.apache.org/jira/browse/HIVE-1694


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 46739b7 
  ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 308d985 
  ql/src/java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 916b235 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 50db44c 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteParseContextGenerator.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteIndexSubqueryCtx.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteIndexSubqueryProcFactory.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteRemoveGroupbyCtx.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteRemoveGroupbyProcFactory.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 04f560f 
  ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION 
  ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/505/diff


Testing
---


Thanks,

Prajakta

Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-02-28 Thread John Sichi

On 2011-02-27 15:49:28, John Sichi wrote:
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteGBUsingIndex.java,
line 19
https://reviews.apache.org/r/392/diff/1/?file=10598#file10598line19

For HIVE-1644, the HMC team is creating a new package
org.apache.hadoop.hive.ql.optimizer.index. I think any optimizer code
related to indexing should go in there.

Prajakta Kalmegh wrote:
I understand we just need to place our code in the
'org.apache.hadoop.hive.ql.optimizer.index' package. Right?

Correct. But for generic optimizer code (e.g. RewriteParseContextGenerator,
which is unrelated to indexing) can stay where it is.

On 2011-02-27 15:49:28, John Sichi wrote:
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteGBUsingIndex.java,
line 51
https://reviews.apache.org/r/392/diff/1/?file=10598#file10598line51

This code is currently tied to the compact index representation.

We mentioned earlier that we'll need a new index representation
(summary) instead in order to implement the counts correctly (we should
leave the compact representation as is).

So:

* until the summary representation is added, we can't enable this

* in general, it would be good to find a way to make this pluggable;
for example, the bitmap index representation can also be utilized by
counting the bits, but the rewrite expression would be slightly different

Prajakta Kalmegh wrote:
We have a couple of designs ready for the new index handler. I will post
them on JIRA.

When you say pluggable does it mean that we should be able to change
the rewrite expression depending on which index handler is used? For example,
we will get different rewrite expressions if we use either the new index
handler or the bitmap index handler. Our optimization code should work with
any rewrite expression. Is this understanding correct?

Correct. If you look at what's going on in the latest HIVE-1644 patch, they
are extending the HiveIndexHandler interface with an optional method for
dealing with a WHERE clause. You can do the same for aggregation. It may take
a few iterations to get it generic enough, but it helps that we already have
the bitmap work in progress so that's two cases to generalize from.

On 2011-02-27 15:49:28, John Sichi wrote:
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyProcFactory.java,
line 291
https://reviews.apache.org/r/392/diff/1/?file=10597#file10597line291

Need to abstract out this dependency on _c convention.

Prajakta Kalmegh wrote:
Can you suggest a way using which we can get rid of this dependency?

Actually, isn't just checking internalToAlias.get != null already good enough?
If so we can just eliminate the _c check.

- John

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/392/#review217
---

On 2011-02-03 16:45:15, John Sichi wrote:

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/392/
---

(Updated 2011-02-03 16:45:15)

Review request for hive.

Summary
---

Preliminary review.

This addresses bug HIVE-1694.
https://issues.apache.org/jira/browse/HIVE-1694

Diffs
-

http://svn.apache.org/repos/asf/hive/trunk/build.xml 1067048

http://svn.apache.org/repos/asf/hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
1067048

http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
1067048

http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java
1067048

http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
1067048

http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyCtx.java
PRE-CREATION

http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyProcFactory.java
PRE-CREATION

http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteGBUsingIndex.java
PRE-CREATION

http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteIndexSubqueryCtx.java
PRE-CREATION

http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteIndexSubqueryProcFactory.java
PRE-CREATION

Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-02-27 Thread John Sichi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/392/#review217
---



http://svn.apache.org/repos/asf/hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
https://reviews.apache.org/r/392/#comment432

I suggest hive.optimize.index.groupby for this property.




http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
https://reviews.apache.org/r/392/#comment433

Typo:  Unknown



http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyCtx.java
https://reviews.apache.org/r/392/#comment435

I don't understand why these state variables are maintained as conf 
variables rather than just data members of a class.  Could you explain why that 
is necessary?




http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyCtx.java
https://reviews.apache.org/r/392/#comment434

Why is the exception being ignored here?



http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyProcFactory.java
https://reviews.apache.org/r/392/#comment437

Need Apache headers on all new files.




http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyProcFactory.java
https://reviews.apache.org/r/392/#comment436

Use expr instead of engfd



http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyProcFactory.java
https://reviews.apache.org/r/392/#comment440

Typo:  groub-by




http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyProcFactory.java
https://reviews.apache.org/r/392/#comment506

What about nested function invocations?



http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyProcFactory.java
https://reviews.apache.org/r/392/#comment442

Eliminate commented-out code



http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyProcFactory.java
https://reviews.apache.org/r/392/#comment443

Need to abstract out this dependency on _c convention.



http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteGBUsingIndex.java
https://reviews.apache.org/r/392/#comment507

For HIVE-1644, the HMC team is creating a new package 
org.apache.hadoop.hive.ql.optimizer.index.  I think any optimizer code related 
to indexing should go in there.




http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteGBUsingIndex.java
https://reviews.apache.org/r/392/#comment444

This code is currently tied to the compact index representation.

We mentioned earlier that we'll need a new index representation (summary) 
instead in order to implement the counts correctly (we should leave the compact 
representation as is).

So:

* until the summary representation is added, we can't enable this

* in general, it would be good to find a way to make this pluggable; for 
example, the bitmap index representation can also be utilized by counting the 
bits, but the rewrite expression would be slightly different




http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteGBUsingIndex.java
https://reviews.apache.org/r/392/#comment445

TODO?



http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteGBUsingIndex.java
https://reviews.apache.org/r/392/#comment446

Might want to use a finer level than LOG.info



http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteGBUsingIndex.java
https://reviews.apache.org/r/392/#comment447

Just a sanity check to avoid huge payloads coming back from thrift.



http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteGBUsingIndex.java
https://reviews.apache.org/r/392/#comment448

Hmm, no, I think we should fail hard here.  If the underlying problem is 
fatal (e.g. the metastore went down), we should not try to hide it.


- John


On 2011-02-03 16:45:15, John Sichi wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/392/
 ---
 
 (Updated 2011-02-03 16:45:15)
 
 
 Review request for hive.
 
 
 Summary
 ---
 
 Preliminary review.
 
 
 This addresses bug HIVE-1694.
 https://issues.apache.org/jira/browse/HIVE-1694
 
 
 Diffs
 -
 
   http://svn.apache.org/repos/asf/hive/trunk/build.xml 1067048

Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-02-27 Thread Prajakta Kalmegh



 On 2011-02-27 15:49:28, John Sichi wrote:
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyCtx.java,
   line 121
  https://reviews.apache.org/r/392/diff/1/?file=10596#file10596line121
 
  I don't understand why these state variables are maintained as conf 
  variables rather than just data members of a class.  Could you explain why 
  that is necessary?
 

The intent was to have these variables available as part of the current 
configuration. I think that was unnecessary. We will change it to data members 
of RewriteCanApplyCtx class.


 On 2011-02-27 15:49:28, John Sichi wrote:
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyProcFactory.java,
   line 187
  https://reviews.apache.org/r/392/diff/1/?file=10597#file10597line187
 
  What about nested function invocations?

Yes we missed that. We will upload a new patch with this change.


 On 2011-02-27 15:49:28, John Sichi wrote:
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyProcFactory.java,
   line 291
  https://reviews.apache.org/r/392/diff/1/?file=10597#file10597line291
 
  Need to abstract out this dependency on _c convention.

Can you suggest a way using which we can get rid of this dependency?


 On 2011-02-27 15:49:28, John Sichi wrote:
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteGBUsingIndex.java,
   line 19
  https://reviews.apache.org/r/392/diff/1/?file=10598#file10598line19
 
  For HIVE-1644, the HMC team is creating a new package 
  org.apache.hadoop.hive.ql.optimizer.index.  I think any optimizer code 
  related to indexing should go in there.
 

I understand we just need to place our code in the 
'org.apache.hadoop.hive.ql.optimizer.index' package. Right?


 On 2011-02-27 15:49:28, John Sichi wrote:
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteGBUsingIndex.java,
   line 51
  https://reviews.apache.org/r/392/diff/1/?file=10598#file10598line51
 
  This code is currently tied to the compact index representation.
  
  We mentioned earlier that we'll need a new index representation 
  (summary) instead in order to implement the counts correctly (we should 
  leave the compact representation as is).
  
  So:
  
  * until the summary representation is added, we can't enable this
  
  * in general, it would be good to find a way to make this pluggable; 
  for example, the bitmap index representation can also be utilized by 
  counting the bits, but the rewrite expression would be slightly different
 

We have a couple of designs ready for the new index handler. I will post them 
on JIRA. 

When you say pluggable does it mean that we should be able to change the 
rewrite expression depending on which index handler is used? For example, we 
will get different rewrite expressions if we use either the new index handler 
or the bitmap index handler. Our optimization code should work with any rewrite 
expression. Is this understanding correct?


- Prajakta


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/392/#review217
---


On 2011-02-03 16:45:15, John Sichi wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/392/
 ---
 
 (Updated 2011-02-03 16:45:15)
 
 
 Review request for hive.
 
 
 Summary
 ---
 
 Preliminary review.
 
 
 This addresses bug HIVE-1694.
 https://issues.apache.org/jira/browse/HIVE-1694
 
 
 Diffs
 -
 
   http://svn.apache.org/repos/asf/hive/trunk/build.xml 1067048 
   
 http://svn.apache.org/repos/asf/hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  1067048 
   
 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
  1067048 
   
 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java
  1067048 
   
 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  1067048 
   
 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyCtx.java
  PRE-CREATION 
   
 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyProcFactory.java
  PRE-CREATION 
   
 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteGBUsingIndex.java
  PRE-CREATION 
   
 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteIndexSubqueryCtx.java

Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-02-03 Thread John Sichi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/392/
---

Review request for hive.


Summary
---

Preliminary review.


This addresses bug HIVE-1694.
https://issues.apache.org/jira/browse/HIVE-1694


Diffs
-

  http://svn.apache.org/repos/asf/hive/trunk/build.xml 1067048 
  
http://svn.apache.org/repos/asf/hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
 1067048 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 1067048 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java
 1067048 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
 1067048 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyCtx.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyProcFactory.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteGBUsingIndex.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteIndexSubqueryCtx.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteIndexSubqueryProcFactory.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteParseContextGenerator.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteRemoveGroupbyCtx.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteRemoveGroupbyProcFactory.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java
 1067048 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
 1067048 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java
 1067048 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/fatal.q
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out
 PRE-CREATION 

Diff: https://reviews.apache.org/r/392/diff


Testing
---


Thanks,

John

Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

14 matches

Site Navigation

Mail list logo

Footer information