[jira] Commented: (HIVE-417) Implement Indexing in Hive

Prasad Chakka (JIRA) Sun, 17 May 2009 08:42:09 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710204#action_12710204
 ]


Prasad Chakka commented on HIVE-417:
------------------------------------

1)
The question you raised applies only to B+Tree indexes. The index that I 
defined above is not really a traditional database index but a kind of summary 
table (or view), and any lookup or range query on the table requires reading 
the whole index. So you can apply all predicates as long as the columns 
referenced in the predicates exist in the index. So we should be able to use an 
index on (col1, col2, col3) for all the queries above. Sort order has no 
impact here since the whole index is read into memory anyway.
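
As a rough illustration of why column order does not matter for this kind of 
index, here is a minimal Python sketch. It is not Hive code: the entry layout 
(column values plus a file/offset pointer) and the names scan_index, 
by_col3 and by_col2 are assumptions made up for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical index entry: indexed column values plus a pointer
# (file name, byte offset) into the base table.
Location = Tuple[str, int]
IndexEntry = Tuple[Dict[str, int], Location]


def scan_index(index: List[IndexEntry],
               predicates: Dict[str, Callable[[int], bool]]) -> List[Location]:
    """Read every index entry and keep those that satisfy all predicates."""
    matches = []
    for values, location in index:
        # Every entry is visited, so a predicate on any indexed column works.
        if all(pred(values[col]) for col, pred in predicates.items()):
            matches.append(location)
    return matches


index = [
    ({"col1": 1, "col2": 10, "col3": 100}, ("part-00000", 0)),
    ({"col1": 1, "col2": 20, "col3": 200}, ("part-00000", 64)),
    ({"col1": 2, "col2": 10, "col3": 300}, ("part-00001", 0)),
]

# Predicates on non-leading columns are fine because the scan is exhaustive.
by_col3 = scan_index(index, {"col3": lambda v: v >= 200})
by_col2 = scan_index(index, {"col2": lambda v: v == 10})
```

A B+Tree on (col1, col2, col3) could not serve the col3-only lookup 
efficiently; the full scan trades that restriction for reading every entry.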

Since this index can be created in sorted order, we can create a sparse index 
(similar to the non-leaf nodes of a B+-Tree) if the index itself is too big 
(i.e., index sizes are orders of magnitude larger than the HDFS block size). 
But this can be done as a later optimization.
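
To make the sparse-index idea concrete, here is a small Python sketch under 
assumed names (build_sparse_index, find_block, lookup are hypothetical). It 
keeps only the first key of each block of the sorted index and uses binary 
search to pick the single block worth reading.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def build_sparse_index(sorted_keys: List[int],
                       block_size: int) -> Tuple[List[int], List[List[int]]]:
    """Split the sorted index into blocks and record each block's first key."""
    blocks = [sorted_keys[i:i + block_size]
              for i in range(0, len(sorted_keys), block_size)]
    first_keys = [block[0] for block in blocks]
    return first_keys, blocks


def find_block(first_keys: List[int], key: int) -> Optional[int]:
    """Return the number of the only block that could hold key, if any."""
    pos = bisect_right(first_keys, key) - 1
    return pos if pos >= 0 else None


def lookup(first_keys: List[int], blocks: List[List[int]], key: int) -> bool:
    """Check for key by reading one block instead of the whole index."""
    pos = find_block(first_keys, key)
    return pos is not None and key in blocks[pos]


first_keys, blocks = build_sparse_index([3, 8, 15, 21, 34, 42, 57, 63, 71], 3)
```

Here the blocks stand in for HDFS blocks of the full index, and first_keys 
plays the role of the non-leaf level.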

2)
With the design above, indexes on joins come for free, since predicate 
pushdown will push 'user.name="user_name"' ahead of the join, so only 
index-filtered rows participate in the join.
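
A minimal Python sketch of that flow, with made-up tables and names 
(build_name_index, filter_then_join, users, events are all hypothetical): the 
predicate on the user name is resolved through the index first, and only the 
surviving rows are joined. For brevity the index is modeled as a dictionary 
rather than the scanned summary table described above.

```python
from typing import Dict, List, Tuple


def build_name_index(users: List[dict]) -> Dict[str, List[int]]:
    """Map each user name to the positions of matching rows in users."""
    index: Dict[str, List[int]] = {}
    for pos, row in enumerate(users):
        index.setdefault(row["name"], []).append(pos)
    return index


def filter_then_join(users: List[dict], name_index: Dict[str, List[int]],
                     events: List[dict], user_name: str) -> List[Tuple[str, str]]:
    """Apply the pushed-down name predicate via the index, then join."""
    # Only index-filtered user rows take part in the join.
    selected = {users[pos]["id"]: users[pos]
                for pos in name_index.get(user_name, [])}
    return [(selected[e["user_id"]]["name"], e["action"])
            for e in events if e["user_id"] in selected]


users = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"},
         {"id": 3, "name": "alice"}]
events = [{"user_id": 1, "action": "login"}, {"user_id": 2, "action": "view"},
          {"user_id": 3, "action": "logout"}]
name_index = build_name_index(users)
joined = filter_then_join(users, name_index, events, "alice")
```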

But creating indexes on the joined output may increase the index size enough 
to decrease its overall effectiveness. Sparse indexes might mitigate this 
problem, so we can support this kind of join index along with support for 
sparse indexes.

3)
Yes, for some aggregation queries it may make sense to read the index (since it 
is a summary table as well). Aggregations, or any queries that involve only 
columns from the index, can be answered from the index alone without reading 
the main table.
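
A short Python sketch of this, using hypothetical names 
(can_answer_from_index, sum_by, index_rows): first check that the query 
touches only indexed columns, then aggregate over the index rows. It assumes 
the index holds one entry per base-table row; otherwise sums and counts 
would come out wrong.

```python
from typing import Dict, Iterable, List


def can_answer_from_index(query_columns: Iterable[str],
                          index_columns: Iterable[str]) -> bool:
    """True when every column the query needs is present in the index."""
    return set(query_columns) <= set(index_columns)


def sum_by(index_rows: List[Dict[str, int]],
           group_col: str, value_col: str) -> Dict[int, int]:
    """Compute SUM(value_col) GROUP BY group_col over index rows only."""
    totals: Dict[int, int] = {}
    for row in index_rows:
        totals[row[group_col]] = totals.get(row[group_col], 0) + row[value_col]
    return totals


index_columns = ["col1", "col2", "col3"]
# Assumed: one index entry per base-table row.
index_rows = [
    {"col1": 1, "col2": 10, "col3": 100},
    {"col1": 1, "col2": 20, "col3": 200},
    {"col1": 2, "col2": 10, "col3": 300},
]

if can_answer_from_index(["col1", "col2"], index_columns):
    totals = sum_by(index_rows, "col1", "col2")
```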

4) 
I also looked at it and am not sure how it fits into Hive. Katta is more like a 
distributed index server.

> Implement Indexing in Hive
> --------------------------
>
>                 Key: HIVE-417
>                 URL: https://issues.apache.org/jira/browse/HIVE-417
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor
>    Affects Versions: 0.2.0, 0.3.0, 0.3.1, 0.4.0
>            Reporter: Prasad Chakka
>            Assignee: He Yongqiang
>
> Implement indexing on Hive so that lookup and range queries are efficient.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
