[ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710204#action_12710204 ]
Prasad Chakka commented on HIVE-417:
------------------------------------

1) The question you raised applies only to B+-tree indexes. The index I defined above is not a traditional database index but a kind of summary table (or view), and any lookup or range query on the table requires reading the whole index. So any predicate can be applied as long as the columns it references exist in the index, which means we should be able to use an index on (col1, col2, col3) for all of the queries above. Sort order has no impact here, since the whole index is read into memory anyway. Because the index can be created in sorted order, we can also build a sparse index on top of it (similar to the non-leaf nodes of a B+-tree) if the index itself is too big, i.e., when index sizes are orders of magnitude larger than the HDFS block size. But this can be done as a later optimization.

2) With the design above, indexes on joins come for free: predicate pushdown will push 'user.name="user_name"' ahead of the join, so only index-filtered rows participate in the join. Creating indexes on the joined output, on the other hand, may increase the index size enough to reduce its overall effectiveness. Sparse indexes might mitigate this, so we can support this kind of join index together with sparse indexes.

3) Yes, for some aggregation queries it may make sense to read the index, since it is also a summary table. Aggregations, or any queries that reference only columns in the index, can operate on the index alone instead of the main table.

4) I also looked at Katta and am not sure how it fits into Hive. Katta is more like a distributed index server.
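A minimal Python sketch of the summary-table idea, under stated assumptions: the base table is modeled as a list of blocks (standing in for HDFS blocks), and each index entry holds the indexed column values, a block id, and a row count. The row count, and all names (`build_summary_index`, `candidate_blocks`, `count_by_first_col`, `build_sparse_index`, `range_scan`), are hypothetical illustrations, not Hive APIs. It shows three points from the comment: a predicate on any indexed column is evaluated by scanning the whole index, regardless of column order; an aggregate over indexed columns can be answered from the index alone; and because the index is sorted, a sparse index of every k-th key lets a range scan skip most of the index.

```python
from bisect import bisect_left
from collections import Counter
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
# Hypothetical index entry: (indexed column values, block id, row count).
Entry = Tuple[tuple, int, int]


def build_summary_index(blocks: List[List[Row]], cols: Tuple[str, ...]) -> List[Entry]:
    """Summarize the table: one entry per distinct (key, block) pair, sorted by key."""
    counts: Counter = Counter()
    for block_id, rows in enumerate(blocks):
        for row in rows:
            counts[(tuple(row[c] for c in cols), block_id)] += 1
    return sorted((key, block_id, n) for (key, block_id), n in counts.items())


def candidate_blocks(index: List[Entry], cols: Tuple[str, ...],
                     predicate: Callable[[Dict[str, object]], bool]) -> List[int]:
    """Read the whole index and keep blocks whose entries satisfy the predicate.
    Column order in the index does not matter because every entry is checked."""
    return sorted({b for key, b, _ in index if predicate(dict(zip(cols, key)))})


def count_by_first_col(index: List[Entry]) -> Dict[object, int]:
    """GROUP BY the first indexed column with COUNT(*), using only the index."""
    totals: Counter = Counter()
    for key, _, n in index:
        totals[key[0]] += n
    return dict(totals)


def build_sparse_index(index: List[Entry], step: int) -> List[Tuple[tuple, int]]:
    """Keep every step-th key with its position, like B+-tree non-leaf nodes."""
    return [(index[i][0], i) for i in range(0, len(index), step)]


def range_scan(index: List[Entry], sparse: List[Tuple[tuple, int]],
               lo: tuple, hi: tuple) -> List[Entry]:
    """Return entries with lo <= key < hi, starting at the last sparse key below lo."""
    i = bisect_left([k for k, _ in sparse], lo) - 1
    start = sparse[i][1] if i >= 0 else 0
    out = []
    for entry in index[start:]:
        if entry[0] >= hi:
            break  # index is sorted, so no later entry can match
        if entry[0] >= lo:
            out.append(entry)
    return out


# Usage with a tiny hypothetical table spread over three blocks.
COLS = ("name", "year")
blocks = [
    [{"name": "alice", "year": 2008}, {"name": "bob", "year": 2009}],
    [{"name": "carol", "year": 2008}, {"name": "alice", "year": 2009}],
    [{"name": "bob", "year": 2009}],
]
index = build_summary_index(blocks, COLS)
print(candidate_blocks(index, COLS, lambda r: r["year"] == 2008))  # [0, 1]
print(count_by_first_col(index))  # {'alice': 2, 'bob': 2, 'carol': 1}
sparse = build_sparse_index(index, step=2)
print([b for _, b, _ in range_scan(index, sparse, ("bob",), ("c",))])  # [0, 2]
```

The predicate on `year` works even though `name` is the leading column, which is the point of reading the whole index. The sparse index only changes where the scan starts: it begins at the last sampled key strictly below `lo`, so entries with duplicate keys are never skipped.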
> Implement Indexing in Hive
> --------------------------
>
>                 Key: HIVE-417
>                 URL: https://issues.apache.org/jira/browse/HIVE-417
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor
>    Affects Versions: 0.2.0, 0.3.0, 0.3.1, 0.4.0
>            Reporter: Prasad Chakka
>            Assignee: He Yongqiang
>
> Implement indexing on Hive so that lookup and range queries are efficient.