[ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758127#action_12758127 ]
Prasad Chakka commented on HIVE-417:
------------------------------------
I don't think it makes much sense unless there is some clustering or sorting
property. If there is clustering and sorting, and the selectivity of a query is
much higher than 10%, then storing this metadata along with the data makes sense
instead of in a separate block. The 10% threshold may be larger for Hive, but the
point still stands. In the OLAP case the data seldom changes, and this
kind of metadata is much smaller than the data itself, so the overhead of
storing it is negligible.
Something similar is done in DB2 Multi-Dimensional Clustering, where
whole blocks (disk blocks) are skipped if the key value doesn't fit the query.
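
To make the block-skipping idea concrete, here is a rough sketch in Python. It is
not Hive or DB2 code: the Block class, build_blocks and range_scan are hypothetical
names invented for illustration, and the min/max header is only one possible form
the per-block metadata could take. It assumes the keys are already sorted, so each
block covers a narrow key range.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """Hypothetical data block that carries its own min/max key header."""
    min_key: int
    max_key: int
    rows: List[int]


def build_blocks(sorted_keys: List[int], block_size: int) -> List[Block]:
    """Split already-sorted keys into fixed-size blocks, recording min/max per block."""
    blocks = []
    for start in range(0, len(sorted_keys), block_size):
        chunk = sorted_keys[start:start + block_size]
        # Because the input is sorted, the first and last keys are the min and max.
        blocks.append(Block(chunk[0], chunk[-1], chunk))
    return blocks


def range_scan(blocks: List[Block], low: int, high: int) -> Tuple[List[int], int]:
    """Return the keys in [low, high] and the number of blocks actually read."""
    matches: List[int] = []
    blocks_read = 0
    for block in blocks:
        # Skip the whole block when its key range cannot overlap the query range.
        if block.max_key < low or block.min_key > high:
            continue
        blocks_read += 1
        matches.extend(k for k in block.rows if low <= k <= high)
    return matches, blocks_read


blocks = build_blocks(list(range(100)), block_size=10)
rows, blocks_read = range_scan(blocks, 25, 34)
print(len(rows), "rows found,", blocks_read, "of", len(blocks), "blocks read")
```

Without the sort order, each block's min/max range would tend to span most of the
key space, so almost no block could be skipped. That is why the metadata only
pays off when there is a clustering or sorting property.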
> Implement Indexing in Hive
> --------------------------
>
> Key: HIVE-417
> URL: https://issues.apache.org/jira/browse/HIVE-417
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Metastore, Query Processor
> Affects Versions: 0.2.0, 0.3.0, 0.3.1, 0.4.0
> Reporter: Prasad Chakka
> Assignee: He Yongqiang
> Attachments: hive-417.proto.patch, hive-417-2009-07-18.patch
>
>
> Implement indexing on Hive so that lookup and range queries are efficient.