[jira] Commented: (HIVE-417) Implement Indexing in Hive

Joydeep Sen Sarma (JIRA) Mon, 21 Sep 2009 21:18:42 -0700

    [ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758116#action_12758116 ]


Joydeep Sen Sarma commented on HIVE-417:
----------------------------------------

are there any references on this technique?

someone had earlier suggested this (apparently from reading Netezza
documentation), but i don't understand when it would work. why would a (fairly
large) sequencefile block contain only a limited range of values for a column
(assuming the metadata stores a min-max range)? most cases i can imagine in our
dataset would either have low-cardinality columns (so most values would be
present in every block) or, for high-cardinality ones, a distribution that is
random relative to the primary sort key, so the range would seem ineffective.

unless there are columns that are closely related to how the data is
sorted/partitioned (perhaps some product ids are limited to a specific range of
time while the partitioning is on time and not product id; even that sounds
dubious).
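
To make the min-max argument concrete, here is a small sketch of per-block
min/max ranges (the kind of metadata usually called a zone map). A point lookup
only has to read blocks whose [min, max] range contains the key. The block size,
row count, and the three synthetic columns are made-up assumptions for
illustration, not Hive's storage format or the attached patch's design.

```python
import random
from typing import List, NamedTuple


class BlockRange(NamedTuple):
    """Hypothetical per-block metadata: min and max of one column."""
    lo: int
    hi: int


def build_ranges(column: List[int], block_size: int) -> List[BlockRange]:
    """Split a column into fixed-size blocks and record each block's min/max."""
    ranges = []
    for start in range(0, len(column), block_size):
        block = column[start:start + block_size]
        ranges.append(BlockRange(min(block), max(block)))
    return ranges


def blocks_to_scan(ranges: List[BlockRange], key: int) -> List[int]:
    """Indices of blocks whose [lo, hi] could contain key; others are skipped."""
    return [i for i, r in enumerate(ranges) if r.lo <= key <= r.hi]


def demo(n_rows: int = 100_000, block_size: int = 10_000, seed: int = 0) -> dict:
    rng = random.Random(seed)
    # Assumption: rows are stored in primary sort-key order (e.g. time), so a
    # column correlated with the sort key grows along with the row position.
    correlated = [i // 7 for i in range(n_rows)]
    # Low-cardinality column: only 5 distinct values, spread over every block.
    low_card = [rng.randrange(5) for _ in range(n_rows)]
    # High-cardinality column whose values are random relative to the sort key.
    high_card = [rng.randrange(1_000_000) for _ in range(n_rows)]

    lookups = {
        "correlated": (correlated, 7142),
        "low_cardinality": (low_card, 2),
        "high_cardinality_random": (high_card, 500_000),
    }
    # Number of blocks each point lookup still has to read.
    return {
        name: len(blocks_to_scan(build_ranges(col, block_size), key))
        for name, (col, key) in lookups.items()
    }


if __name__ == "__main__":
    print(demo())
```

With these parameters the correlated lookup touches 2 of the 10 blocks, while
the low-cardinality and random high-cardinality lookups still touch all 10,
which is the case the comment expects to dominate.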

a bloom filter would seem much more plausible at allowing good filtering. even
then, i don't understand why this sort of metadata should be kept along with the
block and not separately (much more flexible, since it can be added on demand),
which is the direction this jira is headed.
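
A minimal sketch of the alternative favored above: one Bloom filter per block,
held in a separate side index keyed by block id and built after the data is
written, so it could be added on demand. The class and function names, the
10,000-row blocks, and the sizing (about 10 bits per key, 7 hashes, giving
roughly a 1% false-positive rate per block) are illustrative assumptions, not
Hive's API or the patch's implementation.

```python
import hashlib
import random
from typing import Dict, Iterator, List


class BloomFilter:
    """Minimal Bloom filter: k bit positions per key in an m-bit array.

    False positives are possible; false negatives are not."""

    def __init__(self, m_bits: int, k_hashes: int) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key: int) -> Iterator[int]:
        # Derive k positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(str(key).encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, key: int) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: int) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def build_side_index(column: List[int], block_size: int,
                     m_bits: int, k_hashes: int) -> Dict[int, BloomFilter]:
    """Hypothetical side index: one filter per block id, stored apart from the data."""
    index = {}
    for block_id, start in enumerate(range(0, len(column), block_size)):
        bf = BloomFilter(m_bits, k_hashes)
        for v in column[start:start + block_size]:
            bf.add(v)
        index[block_id] = bf
    return index


def blocks_to_scan(index: Dict[int, BloomFilter], key: int) -> List[int]:
    """Blocks whose filter says the key may be present."""
    return [b for b, bf in sorted(index.items()) if bf.might_contain(key)]


if __name__ == "__main__":
    rng = random.Random(0)
    # High-cardinality column, random relative to the sort key.
    column = [rng.randrange(1_000_000) for _ in range(100_000)]
    index = build_side_index(column, block_size=10_000, m_bits=100_000, k_hashes=7)
    key = column[35_000]  # a value known to be stored in block 3
    print(blocks_to_scan(index, key))  # block 3, plus any false positives
```

Unlike the min/max ranges, the filter can rule out most blocks even when the
column is random relative to the sort key, at the cost of extra space per block.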

> Implement Indexing in Hive
> --------------------------
>
>                 Key: HIVE-417
>                 URL: https://issues.apache.org/jira/browse/HIVE-417
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor
>    Affects Versions: 0.2.0, 0.3.0, 0.3.1, 0.4.0
>            Reporter: Prasad Chakka
>            Assignee: He Yongqiang
>         Attachments: hive-417.proto.patch, hive-417－2009-07-18.patch
>
>
> Implement indexing on Hive so that lookup and range queries are efficient.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
