[jira] Commented: (HIVE-417) Implement Indexing in Hive

Seymour Zhang (JIRA) Fri, 29 May 2009 08:53:15 -0700

    [ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714492#action_12714492 ]


Seymour Zhang commented on HIVE-417:
------------------------------------

Hello Prasad and Yongqiang, thank you very much for this great effort.

One suggestion: the indexes are already built with MapReduce, so for queries 
that the generated indexes can answer, could we skip the time-consuming 
MapReduce phase at query time? The index already gives us the files and 
offsets, so we could seek directly to those offsets and read the relevant rows 
of the table. This would greatly speed up such queries.
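
To make the idea concrete, here is a minimal sketch in Python of reading rows 
straight from stored byte offsets, with no scan at lookup time. It is only an 
illustration, not Hive code: the function names, the tab-separated row format, 
and the in-memory dictionary used as the index are all assumptions.

```python
import os
import tempfile


def build_offset_index(path, key_col=0, sep="\t"):
    """Scan the file once and map each key to the byte offsets of its rows.

    Hypothetical stand-in for the index-building job; not a Hive API.
    """
    index = {}
    offset = 0
    with open(path, "rb") as f:
        for raw in f:
            key = raw.decode("utf-8").rstrip("\n").split(sep)[key_col]
            index.setdefault(key, []).append(offset)
            offset += len(raw)
    return index


def read_rows_at(path, offsets):
    """Seek to each stored offset and read exactly one row from it."""
    rows = []
    with open(path, "rb") as f:
        for off in sorted(offsets):
            f.seek(off)
            rows.append(f.readline().decode("utf-8").rstrip("\n"))
    return rows


# Tiny demo table: three tab-separated rows, two of them with key "a".
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"a\t1\nb\t2\na\t3\n")
    path = tmp.name

index = build_offset_index(path)
rows = read_rows_at(path, index["a"])  # touches only the two matching rows
os.remove(path)
```

The lookup cost here depends on the number of matching rows rather than on the 
size of the file, which is the saving I am hoping for.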

This would help in one of my own uses of Hive. I have already sharded my log 
data by date and bucketed it by a hash of certain columns into a hierarchical 
set of files, and I have sorted each file on the hashed columns. Because many 
rows share the same column values and differ only in timestamp, I would like to 
treat each run of rows with the same column values as a block and keep a single 
index entry per block. This would greatly reduce the index size for my data 
while remaining very useful for my queries on those columns.
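
As a rough sketch of the block idea, the Python below keeps one index entry 
(start offset and byte length) per run of equal keys. It assumes the file is 
already sorted on the key column, so that rows with the same key are 
contiguous; the function names and the tab-separated layout are hypothetical 
and only for illustration.

```python
import os
import tempfile


def build_block_index(path, key_col=0, sep="\t"):
    """Map each key to (start_offset, byte_length) of its contiguous block.

    Assumes the file is sorted on the key column; otherwise the recorded
    length would not describe a single contiguous range.
    """
    index = {}
    offset = 0
    with open(path, "rb") as f:
        for raw in f:
            key = raw.decode("utf-8").rstrip("\n").split(sep)[key_col]
            if key in index:
                start, length = index[key]
                index[key] = (start, length + len(raw))  # extend the block
            else:
                index[key] = (offset, len(raw))  # first row of a new block
            offset += len(raw)
    return index


def read_block(path, start, length):
    """Read every row of one block with a single seek and a single read."""
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read(length)
    return data.decode("utf-8").rstrip("\n").split("\n")


# Demo: five rows sorted by key; same key, different timestamps.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(
        b"u1\t2009-05-01\n"
        b"u1\t2009-05-02\n"
        b"u2\t2009-05-01\n"
        b"u3\t2009-05-01\n"
        b"u3\t2009-05-03\n"
    )
    path = tmp.name

block_index = build_block_index(path)
u3_rows = read_block(path, *block_index["u3"])
os.remove(path)
```

In this demo five rows need only three index entries, and the saving grows 
with the number of rows that share a key.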

> Implement Indexing in Hive
> --------------------------
>
>                 Key: HIVE-417
>                 URL: https://issues.apache.org/jira/browse/HIVE-417
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor
>    Affects Versions: 0.2.0, 0.3.0, 0.3.1, 0.4.0
>            Reporter: Prasad Chakka
>            Assignee: He Yongqiang
>         Attachments: hive-417.proto.patch
>
>
> Implement indexing on Hive so that lookup and range queries are efficient.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
