[ 
https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722773#action_12722773
 ] 

Prasad Chakka commented on HIVE-417:
------------------------------------

Schubert,

We can run another map-reduce job that scans the index and builds a results 
file sorted by the index key. This file can then be read sequentially to 
determine which HDFS blocks of the input table should be fed to the actual job 
for the query.
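
A rough sketch of that sequential read, in Python for brevity. It is 
illustrative only: the (key, byte offset) entry layout, the 64 MB block size, 
and the function name are my assumptions, not something in the patch.

```python
from typing import Iterable, List, Set, Tuple

# Assumed HDFS block size, for illustration only.
BLOCK_SIZE = 64 * 1024 * 1024


def blocks_for_range(
    sorted_entries: Iterable[Tuple[str, int]],
    low: str,
    high: str,
    block_size: int = BLOCK_SIZE,
) -> List[int]:
    """Scan (key, offset) entries that are sorted by key and return the
    block numbers holding rows whose key falls in [low, high].

    Hypothetical helper: 'offset' is assumed to be the byte position of the
    row inside the table file.
    """
    blocks: Set[int] = set()
    for key, offset in sorted_entries:
        if key < low:
            continue  # not yet inside the requested range
        if key > high:
            break  # entries are sorted by key, so nothing further can match
        blocks.add(offset // block_size)
    return sorted(blocks)
```

Only the blocks returned here would then be handed to the actual query job as 
input; everything else in the table is skipped.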

Another way is to build a sparse index on the index. But if the table itself is 
sorted, we can build the sparse index (a la MapFile) directly on the table and 
use it. At Facebook, the use case we have doesn't have this sorting property, 
but I can envision this being useful for primary indexes where the index sort 
order and the table sort order are the same.
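
To make the sparse index idea concrete, here is a small sketch in Python. It 
keeps every Nth key of a sorted key list together with its position, roughly 
the way MapFile keeps an index over its sorted data file. The function names 
and the in-memory lists are assumptions for illustration; a real version would 
record file offsets rather than list positions.

```python
import bisect
from typing import List, Optional, Tuple


def build_sparse_index(
    sorted_keys: List[str], interval: int = 128
) -> List[Tuple[str, int]]:
    """Keep every 'interval'-th key with its position in the sorted data."""
    return [(sorted_keys[i], i) for i in range(0, len(sorted_keys), interval)]


def lookup(
    sorted_keys: List[str], sparse: List[Tuple[str, int]], key: str
) -> Optional[int]:
    """Return the position of 'key' in sorted_keys, or None if absent.

    Binary-search the sparse index for the last sampled key <= key, then
    scan forward from that position in the sorted data.
    """
    sparse_keys = [k for k, _ in sparse]
    slot = bisect.bisect_right(sparse_keys, key) - 1
    if slot < 0:
        return None  # key sorts before the first sampled key
    start = sparse[slot][1]
    for pos in range(start, len(sorted_keys)):
        if sorted_keys[pos] == key:
            return pos
        if sorted_keys[pos] > key:
            break  # passed the place where key would have been
    return None
```

The forward scan touches at most 'interval' entries, so the interval trades 
index size against scan length.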

Can you think of any other ways? Of course, we can process index files using 
HBase or TokyoCabinet, but that requires another system to be set up and 
administered, and both systems need to be available for index processing. In 
some cases these solutions also work, though. The indexing scheme described 
above should play well with HBase and TokyoCabinet, since the index is a file 
with rows containing a key and position parameters. In Hadoop we can store 
that in a SequenceFile or maybe a TFile, but if they have to be stored in 
external systems, we can plug in a custom SerDe and change the default 
location of these two to a location where the external systems can access 
these files.

> Implement Indexing in Hive
> --------------------------
>
>                 Key: HIVE-417
>                 URL: https://issues.apache.org/jira/browse/HIVE-417
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor
>    Affects Versions: 0.2.0, 0.3.0, 0.3.1, 0.4.0
>            Reporter: Prasad Chakka
>            Assignee: Yongqiang He
>         Attachments: hive-417.proto.patch
>
>
> Implement indexing on Hive so that lookup and range queries are efficient.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
