[
https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722773#action_12722773
]
Prasad Chakka commented on HIVE-417:
------------------------------------
Schubert,
We can run another map-reduce job that scans the index and writes out a
results file sorted by the index key. This file can then be read sequentially
to determine which HDFS blocks of the input table should be fed to the actual
job for the query.
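As a rough illustration of the sequential read described above, here is a
minimal Python sketch. It assumes each index row is a (key, byte offset) pair
and that a block is identified by offset // block_size. The function name, the
row layout, and the block size are hypothetical and are not taken from the
HIVE-417 patch.

```python
def blocks_for_range(index_entries, lo, hi, block_size):
    """Return the sorted block numbers holding rows with lo <= key <= hi.

    index_entries must already be sorted by key (the output of the extra
    map-reduce job), so the scan can stop at the first key above hi.
    """
    blocks = set()
    for key, offset in index_entries:
        if key < lo:
            continue          # not yet inside the requested range
        if key > hi:
            break             # sorted input: nothing further can match
        blocks.add(offset // block_size)
    return sorted(blocks)


# Hypothetical index rows: (index key, byte offset of the row in the table file)
entries = sorted([("apple", 10), ("kiwi", 300), ("mango", 310),
                  ("pear", 20), ("plum", 600)])
narrow = blocks_for_range(entries, "kiwi", "mango", block_size=128)
wide = blocks_for_range(entries, "apple", "pear", block_size=128)
```

Only the blocks returned here would need to be handed to the actual query job;
the rest of the table would be skipped.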
Another way is to build a sparse index on the index. But if the table itself is
sorted, we can build the sparse index (a la MapFile) directly and use it.
At Facebook, the use case we have doesn't have this sorting property, but I can
envision this being useful for primary indexes where the index sort order and
the table sort order are the same.
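To make the sparse-index idea concrete, here is a small Python sketch of the
general technique: keep every Nth key of a sorted file together with its
position, binary-search those sampled keys, then scan forward from the nearest
preceding position. This is an in-memory illustration only. The function
names, the interval, and the assumption of unique keys are hypothetical, and
it does not use the MapFile API.

```python
from bisect import bisect_right


def build_sparse_index(sorted_rows, interval):
    """Sample every `interval`-th key along with its row position."""
    return [(sorted_rows[i][0], i) for i in range(0, len(sorted_rows), interval)]


def lookup(sorted_rows, sparse_index, key):
    """Find the value for `key`, or None if it is absent (keys assumed unique)."""
    sparse_keys = [k for k, _ in sparse_index]
    slot = bisect_right(sparse_keys, key) - 1   # last sampled key <= key
    if slot < 0:
        return None                             # key sorts before the first row
    start = sparse_index[slot][1]
    for row_key, value in sorted_rows[start:]:
        if row_key == key:
            return value
        if row_key > key:
            break                               # passed where the key would be
    return None


# Hypothetical sorted table: (key, value) rows ordered by key
rows = [(1, "a"), (3, "b"), (5, "c"), (7, "d"), (9, "e"), (11, "f")]
sparse = build_sparse_index(rows, interval=3)
hit = lookup(rows, sparse, 9)
miss = lookup(rows, sparse, 4)
```

A lookup touches at most `interval` rows after the binary search, so the
interval trades the size of the sparse index against the length of the scan.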
Can you think of any other ways? Of course, we can process index files using
HBase or TokyoCabinet, but that requires another system to be set up and
administered, and both systems need to be available for index processing. In
some cases these solutions also work, though. The indexing scheme described
above should play well with HBase and TokyoCabinet, since the index is a file
with rows containing a key and position parameters. In Hadoop we can store
that in a SequenceFile or maybe a TFile, but if the rows have to be stored in
external systems, we can plug in a custom SerDe and change the default location
of these files to a location where the external systems can access them.
> Implement Indexing in Hive
> --------------------------
>
> Key: HIVE-417
> URL: https://issues.apache.org/jira/browse/HIVE-417
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Metastore, Query Processor
> Affects Versions: 0.2.0, 0.3.0, 0.3.1, 0.4.0
> Reporter: Prasad Chakka
> Assignee: Yongqiang He
> Attachments: hive-417.proto.patch
>
>
> Implement indexing on Hive so that lookup and range queries are efficient.