[ 
https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713003#action_12713003
 ] 

He Yongqiang commented on HIVE-417:
-----------------------------------

I checked how MySQL handles indexes and found that MySQL also cannot use an 
index for the situations in my earlier post:
{quote}
but, we can not use it for queries like:
4) select * from table1 where col2>34 and col3<3
5) select * from table1 where col2 =34
6) select * from table1 where col3 <45 
{quote}

Here is a basic idea for our index design, along the lines of Prasad's comment 
in a previous post:
1) index structure
Use an MR job to create the index. The input is a file with all columns, and 
each mapper outputs kv pairs where the key is <indexed col1, indexed col2,...> 
and the value is the offset.
We define a comparator for <indexed col1, indexed col2,...> so that the shuffle 
phase sorts all mappers' output. In the reducer, we combine the kv pairs into 
<indexed col1, indexed col2,...> list_of_offsets.
This gives a dense sorted index; we then create a sparse index on top of the 
dense index. We also collect column data distribution information (a 
histogram) while doing this.
2) optimizer
We consider using an index for a query only when the query involves the 
leftmost columns of the index. 
We also need to consider index merge when a query involves two indexes, and a 
cost estimate to decide whether using the index will reduce query time (this 
work belongs in the optimizer).
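
To make the leftmost-column rule in point 2 concrete, here is a minimal sketch 
in Python. It is not Hive code: the function usable_prefix and the index 
definition (col1, col2, col3) are assumptions for illustration only, chosen so 
that the quoted queries 4) to 6) fall outside the rule.

```python
# Hypothetical helper, not part of Hive: returns the leading index columns
# that a query can use under the leftmost-prefix rule.

def usable_prefix(index_cols, equality_cols, range_cols=()):
    """Walk the index columns from left to right.

    A column with an equality predicate extends the usable prefix.
    A column with a range predicate extends it once and then stops.
    A column with no predicate stops the walk immediately.
    """
    prefix = []
    for col in index_cols:
        if col in equality_cols:
            prefix.append(col)
        elif col in range_cols:
            prefix.append(col)
            break
        else:
            break
    return prefix


# Assumed index definition; the real one is in the earlier post.
index = ["col1", "col2", "col3"]

# Queries 4) to 6) from the quote never constrain col1.
q4 = usable_prefix(index, equality_cols=set(), range_cols={"col2", "col3"})
q5 = usable_prefix(index, equality_cols={"col2"})
q6 = usable_prefix(index, equality_cols=set(), range_cols={"col3"})

# A query such as "col1 = 1 and col2 > 34" does start at the leftmost column.
q_ok = usable_prefix(index, equality_cols={"col1"}, range_cols={"col2"})

print(q4, q5, q6, q_ok)
```

An empty result means the optimizer would not consider this index for the 
query at all; a non-empty result only makes the index a candidate, and the 
cost estimate would still decide whether to use it.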

As a first step, we can finish part 1 and the Hive QL part, and then consider 
part 2 (the optimizer part). After part 1 is finished, I will examine part 2 in 
more detail.
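
As a rough illustration of what part 1 would compute, the sketch below 
simulates the map, shuffle, and reduce steps in plain Python on a few in-memory 
rows. Everything in it is an assumption for illustration: the sample rows, the 
offsets, the indexed columns (col1, col2), the function names, and the choice 
of keeping every second dense key in the sparse index. It is not a Hadoop job 
and leaves out the histogram collection.

```python
from bisect import bisect_right
from itertools import groupby

# Hypothetical input: (byte offset in the data file, record with all columns).
rows = [
    (0,   {"col1": 7, "col2": 34, "col3": 1}),
    (40,  {"col1": 3, "col2": 12, "col3": 9}),
    (80,  {"col1": 7, "col2": 34, "col3": 5}),
    (120, {"col1": 5, "col2": 20, "col3": 2}),
]
indexed_cols = ("col1", "col2")  # assumed index definition


def map_phase(rows, cols):
    """Mapper: emit (<indexed col1, indexed col2, ...>, offset) per record."""
    for offset, record in rows:
        yield tuple(record[c] for c in cols), offset


def reduce_phase(pairs):
    """Shuffle and reducer: sort by key, then merge the offsets of each key."""
    ordered = sorted(pairs)  # tuple ordering stands in for the key comparator
    return [(key, [off for _, off in group])
            for key, group in groupby(ordered, key=lambda kv: kv[0])]


def build_sparse(dense, step):
    """Sparse index: every step-th dense key with its dense-index position."""
    return [(dense[i][0], i) for i in range(0, len(dense), step)]


def lookup(dense, sparse, key, step):
    """Find the dense block through the sparse index, then scan that block."""
    pos = bisect_right([k for k, _ in sparse], key) - 1
    if pos < 0:
        return []  # key sorts before the first indexed key
    start = sparse[pos][1]
    for k, offsets in dense[start:start + step]:
        if k == key:
            return offsets
    return []


dense = reduce_phase(map_phase(rows, indexed_cols))
sparse = build_sparse(dense, step=2)
print(dense)
print(sparse)
print(lookup(dense, sparse, (7, 34), step=2))
```

In a real job the dense index would be written to files by the reducers, and 
the sparse index would point at positions inside those files rather than at 
list positions.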

> Implement Indexing in Hive
> --------------------------
>
>                 Key: HIVE-417
>                 URL: https://issues.apache.org/jira/browse/HIVE-417
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor
>    Affects Versions: 0.2.0, 0.3.0, 0.3.1, 0.4.0
>            Reporter: Prasad Chakka
>            Assignee: He Yongqiang
>         Attachments: hive-417.proto.patch
>
>
> Implement indexing on Hive so that lookup and range queries are efficient.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
