[ 
https://issues.apache.org/jira/browse/PHOENIX-17?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887513#comment-13887513
 ] 

rajeshbabu commented on PHOENIX-17:
-----------------------------------

bq. Unless the index is highly selective or region local (i.e. data and index 
data are colocated) it is hard to get good performance out of HBase.
Yes, Lars. We do see good performance with colocation.

bq. The index hits would need to be translated in to a set of GETs, which is 
fairly slow. (or maybe a SKIP SCAN with the GETs as way points)
Yes. Even if we use region-level coprocessors to fetch seek points from the 
index table, I think we would still need to hit all regions of the main table, 
which defeats the purpose of global indexing.
Let's take an example. 
1) Create an employee table with id, name, and addr as columns, and create an 
index on the name column. Assume <a,c> are the split keys of the table.
2) Insert some data:
+------------+------------+------------+
|     ID     |    NAME    |    ADDR    |
+------------+------------+------------+
| a          | foo        | bang       |
| b          | foo        | chen       |
| c          | foo        | mum        |
| d          | foo        | hyd        |
+------------+------------+------------+
3) The index data will then be as follows. Note that the name foo is shared by 
all four employees.
+------------+------------+
|    NAME    |    :ID     |
+------------+------------+
| foo        | a          |
| foo        | b          |
| foo        | c          |
| foo        | d          |
+------------+------------+
4) When we scan with the condition name='foo', the coprocessor hooks of the 
first region can fetch a and b as seek points, and similarly the hooks of the 
second region can fetch c and d. In this way we end up hitting all the main 
table regions.

With this approach, the deadlock issue discussed for puts may also arise.
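
To make the example above concrete, here is a minimal sketch in Python (not Phoenix or HBase code) that groups the index hits for name='foo' by the main table region owning each ID. It assumes regions are [start, end) ranges over the split keys <a,c> and labels each region by its start key; the helper names region_start_key and seek_points_by_region are made up for illustration.

```python
from bisect import bisect_right

# Split keys of the main table, taken from the example above.
SPLIT_KEYS = ["a", "c"]

# Index rows as (NAME, ID) pairs, taken from the example above.
INDEX_ROWS = [("foo", "a"), ("foo", "b"), ("foo", "c"), ("foo", "d")]


def region_start_key(row_key, split_keys=SPLIT_KEYS):
    """Return the start key of the region holding row_key ("" for the first region).

    Assumption: regions are [start, end) ranges over the sorted split keys.
    """
    pos = bisect_right(split_keys, row_key)
    return split_keys[pos - 1] if pos > 0 else ""


def seek_points_by_region(name, index_rows=INDEX_ROWS):
    """Group the IDs matching `name` by the main table region that owns them."""
    seek_points = {}
    for indexed_name, row_id in index_rows:
        if indexed_name == name:
            seek_points.setdefault(region_start_key(row_id), []).append(row_id)
    return seek_points


if __name__ == "__main__":
    # Every region that holds data shows up in the result.
    print(seek_points_by_region("foo"))
```

In this sketch both regions that hold data (the ones starting at a and at c) appear in the result, which is the point of the example: a non-selective value like foo produces seek points in every main table region.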

bq. Highly selective indexes, obviously, include PK and unique indexes, which 
could be useful even when the query isn't covered.
In most cases users prefer PK and unique indexes, which need not be fully 
covered.

> Support to make use of partial covered indexes in scan
> ------------------------------------------------------
>
>                 Key: PHOENIX-17
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-17
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: rajeshbabu
>             Fix For: 3.0.0
>
>
> Normally, when we want to use secondary indexes, we create an index on one or 
> a few columns of interest in the query conditions. The index may not contain 
> all the columns to retrieve.
> Currently Phoenix supports fully covered indexes only (where, in many cases, 
> all or most of the columns must be in the index). When we run a query, we 
> choose to scan the user table or the index table based on whether all 
> projected columns are in the index.
> This approach has some disadvantages, mainly in the case of wider tables.
> 1) If we simply stored all the columns in the index, it would be just like 
> creating another copy of the entire table, which would take up far too much 
> space and would be very inefficient for wider tables.
> 2) Sometimes a user creates an index on a few columns, observes that the 
> index is not getting used, and then needs to add all the projected columns to 
> the index (either as part of the index key or as included columns). This 
> exposes design decisions to users, even though our goal is to simplify the 
> user's experience by providing SQL on top of a NoSQL DB.
> 3) As of now, if an index table contains all the projected columns in the 
> query, we simply scan the index table only. This can also become a full 
> table scan when the query has no condition, or has a condition on a column 
> that is not part of the index's primary key.
> Sometimes this may perform worse than a normal full scan of the table.
> This JIRA is to support making use of partially covered indexes to avoid full 
> table scans.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
