James Taylor created PHOENIX-29:
-----------------------------------

             Summary: Add custom filter to more efficiently navigate KeyValues in row
                 Key: PHOENIX-29
                 URL: https://issues.apache.org/jira/browse/PHOENIX-29
             Project: Phoenix
          Issue Type: Bug
            Reporter: James Taylor


Currently HBase is 50% faster at selecting the first KV in a row than at 
selecting any other column. The reason is that when you project a column into a 
Scan, HBase uses its ExplicitColumnTracker, which does a reseek to the column. 
The only case where this is not necessary is when the column is the first one.

In most cases (unless you have thousands of versions), it'd be more efficient 
to just do a NEXT instead of a reseek (especially if your KV is the next one). 
We can provide our own custom filter through which we pass two lists:
1) all KVs referenced in the select expressions. These are the only ones that 
need to be returned to the client, which is another advantage we'd get from 
writing this custom filter.
2) all KVs referenced in the WHERE clause.
The filter could sort the KVs using the standard KeyValue.COMPARATOR and merge 
them against the incoming KVs, using NEXT instead of a reseek. We could 
potentially use a reseek if the number of columns in the table is beyond a 
certain threshold.
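
The merge described above can be sketched roughly as follows. This is only an 
illustration under simplifying assumptions, not Phoenix or HBase code: columns 
are modelled as plain strings in natural order rather than KeyValues ordered 
by KeyValue.COMPARATOR, each column is assumed to have a single version, and 
the names ColumnMergeSketch, scanRow, Result, matched, toClient and skipped 
are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: walks one row's columns in sorted order and merges them
// against the sorted set of wanted columns, stepping forward one cell at a time
// (the NEXT case) instead of seeking to each wanted column.
final class ColumnMergeSketch {

    /** Outcome of scanning one row. */
    public static final class Result {
        /** Every wanted column found in the row (select or WHERE). */
        public final List<String> matched = new ArrayList<>();
        /** Only the columns referenced in the select expressions. */
        public final List<String> toClient = new ArrayList<>();
        /** Number of unwanted cells stepped over with a plain NEXT. */
        public int skipped = 0;
    }

    /**
     * rowColumns must already be sorted, as cells within a row are.
     * selectCols and whereCols may be in any order; they are sorted here.
     */
    public static Result scanRow(List<String> rowColumns,
                                 List<String> selectCols,
                                 List<String> whereCols) {
        Set<String> select = new HashSet<>(selectCols);
        TreeSet<String> union = new TreeSet<>(selectCols);
        union.addAll(whereCols);
        List<String> wanted = new ArrayList<>(union);

        Result result = new Result();
        int w = 0; // index of the next wanted column still to be found
        for (String col : rowColumns) {
            // Drop wanted columns that sort before the current cell:
            // they are absent from this row.
            while (w < wanted.size() && wanted.get(w).compareTo(col) < 0) {
                w++;
            }
            if (w == wanted.size()) {
                break; // nothing left to find; the rest of the row can be skipped
            }
            if (wanted.get(w).equals(col)) {
                result.matched.add(col);
                if (select.contains(col)) {
                    result.toClient.add(col);
                }
                w++;
            } else {
                result.skipped++; // unwanted cell: just move to the next one
            }
        }
        return result;
    }
}
```

In this sketch, a row with columns a, b, c, d, e, select columns d and b, and 
WHERE column c would match b, c and d, pass only b and d to the client, and 
step over a single cell (a) before stopping at e. A threshold-based fallback 
to reseek, as suggested above, could be keyed off the skipped count, but that 
is left out here.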



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
