James Taylor created PHOENIX-29:
-----------------------------------
Summary: Add custom filter to more efficiently navigate KeyValues in row
Key: PHOENIX-29
URL: https://issues.apache.org/jira/browse/PHOENIX-29
Project: Phoenix
Issue Type: Bug
Reporter: James Taylor
Currently HBase is 50% faster at selecting the first KV in a row than at
selecting any other column. The reason is that when you project a column into a
Scan, HBase uses its ExplicitColumnTracker, which does a reseek to the column.
The only case where the reseek is unnecessary is when the column is the first one.
In most cases (unless you have thousands of versions), it would be more efficient
to simply do a NEXT instead of a reseek, especially when the desired KV is the
next one.
We can provide our own custom filter to which we pass two lists:
1) all KVs referenced in the select expressions. These are the only ones that
need to be returned to the client, which is another advantage of writing this
custom filter.
2) all KVs referenced in the WHERE clause.
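To make the two lists concrete, here is a minimal sketch of how they might be combined into the single sorted column list such a filter would work from. It is not Phoenix or HBase code: the class `FilterColumns`, its `Column` type and the `build` method are hypothetical, columns are modeled as "family:qualifier" strings, and String ordering stands in for KeyValue.COMPARATOR. A column that appears in both lists is flagged for return to the client; a column that appears only in the WHERE clause is read but not sent back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical helper (not an HBase/Phoenix API): builds the sorted column
// list a custom filter could be given. Columns are "family:qualifier" strings;
// real code would compare byte[] keys with KeyValue.COMPARATOR.
final class FilterColumns {

    /** A column the filter must examine, and whether the client needs it back. */
    static final class Column {
        final String name;
        final boolean returnToClient;

        Column(String name, boolean returnToClient) {
            this.name = name;
            this.returnToClient = returnToClient;
        }
    }

    /** Merges SELECT-list and WHERE-clause columns into one sorted, de-duplicated list. */
    static List<Column> build(List<String> selectCols, List<String> whereCols) {
        // TreeMap keeps keys sorted, standing in for KeyValue.COMPARATOR ordering.
        TreeMap<String, Boolean> merged = new TreeMap<>();
        for (String c : whereCols) {
            merged.putIfAbsent(c, false); // needed only to evaluate the WHERE clause
        }
        for (String c : selectCols) {
            merged.put(c, true);          // projected, so it must be sent to the client
        }
        List<Column> out = new ArrayList<>();
        merged.forEach((name, ret) -> out.add(new Column(name, ret)));
        return out;
    }
}
```

For example, `build(List.of("cf:b", "cf:a"), List.of("cf:c", "cf:a"))` yields `cf:a` and `cf:b` flagged for return, followed by `cf:c`, which is only evaluated.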
The filter could sort the KVs using the standard KeyValue.COMPARATOR and merge
them against the incoming KVs, using NEXT instead of a reseek. We could
potentially still use a reseek if the number of columns in the table is beyond
a certain threshold.
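The merge described above can be sketched as a small simulation. This is an illustration under stated assumptions, not the HBase Filter API: a row is modeled as its sorted column names with one version per column, the `Action` values and the `RESEEK_THRESHOLD` of 100 are hypothetical, and String ordering stands in for KeyValue.COMPARATOR. In a real HBase filter these decisions would presumably map onto `Filter.ReturnCode` values, for example SKIP to step to the next KV and SEEK_NEXT_USING_HINT to reseek.

```java
import java.util.ArrayList;
import java.util.List;

// Simulation of merging the incoming KVs against the sorted wanted columns.
// Not the HBase Filter API: a row is its sorted column names (one version each).
final class MergeNavigator {

    /** Hypothetical decision for the current cell. */
    enum Action { INCLUDE, NEXT, RESEEK, DONE_WITH_ROW }

    /** Hypothetical cutoff: tables wider than this use reseeks instead of NEXT. */
    static final int RESEEK_THRESHOLD = 100;

    /** Returns the sequence of actions taken while scanning one row. */
    static List<Action> navigate(List<String> rowCols, List<String> wanted, int tableColumnCount) {
        boolean useReseek = tableColumnCount > RESEEK_THRESHOLD;
        List<Action> actions = new ArrayList<>();
        int w = 0; // index into the sorted wanted columns
        int i = 0; // index into the row's sorted cells
        while (i < rowCols.size()) {
            String cell = rowCols.get(i);
            // Wanted columns sorting before this cell are absent from the row.
            while (w < wanted.size() && wanted.get(w).compareTo(cell) < 0) {
                w++;
            }
            if (w == wanted.size()) {
                actions.add(Action.DONE_WITH_ROW); // nothing left to find in this row
                break;
            }
            String target = wanted.get(w);
            if (cell.equals(target)) {
                actions.add(Action.INCLUDE);
                w++;
                i++;
            } else if (useReseek) {
                // A reseek jumps straight to the first cell >= target.
                actions.add(Action.RESEEK);
                while (i < rowCols.size() && rowCols.get(i).compareTo(target) < 0) {
                    i++;
                }
            } else {
                // Cheap path: step over the unwanted cell with a NEXT.
                actions.add(Action.NEXT);
                i++;
            }
        }
        return actions;
    }
}
```

With `rowCols = [a, b, c, d, e]` and `wanted = [b, d]`, a narrow table produces NEXT, INCLUDE, NEXT, INCLUDE, DONE_WITH_ROW, while a table above the threshold replaces each NEXT with a single RESEEK. The trade-off is the one the issue describes: NEXT is cheap when the wanted KV is close, and a reseek only pays off when many cells would otherwise be stepped over.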
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)