[ https://issues.apache.org/jira/browse/PHOENIX-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13909954#comment-13909954 ]
James Taylor commented on PHOENIX-29:
-------------------------------------

Good catch, [~anoop.hbase]. Looks good.

> Add custom filter to more efficiently navigate KeyValues in row
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-29
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-29
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>         Attachments: PHOENIX-29.patch, PHOENIX-29_V2.patch, PHOENIX-29_V5.patch, PHOENIX-29_v3.patch, PHOENIX-29_v4.patch
>
>
> Currently HBase is 50% faster at selecting the first KV in a row than at
> selecting any other column. The reason is that when you project a column into
> a Scan, HBase uses its ExplicitColumnTracker, which does a reseek to the
> column. The only case where this is not necessary is when the column is the
> first one.
> In most cases (unless you have thousands of versions), it'd be more efficient
> to just do a NEXT instead of a reseek (especially if your KV is the next
> one). We can provide our own custom filter through which we pass two lists:
> 1) all KVs referenced in the select expressions. These are the only ones that
> need to be returned to the client, which is another advantage we'd get from
> writing this custom filter.
> 2) all KVs referenced in the WHERE clause.
> The filter could sort the KVs using the standard KeyValue.COMPARATOR and
> merge between them and the incoming KVs, using NEXT instead of a reseek. We
> could potentially use a reseek if the number of columns in the table is
> beyond a certain threshold.
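
To illustrate the merge described in the issue, here is a minimal sketch. It is not the attached patch and does not use the HBase Filter API. The class ColumnMergeSketch, its Action enum, and its method names are hypothetical, and column qualifiers are modelled as plain strings ordered by String.compareTo rather than as KeyValues ordered by KeyValue.COMPARATOR. The idea shown is only the merge itself: keep the wanted columns sorted, keep a cursor into them, and answer each incoming column with "include", "move to the next KV", or "nothing more wanted in this row", never asking for a reseek.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for a custom filter; not an HBase class.
final class ColumnMergeSketch {

    // Hypothetical return codes, loosely modelled on the idea of a filter return code.
    enum Action { INCLUDE, NEXT, DONE_WITH_ROW }

    private final String[] wanted; // union of SELECT and WHERE columns, sorted
    private int pos = 0;           // cursor into 'wanted' for the current row

    ColumnMergeSketch(Collection<String> selectCols, Collection<String> whereCols) {
        // TreeSet sorts and de-duplicates; String ordering stands in for
        // KeyValue.COMPARATOR here (an assumption made for brevity).
        TreeSet<String> merged = new TreeSet<>(selectCols);
        merged.addAll(whereCols);
        this.wanted = merged.toArray(new String[0]);
    }

    // Call at the start of every row.
    void resetRow() {
        pos = 0;
    }

    // Incoming qualifiers are assumed to arrive in sorted order within a row.
    Action filter(String qualifier) {
        // Skip wanted columns that sort before the incoming one:
        // they are absent from this row.
        while (pos < wanted.length && wanted[pos].compareTo(qualifier) < 0) {
            pos++;
        }
        if (pos == wanted.length) {
            return Action.DONE_WITH_ROW; // every wanted column is behind us
        }
        if (wanted[pos].equals(qualifier)) {
            return Action.INCLUDE;       // cursor stays put so further versions match too
        }
        return Action.NEXT;              // unwanted column: step forward, no reseek
    }

    public static void main(String[] args) {
        ColumnMergeSketch f =
            new ColumnMergeSketch(Arrays.asList("c", "a"), Arrays.asList("e"));
        List<String> row = Arrays.asList("a", "b", "c", "d", "e", "f");
        f.resetRow();
        for (String q : row) {
            System.out.println(q + " -> " + f.filter(q));
        }
    }
}
```

The sketch treats the SELECT and WHERE lists identically. A fuller version would also record which list each column came from, so that columns needed only by the WHERE clause are not returned to the client, and could fall back to a reseek past a column-count threshold as the issue suggests.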