[
https://issues.apache.org/jira/browse/PHOENIX-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902344#comment-13902344
]
Lars Hofhansl commented on PHOENIX-29:
--------------------------------------
ExplicitColumnTracker will be more efficient with many versions per column or
very wide rows; selecting 1 column out of a row with 10000 columns needs only
two seeks in the worst case.
With the ExplicitColumnTracker the cost is proportional to the number of
selected columns (the first column in a row is "free" since no extra seek is
needed to get to it); with the WildcardColumnTracker the cost is proportional
to the number of existing columns/versions.
The cost per column, however, is much higher in ExplicitColumnTracker compared
to WildcardColumnTracker (a reseek vs. a next).
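To make the trade-off concrete, here is a rough back-of-the-envelope sketch in
plain Java. It does not use any HBase classes; the class name, the method names
and the two unit costs (RESEEK_COST, NEXT_COST) are made up for illustration,
and real numbers would have to come from profiling.

```java
public final class TrackerCostSketch {
    // Hypothetical relative unit costs; real values must come from profiling.
    static final double RESEEK_COST = 20.0;
    static final double NEXT_COST = 1.0;

    /** Explicit tracking: roughly one reseek per selected column. */
    static double explicitCost(int selectedColumns) {
        return selectedColumns * RESEEK_COST;
    }

    /** Wildcard tracking: one next per existing cell (columns times versions). */
    static double wildcardCost(int columnsInRow, int versionsPerColumn) {
        return (double) columnsInRow * versionsPerColumn * NEXT_COST;
    }

    /** True when the assumed explicit cost is below the assumed wildcard cost. */
    static boolean explicitIsCheaper(int selected, int columnsInRow, int versions) {
        return explicitCost(selected) < wildcardCost(columnsInRow, versions);
    }

    public static void main(String[] args) {
        // 1 column out of a 10000-column row: reseeking wins under these costs.
        System.out.println(explicitIsCheaper(1, 10000, 1));
        // 5 columns out of a 10-column row: plain next calls win.
        System.out.println(explicitIsCheaper(5, 10, 1));
    }
}
```

Under these assumed costs the break-even point is simply where the number of
existing cells times the next cost equals the number of selected columns times
the reseek cost.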
Should profile exactly where HBase is spending the time (believe it or not,
this was much, much worse before I fixed HBASE-9915, see the numbers there).
Also see HBASE-4433 as to why some of the re-seeking stuff was added.
> Add custom filter to more efficiently navigate KeyValues in row
> ---------------------------------------------------------------
>
> Key: PHOENIX-29
> URL: https://issues.apache.org/jira/browse/PHOENIX-29
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
> Attachments: PHOENIX-29.patch
>
>
> Currently HBase is 50% faster at selecting the first KV in a row than in
> selecting any other column. The reason is that when you project a column into
> a Scan, HBase uses its ExplicitColumnTracker, which does a reseek to the
> column. The only case where this is not necessary is when the column is the
> first one.
> In most cases (unless you have thousands of versions), it'd be more efficient
> to just do a NEXT instead of a reseek (especially if your KV is the next
> one). We can provide our own custom filter through which we pass two lists:
> 1) all KVs referenced in the select expressions. These are the only ones that
> need to be returned to the client, which is another advantage we'd get by
> writing this custom filter.
> 2) all KVs referenced in the WHERE clause.
> The filter could sort the KVs using the standard KeyValue.COMPARATOR and
> merge between them and the incoming KVs, using NEXT instead of a reseek. We
> could potentially use a reseek if the number of columns in the table is
> beyond a certain threshold.
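
The merge described in the issue could look roughly like the following sketch.
It is plain Java, not the attached patch and not the HBase Filter API: column
qualifiers are modelled as Strings in natural order rather than KeyValues
ordered by KeyValue.COMPARATOR, the two lists (select and WHERE columns) are
assumed to be already combined into one sorted list, and the class and method
names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public final class MergeProjectionSketch {
    /**
     * Returns the qualifiers from sortedIncoming that also appear in
     * sortedWanted. Both lists must be sorted ascending in the same order.
     */
    static List<String> project(List<String> sortedWanted, List<String> sortedIncoming) {
        List<String> included = new ArrayList<>();
        int w = 0;
        // Each loop iteration models one NEXT over the row; no reseek is used.
        for (String qualifier : sortedIncoming) {
            // Skip wanted columns that sort before the current cell:
            // they are absent from this row.
            while (w < sortedWanted.size()
                    && sortedWanted.get(w).compareTo(qualifier) < 0) {
                w++;
            }
            if (w == sortedWanted.size()) {
                break; // nothing left to match in this row
            }
            if (sortedWanted.get(w).equals(qualifier)) {
                // w is not advanced here, so repeated cells for the same
                // column (multiple versions) are all included.
                included.add(qualifier);
            }
        }
        return included;
    }

    public static void main(String[] args) {
        System.out.println(project(List.of("b", "d"), List.of("a", "b", "c", "d", "e")));
    }
}
```

A threshold-based variant could fall back to a reseek when the gap to the next
wanted column is large; that part is left out of the sketch.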