[jira] [Commented] (HBASE-4443) optimize/avoid seeking to "previous" block when key you are interested in is the first one of a block

Liang Xie (JIRA) Sat, 23 Mar 2013 17:59:17 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13611913#comment-13611913
 ]


Liang Xie commented on HBASE-4443:
----------------------------------

Could we close this JIRA now? In HBASE-7845, the timestamp of the index key was 
replaced with LATEST_TIMESTAMP when getShortMidpointKey() is invoked, so we can 
now avoid seeking to the previous block :)
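
To make the effect concrete, here is a minimal sketch of the scenario from the 
issue description. It is a simplified model, not HBase's actual code: the Key 
class, compare(), and seekBlock() are hypothetical names, the index lookup is a 
plain linear scan, and LATEST_TIMESTAMP is assumed to equal Long.MAX_VALUE. It 
only illustrates why an index key carrying the latest timestamp lets the lookup 
land on block X directly.

```java
public class IndexKeySketch {
    // Stand-in for HBase's "latest" timestamp; assumed here to be Long.MAX_VALUE.
    static final long LATEST_TIMESTAMP = Long.MAX_VALUE;

    /** Simplified key: row, column, timestamp (not HBase's KeyValue class). */
    static final class Key {
        final String row;
        final String col;
        final long ts;

        Key(String row, String col, long ts) {
            this.row = row;
            this.col = col;
            this.ts = ts;
        }
    }

    /** Row and column ascending, timestamp descending (newer versions sort first). */
    static int compare(Key a, Key b) {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.col.compareTo(b.col);
        if (c != 0) return c;
        return Long.compare(b.ts, a.ts);
    }

    /** Returns the last block whose index key is <= the search key (block 0 if none). */
    static int seekBlock(Key[] indexKeys, Key search) {
        int block = 0;
        for (int i = 0; i < indexKeys.length; i++) {
            if (compare(indexKeys[i], search) <= 0) block = i;
        }
        return block;
    }

    public static void main(String[] args) {
        Key search = new Key("row", "col", Long.MAX_VALUE);
        // Block 0 plays the role of X-1, block 1 the role of X.
        Key[] realFirstKeys = { new Key("a", "c", 7), new Key("row", "col", 5) };
        Key[] latestTsKeys = { new Key("a", "c", 7), new Key("row", "col", LATEST_TIMESTAMP) };
        System.out.println(seekBlock(realFirstKeys, search)); // prints 0 (block X-1)
        System.out.println(seekBlock(latestTsKeys, search));  // prints 1 (block X)
    }
}
```

With the real first key (ts=5) in the index, the search key sorts ahead of it 
and the lookup falls back to block X-1. With the timestamp replaced, the index 
key compares equal to the search key and block X is chosen.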
                
> optimize/avoid seeking to "previous" block when key you are interested in is 
> the first one of a block
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-4443
>                 URL: https://issues.apache.org/jira/browse/HBASE-4443
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Adela Maznikar
>
> This issue primarily affects cases where you are storing large blobs, i.e. 
> when blocks contain a small number of keys, so the chance that the key you 
> are looking for is the first key of a block is higher.
> Say, you are looking for "row/col", and "row/col/ts=5" is the latest version 
> of the key in the HFile and is at the beginning of block X.
> The search for the key is done by looking for "row/col/TS=Long.MAX_VALUE", but 
> this will land us in block X-1 (because ts=Long.MAX_VALUE sorts ahead of ts=5), 
> only to find that there is no matching "row/col" in block X-1, and then we'll 
> advance to block X to return the value.
> Seems like we should be able to optimize this somehow.
> Some possibilities:
> 1) Suppose we track that the file contains no deletes, and the CF setting 
> has MAX_VERSIONS=1. Then we know for sure that block X - 1 does not contain 
> any relevant data, and can directly position the seek to block X. [This will 
> also require the memstore flusher to remove extra versions if MAX_VERSIONS=1 
> and not allow the file to contain duplicate entries for the same ROW/COL.]  
> Tracking deletes will also, in many cases, avoid the seek to the top of the 
> row to look for DeleteFamily.
> 2) Have a dense index (1 entry per KV in the index; this might be ok for the 
> large-object case since the index-to-data ratio will still be low).
> 3) Have the index also contain the last KV of each block, in addition to the 
> first KV. This doubles the size of the index though.
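
As a rough illustration of possibility 3, the sketch below keeps both the first 
and last key of each block in the index and skips block X-1 when the search key 
sorts after that block's last key. Everything here is hypothetical and 
simplified: the Key and Entry classes, the combined "row/col" string, and the 
linear scan are assumptions for illustration, not HBase's index format.

```java
public class LastKeyIndexSketch {
    /** Simplified key: "row/col" as one string, plus a timestamp. */
    static final class Key {
        final String rowCol;
        final long ts;

        Key(String rowCol, long ts) {
            this.rowCol = rowCol;
            this.ts = ts;
        }
    }

    /** Hypothetical index entry holding both the first and last key of a block. */
    static final class Entry {
        final Key first;
        final Key last;

        Entry(Key first, Key last) {
            this.first = first;
            this.last = last;
        }
    }

    /** "row/col" ascending, then timestamp descending (newest first). */
    static int compare(Key a, Key b) {
        int c = a.rowCol.compareTo(b.rowCol);
        return c != 0 ? c : Long.compare(b.ts, a.ts);
    }

    static int seekBlock(Entry[] index, Key search) {
        // Usual lookup: last block whose first key is <= the search key.
        int block = 0;
        for (int i = 0; i < index.length; i++) {
            if (compare(index[i].first, search) <= 0) block = i;
        }
        // Extra check enabled by the last key: if the search key sorts after
        // everything in that block, go straight to the next block.
        if (block + 1 < index.length && compare(search, index[block].last) > 0) {
            block++;
        }
        return block;
    }

    public static void main(String[] args) {
        Entry[] index = {
            new Entry(new Key("a/c", 7), new Key("q/c", 3)),     // block X-1
            new Entry(new Key("row/col", 5), new Key("z/c", 1))  // block X
        };
        System.out.println(seekBlock(index, new Key("row/col", Long.MAX_VALUE))); // prints 1
        System.out.println(seekBlock(index, new Key("b/c", Long.MAX_VALUE)));     // prints 0
    }
}
```

The cost noted in the description still applies: each index entry now carries 
two keys instead of one.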

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
