[ https://issues.apache.org/jira/browse/HBASE-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273450#comment-13273450 ]

Mikhail Bautin commented on HBASE-4443:
---------------------------------------

Liyin and Kannan suggested that we could store the last key of each block in the 
index instead of its first key, thus avoiding doubling the index size. That would 
solve the problem of seeking to the first key in a block using a KV of the form 
(row, column, MAX_TIMESTAMP). However, it would require bumping the HFile format 
version and maintaining separate read paths for the two formats.
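
To make that concrete, here is a minimal sketch (plain Java with a toy key model 
and hypothetical names, not the actual HFile block index code) of how a last-key 
index would answer a seek for (row, column, MAX_TIMESTAMP): the lookup picks the 
first block whose last key sorts at or after the seek key, which is exactly the 
block holding the newest version.

import java.util.Comparator;
import java.util.List;

// Toy model of the "store the last key of each block" idea; all names hypothetical.
public class LastKeyIndexSketch {

    /** Toy stand-in for an HFile key: (row, column, timestamp). */
    record Key(String row, String col, long ts) {}

    /** Rows and columns ascending, timestamps descending (newest first), as in HBase. */
    static final Comparator<Key> CMP =
            Comparator.comparing(Key::row)
                      .thenComparing(Key::col)
                      .thenComparing(Comparator.comparingLong(Key::ts).reversed());

    /** One index entry per block: the block's LAST key instead of its first key. */
    static int findBlock(List<Key> lastKeys, Key seekKey) {
        // The wanted block is the first one whose last key is >= the seek key.
        for (int i = 0; i < lastKeys.size(); i++) {
            if (CMP.compare(lastKeys.get(i), seekKey) >= 0) {
                return i;
            }
        }
        return lastKeys.size() - 1; // seek key is past the end of the file
    }

    public static void main(String[] args) {
        // Block 0 ends with (row1, col1, ts=7); block 1 holds row1/col2 and ends at ts=3.
        List<Key> lastKeys = List.of(new Key("row1", "col1", 7),
                                     new Key("row1", "col2", 3));
        // Seek to the newest version of row1/col2 with a MAX_TIMESTAMP key.
        Key seek = new Key("row1", "col2", Long.MAX_VALUE);
        // With last keys in the index this resolves directly to block 1, no extra seek.
        System.out.println("block = " + findBlock(lastKeys, seek)); // block = 1
    }
}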

In our further discussion we found a way to keep the current file format and 
still solve the problem of seeking to the first key in a block. We can move the 
start key of every block as far to the left as possible on the KV axis. More 
precisely, the start key of the i'th block will now be the previous block's last 
key + epsilon, where the "+ epsilon" operation involves decreasing the timestamp. 
In case such an artificial start key ends up greater than the block's real start 
key (e.g. when the difference between the last key of the previous block and the 
first key of the current block is only in KV type or memstore timestamp) we will 
still use the real first key of the block. This does not require changing the 
meaning of the key we store in the index for each block (it can still logically 
be considered the first key), but when we are searching for (row1, col2, 
MAX_TIMESTAMP), we would not have to go to the previous block ending with (row1, 
col1, t1), because (row1, col2, MAX_TIMESTAMP) > (row1, col1, t1 - 1), which we 
will use as the logical start key of our current block containing a key such as 
(row1, col2, t2). Hopefully this makes some sense. I will attach a figure to 
illustrate this shortly.
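
A rough sketch of the "+ epsilon" computation described above, again using a toy 
key model with hypothetical names rather than the actual HFile writer code: the 
block's logical start key is derived from the previous block's last key by 
decreasing the timestamp, falling back to the real first key when the artificial 
key would sort after it.

import java.util.Comparator;

// Toy model of the "+ epsilon" index key; all names hypothetical.
public class FakeIndexKeySketch {

    record Key(String row, String col, long ts) {}

    /** Rows and columns ascending, timestamps descending, as in HBase's key ordering. */
    static final Comparator<Key> CMP =
            Comparator.comparing(Key::row)
                      .thenComparing(Key::col)
                      .thenComparing(Comparator.comparingLong(Key::ts).reversed());

    /**
     * Index key for a block: the previous block's last key "+ epsilon", i.e. with its
     * timestamp decreased by one, which moves the key a tiny step to the right on the
     * KV axis. If that artificial key would end up greater than the block's real first
     * key (e.g. the two keys differ only in type or memstore timestamp, which this toy
     * model does not represent), fall back to the real first key.
     */
    static Key indexKeyForBlock(Key prevBlockLastKey, Key blockFirstKey) {
        Key artificial = new Key(prevBlockLastKey.row(), prevBlockLastKey.col(),
                                 prevBlockLastKey.ts() - 1);
        return CMP.compare(artificial, blockFirstKey) <= 0 ? artificial : blockFirstKey;
    }

    public static void main(String[] args) {
        Key prevLast  = new Key("row1", "col1", 9);            // previous block ends here
        Key realFirst = new Key("row1", "col2", 2);            // current block really starts here
        Key idxKey    = indexKeyForBlock(prevLast, realFirst); // -> (row1, col1, 8)

        // A seek for the newest version of row1/col2 no longer falls into the previous block:
        Key seek = new Key("row1", "col2", Long.MAX_VALUE);
        System.out.println(CMP.compare(seek, idxKey) > 0);     // true: seek lands in this block
        System.out.println(idxKey);
    }
}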

                
> optimize/avoid seeking to "previous" block when key you are interested in is 
> the first one of a block
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-4443
>                 URL: https://issues.apache.org/jira/browse/HBASE-4443
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kannan Muthukkaruppan
>
> This issue primarily affects cases when you are storing large blobs, i.e. 
> when blocks contain a small number of keys, and the chances of the key you are 
> looking for being the first key of a block are higher.
> Say, you are looking for "row/col", and "row/col/ts=5" is the latest version 
> of the key in the HFile and is at the beginning of block X.
> The search for the key is done by looking for "row/col/TS=Long.MAX_VAL", but 
> this will land us in block X-1 (because ts=Long.MAX_VAL sorts ahead of ts=5), 
> only to find that there is no matching "row/col" in block X-1; we then 
> advance to block X to return the value.
> Seems like we should be able to optimize this somehow.
> Some possibilities:
> 1) Suppose we track that the file contains no deletes; then, if the CF setting 
> has MAX_VERSIONS=1, we know for sure that block X-1 does not contain any 
> relevant data, and can directly position the seek to block X. [This will also 
> require the memstore flusher to remove extra versions if MAX_VERSIONS=1 and 
> not allow the file to contain duplicate entries for the same ROW/COL.]  
> Tracking deletes will also avoid, in many cases, the seek to the top of the 
> row to look for DeleteFamily.
> 2) Have a dense index (one entry per KV in the index; this might be OK for the 
> large-object case since the index vs. data ratio will still be low).
> 3) Have the index contain the last KV of each block in addition to the 
> first KV. This doubles the size of the index, though.
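
For illustration, here is a small sketch of the scenario in the description above 
(toy key model, hypothetical names; not the real HFileBlockIndex code): with an 
index of first keys, the lookup for row/col/TS=Long.MAX_VAL picks block X-1 even 
though the cell lives at the start of block X.

import java.util.Comparator;
import java.util.List;

// Toy model of the baseline behaviour: the index stores each block's FIRST key.
public class FirstKeyIndexSketch {

    record Key(String row, String col, long ts) {}

    /** Rows and columns ascending, timestamps descending, as in HBase's key ordering. */
    static final Comparator<Key> CMP =
            Comparator.comparing(Key::row)
                      .thenComparing(Key::col)
                      .thenComparing(Comparator.comparingLong(Key::ts).reversed());

    /** Pick the last block whose first key is <= the seek key. */
    static int findBlock(List<Key> firstKeys, Key seekKey) {
        int block = 0;
        for (int i = 0; i < firstKeys.size(); i++) {
            if (CMP.compare(firstKeys.get(i), seekKey) <= 0) {
                block = i;
            }
        }
        return block;
    }

    public static void main(String[] args) {
        // Block X-1 starts at (row, colA, 9); block X starts at (row, colB, 5),
        // which is the only version of row/colB in the file.
        List<Key> firstKeys = List.of(new Key("row", "colA", 9),   // block X-1
                                      new Key("row", "colB", 5));  // block X
        Key seek = new Key("row", "colB", Long.MAX_VALUE);
        // ts=Long.MAX_VALUE sorts ahead of ts=5, so the lookup picks block X-1
        // and only afterwards advances to block X to find the actual cell.
        System.out.println("block index = " + findBlock(firstKeys, seek)); // prints 0 (block X-1)
    }
}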
