[ https://issues.apache.org/jira/browse/HBASE-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kannan Muthukkaruppan updated HBASE-4443:
-----------------------------------------

    Assignee: Adela Maznikar
    
> optimize/avoid seeking to "previous" block when key you are interested in is 
> the first one of a block
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-4443
>                 URL: https://issues.apache.org/jira/browse/HBASE-4443
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Adela Maznikar
>
> This issue primarily affects cases where you are storing large blobs, i.e. 
> when blocks contain a small number of keys, so the chances that the key you 
> are looking for is the first key of a block are higher.
> Say, you are looking for "row/col", and "row/col/ts=5" is the latest version 
> of the key in the HFile and is at the beginning of block X.
> The search for the key is done by looking for "row/col/ts=Long.MAX_VALUE", but 
> this lands us in block X-1 (timestamps sort in descending order, so 
> ts=Long.MAX_VALUE sorts ahead of ts=5), only to find that there is no 
> matching "row/col" in block X-1; we then advance to block X to return the value.
> Seems like we should be able to optimize this somehow.
> Some possibilities:
> 1) Suppose we track whether the file contains any deletes. If it contains none 
> and the CF setting has MAX_VERSIONS=1, we know for sure that block X-1 does 
> not contain any relevant data, and can position the seek directly at block X. 
> [This will also require the memstore flusher to remove extra versions when 
> MAX_VERSIONS=1, so that the file never contains duplicate entries for the 
> same ROW/COL.]
> Tracking deletes will also, in many cases, avoid the seek to the top of the 
> row to look for a DeleteFamily marker.
> 2) Have a dense index (one entry per KV). This might be OK for the large-object 
> case, since the index-to-data ratio will still be low.
> 3) Have the index contain the last KV of each block in addition to the first 
> KV. This doubles the size of the index, though.
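
The wasted seek described in the issue can be reproduced with a small toy model. This is a sketch under simplifying assumptions, not HBase code: `BlockIndexSeek`, `Key`, `compare` and `seekBlock` are hypothetical names; keys are reduced to (row, column, timestamp) and ordered by row and column ascending and timestamp descending (so ts=Long.MAX_VALUE sorts ahead of ts=5, as the description says); and the block index is sparse, holding only the first key of each block.

```java
import java.util.List;

public class BlockIndexSeek {
    // Hypothetical, simplified key; HBase's real KeyValue carries more fields.
    record Key(String row, String col, long ts) {}

    // Assumed ordering: row ascending, column ascending, timestamp descending,
    // so newer versions of a row/col sort first.
    static int compare(Key a, Key b) {
        int c = a.row().compareTo(b.row());
        if (c != 0) return c;
        c = a.col().compareTo(b.col());
        if (c != 0) return c;
        return Long.compare(b.ts(), a.ts()); // reversed: larger ts sorts first
    }

    // Sparse index lookup: return the last block whose first key is <= search,
    // or -1 if the search key sorts before every block.
    static int seekBlock(List<Key> firstKeys, Key search) {
        int lo = 0, hi = firstKeys.size() - 1, result = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(firstKeys.get(mid), search) <= 0) {
                result = mid; // candidate; keep looking for a later one
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Block 0 plays the role of block X-1; block 1 (block X) starts with row/col/ts=5.
        List<Key> firstKeys = List.of(new Key("a", "x", 9), new Key("row", "col", 5));
        // Seeking for the newest version of row/col lands in block 0, not block 1.
        System.out.println(seekBlock(firstKeys, new Key("row", "col", Long.MAX_VALUE))); // prints 0
    }
}
```

The lookup picks block 0 even though the only cell for row/col is the first key of block 1, so the reader scans block 0 for nothing and then advances: the extra block read the issue is about.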
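
Option 1 can be sketched on the same toy model. The sketch assumes a hypothetical per-file "no deletes" flag and that, with MAX_VERSIONS=1, the flusher guarantees at most one entry per row/col. Under those assumptions, if the block after the one chosen by the sparse index starts with the requested row/col, that first key is the only cell for it, so the seek can go straight to that block. `SingleVersionSeek`, `Key` and `seekBlockSingleVersion` are illustrative names, not HBase APIs.

```java
import java.util.List;

public class SingleVersionSeek {
    record Key(String row, String col, long ts) {}

    // Assumed ordering (matching the description): row asc, column asc, timestamp desc.
    static int compare(Key a, Key b) {
        int c = a.row().compareTo(b.row());
        if (c != 0) return c;
        c = a.col().compareTo(b.col());
        if (c != 0) return c;
        return Long.compare(b.ts(), a.ts());
    }

    // Plain sparse-index lookup: last block whose first key is <= search, or -1.
    static int seekBlock(List<Key> firstKeys, Key search) {
        int lo = 0, hi = firstKeys.size() - 1, result = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(firstKeys.get(mid), search) <= 0) { result = mid; lo = mid + 1; }
            else { hi = mid - 1; }
        }
        return result;
    }

    // Hypothetical optimized lookup. Only valid when the file is known to contain
    // no deletes and at most one version per row/col (MAX_VERSIONS=1).
    static int seekBlockSingleVersion(List<Key> firstKeys, Key search,
                                      boolean fileHasNoDeletes, int maxVersions) {
        int block = seekBlock(firstKeys, search);
        int next = block + 1;
        if (fileHasNoDeletes && maxVersions == 1 && next < firstKeys.size()) {
            Key first = firstKeys.get(next);
            // The next block starts with the requested row/col, so that key is the
            // only cell for it and the chosen block cannot hold anything relevant.
            if (first.row().equals(search.row()) && first.col().equals(search.col())) {
                return next;
            }
        }
        return block;
    }
}
```

With first keys [a/x/9, row/col/5] and a search for row/col/ts=Long.MAX_VALUE, the plain lookup returns block 0 while `seekBlockSingleVersion(..., true, 1)` returns block 1. If either precondition is dropped, the sketch falls back to the plain result, because another version or a delete marker for row/col could then sit at the end of the earlier block.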
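
Option 2 turns the block lookup into an exact search: with one index entry per KV, the first entry at or after row/col/ts=Long.MAX_VALUE is the newest version of row/col (if any), and its value names the block to read. In this sketch an in-memory `TreeMap` (using `ceilingEntry`) stands in for the dense index; `DenseIndex`, `Key`, `add` and `blockFor` are hypothetical names, and the key ordering is the same assumed one as above.

```java
import java.util.Map;
import java.util.TreeMap;

public class DenseIndex {
    record Key(String row, String col, long ts) {}

    // Assumed ordering: row asc, column asc, timestamp desc.
    static int compare(Key a, Key b) {
        int c = a.row().compareTo(b.row());
        if (c != 0) return c;
        c = a.col().compareTo(b.col());
        if (c != 0) return c;
        return Long.compare(b.ts(), a.ts());
    }

    // One entry per KV: key -> number of the block that stores it.
    private final TreeMap<Key, Integer> index = new TreeMap<>(DenseIndex::compare);

    void add(Key key, int block) {
        index.put(key, block);
    }

    // Returns the block holding the newest version of row/col, or -1 if the file
    // has no cell for it. Because every KV is indexed, the answer is the block
    // that actually contains the key, never the one before it.
    int blockFor(String row, String col) {
        // ts=Long.MAX_VALUE sorts ahead of every real version of row/col, so the
        // ceiling entry is the newest version if one exists.
        Map.Entry<Key, Integer> e = index.ceilingEntry(new Key(row, col, Long.MAX_VALUE));
        if (e == null || !e.getKey().row().equals(row) || !e.getKey().col().equals(col)) {
            return -1;
        }
        return e.getValue();
    }
}
```

The trade-off is the one the issue names: the index grows with the number of KVs rather than the number of blocks, which stays affordable only when each KV is large.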
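
Option 3 lets the reader rule out the candidate block from the index alone: if the last key of the block chosen by the first-key search still sorts before the search key, nothing in that block can match, and the first key at or after the search key is the first key of the next block. The sketch below assumes a hypothetical index entry holding both keys; `FirstLastIndex`, `Entry` and `seekBlock` are illustrative names, and the key ordering is the same assumed one as above.

```java
import java.util.List;

public class FirstLastIndex {
    record Key(String row, String col, long ts) {}
    // Hypothetical index entry that stores both the first and the last key of a block.
    record Entry(Key first, Key last) {}

    // Assumed ordering: row asc, column asc, timestamp desc.
    static int compare(Key a, Key b) {
        int c = a.row().compareTo(b.row());
        if (c != 0) return c;
        c = a.col().compareTo(b.col());
        if (c != 0) return c;
        return Long.compare(b.ts(), a.ts());
    }

    // Returns the block holding the first key >= search. The result equals
    // index.size() when every key in the file sorts before search.
    static int seekBlock(List<Entry> index, Key search) {
        int lo = 0, hi = index.size() - 1, block = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(index.get(mid).first(), search) <= 0) { block = mid; lo = mid + 1; }
            else { hi = mid - 1; }
        }
        if (block < 0) {
            return 0; // search sorts before the first block, so the answer starts there
        }
        // All keys of this block sort before search: skip it without reading it.
        if (compare(index.get(block).last(), search) < 0) {
            block++;
        }
        return block;
    }
}
```

For the example in the issue, the block X-1 entry's last key sorts before row/col/ts=Long.MAX_VALUE, so the seek goes directly to block X, at the cost of storing a second key per index entry.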

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira