[ https://issues.apache.org/jira/browse/HBASE-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13611913#comment-13611913 ]
Liang Xie commented on HBASE-4443:
----------------------------------

Could we close this JIRA now? In HBASE-7845, the timestamp of the index key is replaced with LATEST_TIMESTAMP when getShortMidpointKey() is invoked, so we can now avoid seeking to the previous block. :)

> optimize/avoid seeking to "previous" block when key you are interested in is
> the first one of a block
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4443
>                 URL: https://issues.apache.org/jira/browse/HBASE-4443
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Adela Maznikar
>
> This issue primarily affects cases where you are storing large blobs, i.e.
> when blocks contain a small number of keys, so the chance that the key you
> are looking for is the first key of a block is higher.
>
> Say you are looking for "row/col", and "row/col/ts=5" is the latest version
> of the key in the HFile and is at the beginning of block X.
>
> The search for the key is done by looking for "row/col/ts=Long.MAX_VALUE",
> but this lands us in block X-1 (because ts=Long.MAX_VALUE sorts ahead of
> ts=5), only to find that there is no matching "row/col" in block X-1. We then
> advance to block X to return the value.
>
> It seems we should be able to optimize this somehow.
>
> Some possibilities:
>
> 1) Suppose we track that the file contains no deletes. If the CF setting also
> has MAX_VERSIONS=1, we know for sure that block X-1 does not contain any
> relevant data, and can position the seek directly at block X. [This would
> also require the memstore flusher to remove extra versions when
> MAX_VERSIONS=1 and not allow the file to contain duplicate entries for the
> same ROW/COL.] Tracking deletes would also avoid, in many cases, the seek to
> the top of the row to look for DeleteFamily.
>
> 2) Have a dense index (1 entry per KV in the index; this might be OK for the
> large-object case, since the index-to-data ratio will still be low).
>
> 3) Have the index contain the last KV of each block in addition to the first
> KV. This doubles the size of the index, though.
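
The extra seek described above, and two ways around it, can be sketched in a few lines of Python. This is a simplified model and not HBase code: keys are (row, col, ts) tuples, each block is a sorted list of keys, and the names sort_key, first_key_index, latest_ts_index, locate_block and locate_with_last_keys are made up for illustration. latest_ts_index only mimics the effect mentioned in the comment (index key timestamp replaced with the latest timestamp); it makes no claim about how getShortMidpointKey() is actually implemented.

```python
from bisect import bisect_right

LATEST_TS = 2**63 - 1  # stands in for Long.MAX_VALUE / LATEST_TIMESTAMP


def sort_key(row, col, ts):
    """Order by row, then column, then newest timestamp first."""
    return (row, col, -ts)


def first_key_index(blocks):
    """Sparse index holding the real first key of every block."""
    return [sort_key(*block[0]) for block in blocks]


def latest_ts_index(blocks):
    """Index whose entries use LATEST_TS when that is still a valid separator.

    Assumption: swapping the timestamp is only safe when the previous block
    does not end with the same row/col, so that case is checked explicitly.
    """
    index = []
    for i, block in enumerate(blocks):
        row, col, ts = block[0]
        if i > 0 and blocks[i - 1][-1][:2] != (row, col):
            ts = LATEST_TS
        index.append(sort_key(row, col, ts))
    return index


def locate_block(index, row, col):
    """Return the last block whose index key is <= row/col/ts=LATEST_TS."""
    probe = sort_key(row, col, LATEST_TS)
    return max(bisect_right(index, probe) - 1, 0)


def locate_with_last_keys(blocks, row, col):
    """Possibility 3: also consult the last key of the candidate block."""
    probe = sort_key(row, col, LATEST_TS)
    candidate = locate_block(first_key_index(blocks), row, col)
    last_key = sort_key(*blocks[candidate][-1])
    if probe > last_key and candidate + 1 < len(blocks):
        candidate += 1  # nothing relevant in this block, skip ahead
    return candidate


blocks = [
    [("row0", "col", 7)],  # block X-1
    [("row1", "col", 5)],  # block X, starts with the key we want
    [("row2", "col", 4)],  # block X+1
]

print(locate_block(first_key_index(blocks), "row1", "col"))  # 0, i.e. block X-1
print(locate_block(latest_ts_index(blocks), "row1", "col"))  # 1, i.e. block X
print(locate_with_last_keys(blocks, "row1", "col"))          # 1, i.e. block X
```

With the plain first-key index the probe row1/col/ts=LATEST_TS sorts ahead of row1/col/ts=5, so the lookup returns block X-1. Once the index entry for block X carries the latest timestamp, or the last key of block X-1 is available for comparison, the lookup goes straight to block X.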