[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-4532:
------------------------------

    Attachment: hbase-4532-89-fb.patch

> Avoid top row seek by dedicated bloom filter for delete family bloom filter
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-4532
>                 URL: https://issues.apache.org/jira/browse/HBASE-4532
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>         Attachments: D27.1.patch, D27.1.patch, hbase-4532-89-fb.patch
>
>
> The previous jira, HBASE-4469, is to avoid the top row seek operation if
> row-col bloom filter is enabled.
> This jira tries to avoid top row seek for all the cases by creating a
> dedicated bloom filter only for delete family.
> The only subtle use case is when we are interested in the top row with empty
> column.
> For example,
> we are interested in row1/cf1:/1/put.
> So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family
> bloom filter will say there is NO delete family.
> Then it will avoid the top row seek and return a fake kv, which is the last
> kv for this row (createLastOnRowCol).
> In this way, we have already missed the real kv we are interested in.
> The solution for the above problem is to disable this optimization if we are
> trying to GET/SCAN a row with empty column.
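
The decision described above can be sketched as a small Python model. This is only an illustration, not HBase code: `StoreFileModel`, `may_have_delete_family`, and `can_skip_top_row_seek` are hypothetical names, and a plain set stands in for the delete family bloom filter (a real bloom filter can also return false positives, which only makes the check more conservative).

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class StoreFileModel:
    """Toy stand-in for a store file; hypothetical, not an HBase class."""

    # Rows known to contain at least one delete family marker.
    rows_with_delete_family: Set[str] = field(default_factory=set)

    def may_have_delete_family(self, row: str) -> bool:
        # A real bloom filter answers "maybe" or "definitely not".
        return row in self.rows_with_delete_family


def can_skip_top_row_seek(store_file: StoreFileModel, row: str, qualifier: str) -> bool:
    """Return True when the seek to the top of the row can be skipped."""
    # Empty column: the kv we want lives at the top of the row, so
    # returning a fake "last on row/col" kv would skip past it.
    if qualifier == "":
        return False
    # Otherwise skip only when the filter rules out a delete family marker.
    return not store_file.may_have_delete_family(row)
```

In this model, a lookup with a non-empty column on a row without delete family markers skips the seek, while any lookup with an empty column always performs it.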
> Evaluation from TestSeekOptimization:
> Previously:
> For bloom=NONE, compr=NONE total seeks without optimization: 2506, with
> optimization: 1714 (68.40%), savings: 31.60%
> For bloom=ROW, compr=NONE total seeks without optimization: 2506, with
> optimization: 1714 (68.40%), savings: 31.60%
> For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with
> optimization: 1458 (58.18%), savings: 41.82%
> For bloom=NONE, compr=GZ total seeks without optimization: 2506, with
> optimization: 1714 (68.40%), savings: 31.60%
> For bloom=ROW, compr=GZ total seeks without optimization: 2506, with
> optimization: 1714 (68.40%), savings: 31.60%
> For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with
> optimization: 1458 (58.18%), savings: 41.82%
> So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is
> enabled.[HBASE-4469]
> ================================================
> After this change:
> For bloom=NONE, compr=NONE total seeks without optimization: 2506, with
> optimization: 1458 (58.18%), savings: 41.82%
> For bloom=ROW, compr=NONE total seeks without optimization: 2506, with
> optimization: 1458 (58.18%), savings: 41.82%
> For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with
> optimization: 1458 (58.18%), savings: 41.82%
> For bloom=NONE, compr=GZ total seeks without optimization: 2506, with
> optimization: 1458 (58.18%), savings: 41.82%
> For bloom=ROW, compr=GZ total seeks without optimization: 2506, with
> optimization: 1458 (58.18%), savings: 41.82%
> For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with
> optimization: 1458 (58.18%), savings: 41.82%
> So we can get about 10% more seek savings for ALL kinds of bloom filter.
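
The percentages in the TestSeekOptimization evaluation follow directly from the reported seek counts. A minimal Python sketch of that arithmetic, where `seek_stats` is a hypothetical helper and not part of the test itself:

```python
from typing import Tuple


def seek_stats(without_opt: int, with_opt: int) -> Tuple[float, float]:
    """Return (seeks remaining in %, savings in %), rounded to 2 decimals."""
    remaining = 100.0 * with_opt / without_opt
    return round(remaining, 2), round(100.0 - remaining, 2)


# Seek counts taken from the evaluation output.
before = seek_stats(2506, 1714)  # NONE and ROW blooms before the change
after = seek_stats(2506, 1458)   # all bloom types after the change

# Extra savings, in percentage points, gained by the change.
extra_savings = round(after[1] - before[1], 2)
```

With the counts above, the savings go from 31.60% to 41.82%, a gain of roughly 10 percentage points, which matches the "about 10% more seek savings" in the summary.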