[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-4532:
------------------------------

    Description: 
The previous jira, HBASE-4469, is to avoid the top row seek operation if 
row-col bloom filter is enabled. 
This jira tries to avoid top row seek for all the cases by creating a dedicated 
bloom filter only for delete family

The only subtle use case is when we are interested in the top row with empty 
column.

For example, 
we are interested in row1/cf1:/1/put.
So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
bloom filter will say there is NO delete family.
Then it will avoid the top row seek and return a fake kv, which is the last kv 
for this row (createLastOnRowCol).
In this way, we have already missed the real kv we are interested in.


The solution for the above problem is to disable this optimization if we are 
trying to GET/SCAN a row with empty column.


Evaluation from TestSeekOptimization:
Previously:
For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%
For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%

So we can only get 10% more seek savings if the ROWCOL bloom filter is 
enabled.[HBASE-4469]

================================================

After this change:
For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%
For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%

So we can get 10% more seek savings for ALL kinds of bloom filter.

  was:
HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
enabled. 
This jira tries to avoid top row seek for all the cases by creating a dedicated 
bloom filter only for delete family

The only subtle use case is when we are interested in the top row with empty 
column.

For example, 
we are interested in row1/cf1:/1/put.
So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
bloom filter will say there is NO delete family.
Then it will avoid the top row seek and return a fake kv, which is the last kv 
for this row (createLastOnRowCol).
In this way, we have already missed the real kv we are interested in.


The solution for the above problem is to disable this optimization if we are 
trying to GET/SCAN a row with empty column.



    
> Avoid top row seek by dedicated bloom filter for delete family bloom filter
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-4532
>                 URL: https://issues.apache.org/jira/browse/HBASE-4532
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>         Attachments: D27.1.patch, D27.1.patch
>
>
> The previous jira, HBASE-4469, is to avoid the top row seek operation if 
> row-col bloom filter is enabled. 
> This jira tries to avoid top row seek for all the cases by creating a 
> dedicated bloom filter only for delete family
> The only subtle use case is when we are interested in the top row with empty 
> column.
> For example, 
> we are interested in row1/cf1:/1/put.
> So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
> bloom filter will say there is NO delete family.
> Then it will avoid the top row seek and return a fake kv, which is the last 
> kv for this row (createLastOnRowCol).
> In this way, we have already missed the real kv we are interested in.
> The solution for the above problem is to disable this optimization if we are 
> trying to GET/SCAN a row with empty column.
> Evaluation from TestSeekOptimization:
> Previously:
> For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
> optimization: 1714 (68.40%), savings: 31.60%
> For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
> optimization: 1714 (68.40%), savings: 31.60%
> For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
> optimization: 1458 (58.18%), savings: 41.82%
> For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
> optimization: 1714 (68.40%), savings: 31.60%
> For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
> optimization: 1714 (68.40%), savings: 31.60%
> For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
> optimization: 1458 (58.18%), savings: 41.82%
> So we can only get 10% more seek savings if the ROWCOL bloom filter is 
> enabled.[HBASE-4469]
> ================================================
> After this change:
> For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
> optimization: 1458 (58.18%), savings: 41.82%
> For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
> optimization: 1458 (58.18%), savings: 41.82%
> For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
> optimization: 1458 (58.18%), savings: 41.82%
> For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
> optimization: 1458 (58.18%), savings: 41.82%
> For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
> optimization: 1458 (58.18%), savings: 41.82%
> For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
> optimization: 1458 (58.18%), savings: 41.82%
> So we can get 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to