[
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133002#comment-13133002
]
[email protected] commented on HBASE-4532:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/
-----------------------------------------------------------
(Updated 2011-10-21 19:58:05.693922)
Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin,
Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry
Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.
Summary (updated)
-------
The previous jira, HBASE-4469, avoids the top row seek operation when the
row-col bloom filter is enabled.
This jira tries to avoid the top row seek in all cases by creating a dedicated
bloom filter for delete family markers only.
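To illustrate the idea, here is a minimal sketch of a bloom filter keyed on
rows that carry a delete family marker. It is not the code in this patch: the
class DeleteFamilyBloomSketch, the nested RowBloom class, its two hash
functions, and canSkipTopRowSeek are all hypothetical names made up for this
example. It only shows why a negative answer from the filter makes the top row
seek unnecessary.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Toy sketch, not HBase code: every name here is hypothetical. */
class DeleteFamilyBloomSketch {

    /** A tiny bloom filter over row keys that carry a delete family marker. */
    static final class RowBloom {
        private final BitSet bits;
        private final int size;

        RowBloom(int size) {
            this.size = size;
            this.bits = new BitSet(size);
        }

        // Two cheap hash functions derived from the row bytes.
        private int h1(byte[] row) {
            int h = 17;
            for (byte b : row) h = 31 * h + b;
            return Math.floorMod(h, size);
        }

        private int h2(byte[] row) {
            int h = 7;
            for (byte b : row) h = 131 * h + (b ^ 0x5f);
            return Math.floorMod(h, size);
        }

        /** Called on the write path when a delete family marker is stored. */
        void add(byte[] row) {
            bits.set(h1(row));
            bits.set(h2(row));
        }

        /** false means "definitely absent"; true means "possibly present". */
        boolean mightContain(byte[] row) {
            return bits.get(h1(row)) && bits.get(h2(row));
        }
    }

    /**
     * If the filter says no delete family marker exists for the row, the seek
     * to the top of the row cannot find one and may be skipped. This ignores
     * the empty-column case described later in this summary.
     */
    static boolean canSkipTopRowSeek(RowBloom deleteFamilyBloom, byte[] row) {
        return !deleteFamilyBloom.mightContain(row);
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        RowBloom deleteFamilyBloom = new RowBloom(1024);
        deleteFamilyBloom.add(bytes("row2")); // row2 has a delete family marker
        System.out.println(canSkipTopRowSeek(deleteFamilyBloom, bytes("row2"))); // false
        System.out.println(canSkipTopRowSeek(deleteFamilyBloom, bytes("row1"))); // true
    }
}
```

Because the filter does not depend on the ROW or ROWCOL setting of the regular
bloom filter, the same check is available for every bloom configuration.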
The only subtle case is when we are interested in the top row with an empty
column.
For example, suppose we are interested in row1/cf1:/1/put.
We seek to the top row, row1/cf1:/MAX_TS/MAXIMUM, and the delete family
bloom filter says there is NO delete family.
The scanner then avoids the top row seek and returns a fake kv, which is the
last kv for this row and column (createLastOnRowCol).
The real kv we are interested in sorts before that fake kv, so we have already
missed it.
The solution is to disable this optimization when we are trying to GET/SCAN a
row with an empty column.
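The ordering problem can be reproduced with a simplified model of the key sort
order. This is a sketch under stated assumptions, not HBase code: the class
EmptyColumnSeekSketch and its Key, lastOnRowCol, skippedBy and
optimizationAllowed members are hypothetical, a single column family is
assumed, and rows and qualifiers are compared as strings rather than bytes.
Keys sort by row, then qualifier, then newest timestamp first, then highest
type code first.

```java
/** Simplified model of kv ordering; not HBase code, all names hypothetical. */
class EmptyColumnSeekSketch {
    // Illustrative type codes: MAXIMUM sorts first, MINIMUM sorts last.
    static final int TYPE_MINIMUM = 0;
    static final int TYPE_PUT = 4;
    static final int TYPE_MAXIMUM = 255;

    static final class Key {
        final String row;
        final String qualifier;
        final long timestamp;
        final int type;

        Key(String row, String qualifier, long timestamp, int type) {
            this.row = row;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.type = type;
        }
    }

    /** Row and qualifier ascending, timestamp descending, type descending. */
    static int compare(Key a, Key b) {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) return c;
        c = Long.compare(b.timestamp, a.timestamp); // newer timestamps first
        if (c != 0) return c;
        return Integer.compare(b.type, a.type);     // higher type codes first
    }

    /** Fake key that sorts after every real key of the given row and column. */
    static Key lastOnRowCol(String row, String qualifier) {
        return new Key(row, qualifier, Long.MIN_VALUE, TYPE_MINIMUM);
    }

    /** True if a scanner positioned at fakeKey has already passed target. */
    static boolean skippedBy(Key fakeKey, Key target) {
        return compare(target, fakeKey) < 0;
    }

    /** The guard described above: no shortcut when the column is empty. */
    static boolean optimizationAllowed(String qualifier) {
        return !qualifier.isEmpty();
    }

    public static void main(String[] args) {
        // The top row seek key uses the empty column, so the fake key does too.
        Key fake = lastOnRowCol("row1", "");
        Key emptyColumnPut = new Key("row1", "", 1L, TYPE_PUT);
        Key namedColumnPut = new Key("row1", "q1", 1L, TYPE_PUT);
        System.out.println(skippedBy(fake, emptyColumnPut)); // true: kv is missed
        System.out.println(skippedBy(fake, namedColumnPut)); // false: kv still ahead
        System.out.println(optimizationAllowed(""));         // false
    }
}
```

In this model only the kv with the empty column falls behind the fake key,
which is why the optimization is switched off for that case alone.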
Evaluation from TestSeekOptimization:
Previously:
bloom   compr  seeks without opt.  seeks with opt.  savings
NONE    NONE   2506                1714 (68.40%)    31.60%
ROW     NONE   2506                1714 (68.40%)    31.60%
ROWCOL  NONE   2506                1458 (58.18%)    41.82%
NONE    GZ     2506                1714 (68.40%)    31.60%
ROW     GZ     2506                1714 (68.40%)    31.60%
ROWCOL  GZ     2506                1458 (58.18%)    41.82%
So HBASE-4469 gives about 10 percentage points more seek savings, but ONLY
when the ROWCOL bloom filter is enabled.
================================================
After this change:
bloom   compr  seeks without opt.  seeks with opt.  savings
NONE    NONE   2506                1458 (58.18%)    41.82%
ROW     NONE   2506                1458 (58.18%)    41.82%
ROWCOL  NONE   2506                1458 (58.18%)    41.82%
NONE    GZ     2506                1458 (58.18%)    41.82%
ROW     GZ     2506                1458 (58.18%)    41.82%
ROWCOL  GZ     2506                1458 (58.18%)    41.82%
So this change gives about 10 percentage points more seek savings for ALL
bloom filter settings, including NONE.
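The percentages above follow directly from the raw seek counts. The short
sketch below recomputes them; the class SeekSavingsSketch and its methods are
hypothetical helpers written for this note, not part of TestSeekOptimization.

```java
import java.util.Locale;

/** Recomputes the reported savings from the raw seek counts (illustrative). */
class SeekSavingsSketch {

    /** Savings in percent: the share of seeks the optimization removed. */
    static double savingsPercent(int seeksWithout, int seeksWith) {
        return 100.0 * (seeksWithout - seeksWith) / seeksWithout;
    }

    static String format(double percent) {
        return String.format(Locale.ROOT, "%.2f%%", percent);
    }

    public static void main(String[] args) {
        int without = 2506;   // total seeks without optimization
        int before = 1714;    // with optimization, no ROWCOL bloom, before this change
        int after = 1458;     // with optimization, after this change

        double savedBefore = savingsPercent(without, before);
        double savedAfter = savingsPercent(without, after);

        System.out.println(format(savedBefore));              // 31.60%
        System.out.println(format(savedAfter));               // 41.82%
        System.out.println(format(savedAfter - savedBefore)); // 10.22%
    }
}
```

The gain is therefore 256 fewer seeks out of 2506, or about 10.22 percentage
points.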
This addresses bug HBASE-4532.
https://issues.apache.org/jira/browse/HBASE-4532
Diffs
-----
src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb
src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74
src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518
src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
6cf7cce
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9
src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
92070b3
src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e
src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
ebb360c
src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812
src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df
src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
b8bcc65
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
48e9163
src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8
Diff: https://reviews.apache.org/r/2393/diff
Testing
-------
All unit tests passed.
Thanks,
Liyin
> Avoid top row seek by dedicated bloom filter for delete family bloom filter
> ---------------------------------------------------------------------------
>
> Key: HBASE-4532
> URL: https://issues.apache.org/jira/browse/HBASE-4532
> Project: HBase
> Issue Type: Improvement
> Reporter: Liyin Tang
> Assignee: Liyin Tang
> Attachments: D27.1.patch, D27.1.patch
>