[ https://issues.apache.org/jira/browse/HBASE-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116569#comment-13116569 ]
jirapos...@reviews.apache.org commented on HBASE-2794:
------------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2084/
-----------------------------------------------------------

Review request for hbase.


Summary
-------

Previously we used row-column Bloom filters only for scans that requested a single column. We have seen production queries that request up to 200 columns, and with, say, ~6 store files per store (region / column family combination) this could result in 1200 block read operations in the worst case (200 columns x 6 store files). With this diff we avoid seeks on store files that we know don't contain the row/column of interest when using an ExplicitColumnTracker. Performance should remain the same for column range queries.


This addresses bug HBASE-2794.
    https://issues.apache.org/jira/browse/HBASE-2794


Diffs
-----

  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 08d3ba4
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java ac2348e
  src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 4aa72de
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 68cdac5
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java fd9e7ef
  src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java 9d9895c
  src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java 6cdada7
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 7cbdb98
  src/main/java/org/apache/hadoop/hbase/regionserver/AbstractKeyValueScanner.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/KeyValue.java 585c4a8
  src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java f5173c4
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java a3d778e
  src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java 32f88fb
  src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java a5d13f7
  src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiColumnScanner.java baee696
  src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java PRE-CREATION

Diff: https://reviews.apache.org/r/2084/diff


Testing
-------

Existing unit tests. A new unit test (TestScanWithBloomError). Load testing using HBaseTest.


Thanks,

Mikhail


> ROWCOL bloom filter not used if multiple columns within same family are requested in a Get
> ------------------------------------------------------------------------------------------
>
>                 Key: HBASE-2794
>                 URL: https://issues.apache.org/jira/browse/HBASE-2794
>             Project: HBase
>          Issue Type: Improvement
>          Components: performance
>            Reporter: Kannan Muthukkaruppan
>
> Noticed the following snippet in StoreFile.java:Scanner:shouldSeek():
> {code}
>         switch(bloomFilterType) {
>           case ROW:
>             key = row;
>             break;
>           case ROWCOL:
>             if (columns.size() == 1) {
>               byte[] col = columns.first();
>               key = Bytes.add(row, col);
>               break;
>             }
>             //$FALL-THROUGH$
>           default:
>             return true;
>         }
> {code}
> If columns.size > 1, then we currently don't take advantage of the bloom filter. We should optimize this to check bloom for each of columns and if none of the columns are present in the bloom avoid opening the file.
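
The suggestion in the issue description (check the Bloom filter for each requested column and skip the file only if none of them might be present) can be sketched roughly as follows. This is a minimal standalone illustration, not the code from the review request: the class name RowColBloomSketch, the helper rowColKey, and the Predicate standing in for a store file's Bloom filter lookup are all hypothetical, and the toy "filter" in main is an exact set of keys rather than a real Bloom filter.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Illustrative sketch only; all names here are hypothetical, not HBase API. */
class RowColBloomSketch {

    /** Concatenates row and column, in the spirit of Bytes.add(row, col) in the quoted snippet. */
    static byte[] rowColKey(byte[] row, byte[] col) {
        byte[] key = Arrays.copyOf(row, row.length + col.length);
        System.arraycopy(col, 0, key, row.length, col.length);
        return key;
    }

    /**
     * Returns false only when the filter rules out every requested row/column key.
     * mightContain is an assumed stand-in for a store file's ROWCOL Bloom filter lookup.
     */
    static boolean shouldSeek(byte[] row, List<byte[]> columns, Predicate<byte[]> mightContain) {
        if (columns.isEmpty()) {
            return true; // no explicit columns: a ROWCOL filter cannot rule the file out
        }
        for (byte[] col : columns) {
            if (mightContain.test(rowColKey(row, col))) {
                return true; // possibly present (or a false positive): the file must be read
            }
        }
        return false; // every requested column is definitely absent from this file
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Toy "filter": an exact set of row+column keys, so it has no false positives.
        Set<String> keys = new HashSet<>();
        keys.add("row1colA");
        Predicate<byte[]> filter = k -> keys.contains(new String(k, StandardCharsets.UTF_8));

        byte[] row = bytes("row1");
        // One of the requested columns may be present, so the file must be read.
        System.out.println(shouldSeek(row, Arrays.asList(bytes("colA"), bytes("colB")), filter));
        // None of the requested columns can be present, so the file can be skipped.
        System.out.println(shouldSeek(row, Arrays.asList(bytes("colB"), bytes("colC")), filter));
    }
}
```

Because a Bloom filter can report false positives but never false negatives, returning true on the first possible hit is safe; the file is skipped only when every row/column key is reported absent.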