[ https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090773#comment-13090773 ]
jirapos...@reviews.apache.org commented on HBASE-4241: ------------------------------------------------------ ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/ ----------------------------------------------------------- (Updated 2011-08-25 04:49:36.340305) Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. Changes ------- Added linebreak. Added comment that deletes need to be retained. Summary ------- This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway. This covers two scenarios: 1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk. 2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further *expired* versions to disk. This changes the Store flush to use the same mechanism that used for compactions. I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot. This addresses bug HBASE-4241. https://issues.apache.org/jira/browse/HBASE-4241 Diffs (updated) ----- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347 Diff: https://reviews.apache.org/r/1650/diff Testing ------- Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change). I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect). No new test, as this should cause no functional change. Thanks, Lars > Optimize flushing of the Store cache for max versions and (new) min versions > ---------------------------------------------------------------------------- > > Key: HBASE-4241 > URL: https://issues.apache.org/jira/browse/HBASE-4241 > Project: HBase > Issue Type: Improvement > Components: regionserver > Affects Versions: 0.92.0 > Reporter: Lars Hofhansl > Assignee: Lars Hofhansl > Attachments: 4241-v2.txt, 4241.txt > > > As discussed with with Jon, there is room for improvement in how the memstore > is flushed to disk. > Currently only expired KVs are pruned before flushing, but we can also prune > versions if we find at least maxVersions versions in the memstore. > The same holds for the new minversion feature: If we find at least minVersion > versions in the store we can remove all further versions that are expired. > Generally we should use the same mechanism here that is used for Compaction. > I.e. StoreScanner. We only need to add a scanner to Memstore that can scan > along the current snapshot. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira