[ https://issues.apache.org/jira/browse/HBASE-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949704#comment-13949704 ]
Hudson commented on HBASE-10829:
--------------------------------

SUCCESS: Integrated in HBase-0.98 #253 (See [https://builds.apache.org/job/HBase-0.98/253/])
HBASE-10829 Flush is skipped after log replay if the last recovered edits file is skipped (enis: rev 1581954)
* /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java

> Flush is skipped after log replay if the last recovered edits file is skipped
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-10829
>                 URL: https://issues.apache.org/jira/browse/HBASE-10829
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>            Priority: Critical
>             Fix For: 0.98.1, 0.99.0, 0.96.3
>
>         Attachments: hbase-10829_v1.patch, hbase-10829_v2.patch,
>                      hbase-10829_v3.patch
>
>
> We caught this in an extended test run where IntegrationTestBigLinkedList
> failed with some missing keys.
> The problem is that HRegion.replayRecoveredEdits() would return -1 if all the
> edits in the log file are skipped, which is true for example if the log file
> only contains a single compaction record (HBASE-2231) or the edits cannot be
> applied for some reason (column family deleted, etc).
> The caller, HRegion.replayRecoveredEditsIfAny(), only looks at the last
> returned seqId to decide whether a flush is necessary before opening the
> region and discarding the replayed recovered edits files.
> Therefore, if the last recovered edits file is skipped but some edits from
> earlier recovered edits files were applied, the mandatory flush before opening
> the region is skipped. If the region server then dies before a flush happens,
> those edits are lost.
> This is important to fix, though the sequence of events is super rare for a
> production cluster.
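
For readers who want to see the failure mode in isolation, here is a minimal
sketch of the decision logic described in the report. It is not the HBase
code and not the committed patch. The class name, method names, and the
sample sequence ids are all hypothetical; the only thing taken from the
report is that a fully skipped file yields -1 and that the flush decision
was based on the last returned value only.

```java
import java.util.Arrays;
import java.util.List;

public class ReplayFlushSketch {
    // Hypothetical stand-in for the "all edits skipped" result from the report.
    static final long NOTHING_APPLIED = -1L;

    // Models the reported behaviour: each file's result overwrites the
    // previous one, so only the last recovered edits file decides the flush.
    static boolean needsFlushLastOnly(List<Long> perFileSeqIds, long minSeqId) {
        long seqId = minSeqId;
        for (long result : perFileSeqIds) {
            seqId = result;
        }
        return seqId > minSeqId;
    }

    // One possible remedy (an assumption, not necessarily the actual fix):
    // keep the highest seqId seen across all files.
    static boolean needsFlushRunningMax(List<Long> perFileSeqIds, long minSeqId) {
        long seqId = minSeqId;
        for (long result : perFileSeqIds) {
            seqId = Math.max(seqId, result);
        }
        return seqId > minSeqId;
    }

    public static void main(String[] args) {
        // Two files applied edits, the last one was skipped entirely.
        List<Long> results = Arrays.asList(120L, 135L, NOTHING_APPLIED);
        System.out.println("last-only decides flush:   " + needsFlushLastOnly(results, 100L));
        System.out.println("running-max decides flush: " + needsFlushRunningMax(results, 100L));
    }
}
```

With the sample input, the last-only variant decides that no flush is needed
even though edits from the first two files were applied, which is the
situation the report describes. The running-max variant still requests the
flush.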