[ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071252#comment-13071252 ]
Hudson commented on HBASE-3845:
-------------------------------

Integrated in HBase-TRUNK #2051 (See [https://builds.apache.org/job/HBase-TRUNK/2051/])
    HBASE-3845 data loss because lastSeqWritten can miss memstore edits

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java


> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch,
>                      HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch,
>                      HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch,
>                      HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch
>
>
> (I don't have a test case to prove this yet, but I have run it by Dhruba and
> Kannan internally and wanted to put this up for some feedback.)
> In this discussion let us assume that the region has only one column family.
> That way I can use region/memstore interchangeably.
> After a memstore flush it is possible for lastSeqWritten to have a
> log-sequence-id for a region that is not the earliest log-sequence-id for
> that region's memstore.
> HLog.append() does a putIfAbsent into lastSeqWritten. This is to ensure
> that we only keep track of the earliest log-sequence-number that is present
> in the memstore.
> Every time the memstore is flushed we remove the region's entry in
> lastSeqWritten and wait for the next append to populate this entry
> again.
> This is where the problem happens.
> step 1:
> flusher.prepare() snapshots the memstore under
> HRegion.updatesLock.writeLock().
> step 2:
> As soon as the updatesLock.writeLock() is released, new entries will be added
> into the memstore.
> step 3:
> wal.completeCacheFlush() is called. This method removes the region's entry
> from lastSeqWritten.
> step 4:
> The next append will create a new entry for the region in lastSeqWritten,
> but this will be the log-seq-id of the current append. All the edits that
> were added in step 2 are missing from lastSeqWritten, although they are
> still in the memstore.
> ==
> As a temporary measure, instead of removing the region's entry in step 3 I
> will replace it with the log-seq-id of the region-flush-event.

-- 
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
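
The four steps in the issue description can be replayed with a small, single-threaded toy model. This is only a sketch and not the real HLog code: the class LastSeqWrittenSketch, its methods, the String region key and the counter logSeqNum are hypothetical names made up for illustration. It also assumes that the flush event is given its own log sequence id while the snapshot is taken in step 1, which the description implies but does not spell out.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Toy model of the lastSeqWritten bookkeeping. Hypothetical, not the real HLog. */
class LastSeqWrittenSketch {
    // region name -> earliest log sequence id believed to be in the memstore
    final ConcurrentMap<String, Long> lastSeqWritten = new ConcurrentHashMap<>();
    long logSeqNum = 0; // stand-in for the WAL's sequence counter

    /** Models HLog.append(): only the first append after a flush sets the entry. */
    long append(String region) {
        long seq = ++logSeqNum;
        lastSeqWritten.putIfAbsent(region, seq);
        return seq;
    }

    /** Models the behaviour described in step 3: the region's entry is removed. */
    void completeCacheFlushRemoving(String region) {
        lastSeqWritten.remove(region);
    }

    /** Models the proposed temporary measure: keep the flush event's seq id. */
    void completeCacheFlushReplacing(String region, long flushSeqId) {
        lastSeqWritten.put(region, flushSeqId);
    }

    /**
     * Replays steps 1 to 4 in a fixed order.
     * Returns { seq id of the oldest unflushed edit, seq id tracked in lastSeqWritten }.
     */
    static long[] replay(boolean useTemporaryFix) {
        LastSeqWrittenSketch wal = new LastSeqWrittenSketch();
        String region = "region-1";

        wal.append(region);                    // an edit that the flush will persist

        // step 1: snapshot the memstore; assume the flush event takes a seq id here
        long flushSeqId = ++wal.logSeqNum;

        // step 2: the write lock is released and a new edit reaches the memstore
        long oldestUnflushed = wal.append(region);

        // step 3: the flush completes
        if (useTemporaryFix) {
            wal.completeCacheFlushReplacing(region, flushSeqId);
        } else {
            wal.completeCacheFlushRemoving(region);
        }

        // step 4: the next append arrives
        wal.append(region);

        return new long[] { oldestUnflushed, wal.lastSeqWritten.get(region) };
    }

    public static void main(String[] args) {
        for (boolean fix : new boolean[] { false, true }) {
            long[] r = replay(fix);
            System.out.println("fix=" + fix + " oldestUnflushed=" + r[0]
                + " tracked=" + r[1] + " editMissed=" + (r[1] > r[0]));
        }
    }
}
```

With the entry removed in step 3, the tracked id ends up larger than the id of the step 2 edit, so that edit is invisible to anything that relies on lastSeqWritten. With the flush event's id kept instead, the tracked id stays below the step 2 edit and the edit remains covered.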