[ https://issues.apache.org/jira/browse/HBASE-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987768#action_12987768 ]
ryan rawson commented on HBASE-3481:
------------------------------------

That is a good point; I opened HBASE-3486 to do that. We already have durability unit tests; they should be fixed so that they fail without this patch.

> max seq id in flushed file can be larger than its correct value causing data loss during recovery
> -------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3481
>                 URL: https://issues.apache.org/jira/browse/HBASE-3481
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: ryan rawson
>            Priority: Blocker
>             Fix For: 0.90.1, 0.92.0
>
>         Attachments: HBASE-3481.txt
>
>
> [While doing some cluster kill tests, I noticed some missing data after log
> recovery. Upon investigating further, and pretty printing contents of HFiles
> and recovered logs, this is my analysis of the situation/bug. Please confirm
> the theory and pitch in with suggestions.]
> When memstores are flushed, the max sequence id recorded in the HFile should
> be the max sequence id of all KVs in the memstore. However, we seem to simply
> obtain the next sequence id from the HLog (HRegion calls wal.startCacheFlush())
> and stamp the HFile's MAX_SEQ_ID with it.
> From HRegion.java:
> {code}
> sequenceId = (wal == null)? myseqid: wal.startCacheFlush();
> {code}
> where startCacheFlush() is:
> {code}
> public long startCacheFlush() {
>   this.cacheFlushLock.lock();
>   return obtainSeqNum();
> }
> {code}
> and obtainSeqNum() is simply:
> {code}
> private long obtainSeqNum() {
>   return this.logSeqNum.incrementAndGet();
> }
> {code}
> So let's say a memstore contains edits with sequence numbers 1..10.
> Meanwhile, say more Puts come along and are going through this flow (in
> pseudo-code):
> {code}
> 1.   HLog.append();
> 1.1    obtainSeqNum()
> 1.2    writeToWAL()
> 2.   updateMemStore()
> {code}
> So it is possible that the sequence number has already been incremented to,
> say, 15 if there are 5 more outstanding puts.
> Say the writeToWAL() is still in progress for these puts. In this case, none
> of these edits (11..15) would have been written to the memstore yet.
> At this point, if a cache flush of the memstore happens, we'll record its
> MAX_SEQ_ID as 16 in the store file instead of 10 (because that's what
> obtainSeqNum() would return as the next sequence number to use, right?).
> Assume that the edits 11..15 eventually complete, so the HLogs do contain the
> data for edits 11..15.
> Now, if the region server were to crash at this point and we ran log
> recovery, the splits would all go through correctly, and a correct
> recovered.edits file would be generated with the edits 11..15.
> Next, when the region is opened, the HRegion notes that one of the store files
> says MAX_SEQ_ID is 16. So, when it replays the recovered.edits file, it
> skips replaying edits 11..15. In other words, data loss.
> ----

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
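The failure mode described in the quoted report can be replayed deterministically in a small, single-threaded Java model. This is a hypothetical sketch, not HBase code: `SeqIdRaceDemo`, `Log`, and `survivingInFlightEdits` are invented for illustration, and only the `incrementAndGet()` pattern is taken from the quoted `obtainSeqNum()`. It compares stamping the flushed file with a freshly obtained sequence number (the reported behavior) against stamping it with the maximum sequence id of the edits actually in the flushed snapshot (the value the report says should be recorded). It makes no claim about what the attached HBASE-3481.txt patch actually changes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class SeqIdRaceDemo {

    /** Hypothetical stand-in for HLog: hands out sequence numbers, records WAL edits. */
    static final class Log {
        private final AtomicLong logSeqNum = new AtomicLong(0);
        final List<Long> walEdits = new ArrayList<>();

        /** Same pattern as the quoted obtainSeqNum(): incrementAndGet(). */
        long obtainSeqNum() {
            return logSeqNum.incrementAndGet();
        }
    }

    /**
     * Runs the report's scenario and returns how many of the 5 in-flight
     * edits (11..15) survive recovery.
     *
     * @param stampWithFreshSeqNum true models the reported bug (stamp the file
     *        with a new sequence number); false stamps it with the max
     *        sequence id of the edits that were actually flushed.
     */
    static int survivingInFlightEdits(boolean stampWithFreshSeqNum) {
        Log log = new Log();
        List<Long> memstore = new ArrayList<>();

        // Edits 1..10: written to the WAL and applied to the memstore.
        for (int i = 0; i < 10; i++) {
            long seq = log.obtainSeqNum();
            log.walEdits.add(seq);
            memstore.add(seq);
        }

        // Edits 11..15: sequence number obtained (step 1.1) and WAL write
        // issued (step 1.2), but updateMemStore() (step 2) has not run yet.
        List<Long> inFlight = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            long seq = log.obtainSeqNum();
            log.walEdits.add(seq);
            inFlight.add(seq);
        }

        // Cache flush: snapshot the memstore and choose the file's MAX_SEQ_ID.
        List<Long> flushed = new ArrayList<>(memstore);
        long maxSeqIdInFile = stampWithFreshSeqNum
                ? log.obtainSeqNum()          // reported behavior: 16
                : Collections.max(flushed);   // max over flushed edits: 10
        memstore.clear();

        // The in-flight puts finish and land in the emptied memstore, then the
        // region server crashes before the next flush and the memstore is lost.
        memstore.addAll(inFlight);
        memstore.clear();

        // Recovery: only WAL edits newer than the file's MAX_SEQ_ID are replayed.
        int survived = 0;
        for (long seq : log.walEdits) {
            boolean replayed = seq > maxSeqIdInFile;
            boolean inStoreFile = flushed.contains(seq);
            if (inFlight.contains(seq) && (replayed || inStoreFile)) {
                survived++;
            }
        }
        return survived;
    }

    public static void main(String[] args) {
        System.out.println("fresh seq num stamp: "
                + survivingInFlightEdits(true) + " of 5 in-flight edits recovered");
        System.out.println("max flushed stamp:   "
                + survivingInFlightEdits(false) + " of 5 in-flight edits recovered");
    }
}
```

Running `main` reports 0 of 5 in-flight edits recovered with the fresh-sequence-number stamp and 5 of 5 with the max-over-flushed stamp. The model also suggests why a durability test only catches this bug if it flushes while some writes sit between `HLog.append()` and `updateMemStore()`.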