[
https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072186#comment-13072186
]
ramkrishna.s.vasudevan commented on HBASE-3845:
-----------------------------------------------
The test case TestResettingCounters is failing because none of the increment
operations it performs are written to the WAL.
When we do a cache flush we call wal.startCacheFlush(), which checks whether
'Long seq = this.lastSeqWritten.remove(encodedRegionName)'
is null.
If it is null, we throw an error and halt the system.
Everywhere this test case calls region.increment, for example
'for (int i=0;i<5;i++) region.increment(odd, null, false);'
it passes false for the write-to-WAL argument, so the region never gets an
entry in lastSeqWritten and the check fails. We can correct the test case by
passing true instead of false; I have verified that this works.
However, I think we should not halt the system in this case. We can change
this behaviour.
Please correct me if my analysis is wrong.
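
To illustrate the analysis above, here is a minimal sketch of the check, not
the actual HLog code. The class name, the method names
startCacheFlushStrict/startCacheFlushLenient, and the -1 sentinel are
assumptions made for illustration; only the lastSeqWritten.remove() lookup
comes from the comment. It shows that a region whose edits all skipped the WAL
has no entry to remove, and contrasts halting with tolerating that case.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class FlushWithoutWalSketch {
    // Stand-in for the map of region name -> earliest unflushed sequence id.
    static final ConcurrentMap<String, Long> lastSeqWritten = new ConcurrentHashMap<>();

    // Hypothetical stand-in for a WAL append: only the first id per region is kept.
    static void append(String region, long seqId) {
        lastSeqWritten.putIfAbsent(region, seqId);
    }

    // Mirrors the behaviour described above: a missing entry is treated as fatal.
    static long startCacheFlushStrict(String region) {
        Long seq = lastSeqWritten.remove(region);
        if (seq == null) {
            throw new IllegalStateException("no lastSeqWritten entry for " + region);
        }
        return seq;
    }

    // Hypothetical lenient variant: a region with no WAL edits returns -1 (assumed sentinel).
    static long startCacheFlushLenient(String region) {
        Long seq = lastSeqWritten.remove(region);
        return seq == null ? -1L : seq;
    }

    public static void main(String[] args) {
        // An increment with writeToWAL=false never calls append(), so the map stays empty.
        System.out.println("lenient: " + startCacheFlushLenient("region-a"));
        try {
            startCacheFlushStrict("region-a");
        } catch (IllegalStateException e) {
            System.out.println("strict: " + e.getMessage());
        }
    }
}
```

Whether a lenient path is safe in the real code depends on how the flush uses
the returned sequence id, which this sketch does not model.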
> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
> Key: HBASE-3845
> URL: https://issues.apache.org/jira/browse/HBASE-3845
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.3
> Reporter: Prakash Khemani
> Assignee: ramkrishna.s.vasudevan
> Priority: Critical
> Fix For: 0.90.5
>
> Attachments:
> 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch,
> HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch,
> HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch,
> HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch
>
>
> (I don't have a test case to prove this yet but I have run it by Dhruba and
> Kannan internally and wanted to put this up for some feedback.)
> In this discussion let us assume that the region has only one column family.
> That way I can use region/memstore interchangeably.
> After a memstore flush it is possible for lastSeqWritten to have a
> log-sequence-id for a region that is not the earliest log-sequence-id for
> that region's memstore.
> HLog.append() does a putIfAbsent into lastSeqWritten. This is to ensure
> that we only keep track of the earliest log-sequence-number that is present
> in the memstore.
> Every time the memstore is flushed we remove the region's entry in
> lastSeqWritten and wait for the next append to populate this entry
> again. This is where the problem happens.
> step 1:
> flusher.prepare() snapshots the memstore under
> HRegion.updatesLock.writeLock().
> step 2:
> as soon as the updatesLock.writeLock() is released new entries will be added
> into the memstore.
> step 3:
> wal.completeCacheFlush() is called. This method removes the region's entry
> from lastSeqWritten.
> step 4:
> the next append will create a new entry for the region in lastSeqWritten.
> But this will be the log seq id of the current append. All the edits that
> were added in step 2 have earlier seq ids and are no longer accounted for.
> ==
> As a temporary measure, instead of removing the region's entry in step 3, I
> will replace it with the log-seq-id of the region-flush-event.
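
The four steps and the temporary measure quoted above can be replayed
sequentially in a small sketch. This is an illustration under stated
assumptions, not HBase code: the class, its methods, and the list-based
memstore are hypothetical, and the flush event's sequence id is assumed to be
taken at snapshot time, before the racing edit in step 2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class LastSeqWrittenRaceSketch {
    final ConcurrentMap<String, Long> lastSeqWritten = new ConcurrentHashMap<>();
    final List<Long> memstore = new ArrayList<>(); // sequence ids of unflushed edits
    long nextSeqId = 1;

    // Stand-in for an append: record the edit, keep only the earliest id per region.
    long append(String region) {
        long seq = nextSeqId++;
        lastSeqWritten.putIfAbsent(region, seq);
        memstore.add(seq);
        return seq;
    }

    // Step 1: snapshot the memstore; later edits go into a fresh memstore.
    List<Long> snapshot() {
        List<Long> snap = new ArrayList<>(memstore);
        memstore.clear();
        return snap;
    }

    // Replays steps 1-4. Returns {oldest unflushed seq id, seq id tracked in lastSeqWritten}.
    static long[] run(boolean replaceWithFlushSeqId) {
        LastSeqWrittenRaceSketch s = new LastSeqWrittenRaceSketch();
        String region = "r1";
        s.append(region);                // seq 1
        s.append(region);                // seq 2
        s.snapshot();                    // step 1: edits 1 and 2 are being flushed
        long flushSeqId = s.nextSeqId++; // seq 3: assumed id of the flush event
        s.append(region);                // step 2: seq 4 arrives; putIfAbsent is a no-op
        if (replaceWithFlushSeqId) {
            s.lastSeqWritten.put(region, flushSeqId); // temporary measure
        } else {
            s.lastSeqWritten.remove(region);          // step 3 as described
        }
        s.append(region);                // step 4: seq 5
        long oldestUnflushed = s.memstore.get(0);
        long tracked = s.lastSeqWritten.get(region);
        return new long[] {oldestUnflushed, tracked};
    }

    public static void main(String[] args) {
        long[] removed = run(false);
        long[] replaced = run(true);
        System.out.println("remove:  oldest unflushed=" + removed[0] + " tracked=" + removed[1]);
        System.out.println("replace: oldest unflushed=" + replaced[0] + " tracked=" + replaced[1]);
    }
}
```

With removal, the tracked id (5) is later than the oldest unflushed edit (4),
so edit 4 is missed. With replacement, the tracked id (3) is earlier than every
unflushed edit, which is conservative but safe.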