[ https://issues.apache.org/jira/browse/HBASE-16721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15527972#comment-15527972 ]
Enis Soztutar commented on HBASE-16721:
---------------------------------------

In branch-1 and earlier, FSHLog.findRegionsToForceFlush() is the method that finds the regions to be force-flushed. This means that the region is somehow still in oldestUnflushedStoreSequenceIds, although the region has already been flushed and closed. I suspect the issue is a race between FSHLog.startCacheFlush() and FSHLog.append() over updates to the oldestUnflushedStoreSequenceIdsOfRegion map. FSHLog.append() is called from the ring buffer handler thread, and does this:
{code}
  private void updateOldestUnflushedSequenceIds(byte[] encodedRegionName,
      Set<byte[]> familyNameSet, Long lRegionSequenceId) {
    ConcurrentMap<byte[], Long> oldestUnflushedStoreSequenceIdsOfRegion =
        getOrCreateOldestUnflushedStoreSequenceIdsOfRegion(encodedRegionName);
    for (byte[] familyName : familyNameSet) {
      oldestUnflushedStoreSequenceIdsOfRegion.putIfAbsent(familyName, lRegionSequenceId);
    }
  }
{code}
And FSHLog.startCacheFlush() does this:
{code}
    synchronized (regionSequenceIdLock) {
      ConcurrentMap<byte[], Long> oldestUnflushedStoreSequenceIdsOfRegion =
          oldestUnflushedStoreSequenceIds.get(encodedRegionName);
      if (oldestUnflushedStoreSequenceIdsOfRegion != null) {
        for (byte[] familyName : flushedFamilyNames) {
          Long seqId = oldestUnflushedStoreSequenceIdsOfRegion.remove(familyName);
          if (seqId != null) {
            oldStoreSeqNum.put(familyName, seqId);
          }
        }
        if (!oldStoreSeqNum.isEmpty()) {
          Map<byte[], Long> oldValue = this.lowestFlushingStoreSequenceIds.put(
              encodedRegionName, oldStoreSeqNum);
          assert oldValue == null : "Flushing map not cleaned up for "
              + Bytes.toString(encodedRegionName);
        }
        if (oldestUnflushedStoreSequenceIdsOfRegion.isEmpty()) {
          // Remove it otherwise it will be in oldestUnflushedStoreSequenceIds for ever
          // even if the region is already moved to other server.
          // Do not worry about data racing, we held write lock of region when calling
          // startCacheFlush, so no one can add value to the map we removed.
          oldestUnflushedStoreSequenceIds.remove(encodedRegionName);
        } else {
          oldestUnflushedSequenceId =
              Collections.min(oldestUnflushedStoreSequenceIdsOfRegion.values());
        }
      }
    }
{code}
The actual problem is that the region's sub-map, which keeps the per-store unflushed edits, has its entries removed family by family in:
{code}
Long seqId = oldestUnflushedStoreSequenceIdsOfRegion.remove(familyName);
{code}
but the actual region entry is only removed some time later in:
{code}
if (oldestUnflushedStoreSequenceIdsOfRegion.isEmpty()) {
  oldestUnflushedStoreSequenceIds.remove(encodedRegionName);
{code}
However, this comment is not correct:
{code}
// Do not worry about data racing, we held write lock of region when calling
// startCacheFlush, so no one can add value to the map we removed.
{code}
and it is likely that an append() request is still executing and will re-add the removed family to the map. Although startCacheFlush() happens under HRegion.updatesLock().writeLock() and doMiniBatchMutation() acquires the read lock, we may give away the read lock before the ring buffer handler actually does the FSHLog.append(): we release the region lock in step 6, but only wait for the seqId assignment in step 8.
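To make the interleaving concrete, here is a minimal, self-contained sketch. This is not the actual FSHLog code; the class, variable, and helper names are invented for illustration, and the appender path is assumed (per the snippets above) not to hold regionSequenceIdLock. It replays from a single thread the schedule described above: the flusher empties the region's sub-map, a late append re-adds the family before the emptiness check, and the region entry therefore survives with a seqId that no subsequent flush will clear.
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class SeqIdTrackingRaceSketch {
  // Outer map: encoded region name -> (family -> oldest unflushed seqId),
  // standing in for oldestUnflushedStoreSequenceIds.
  static final ConcurrentMap<String, ConcurrentMap<String, Long>> oldestUnflushed =
      new ConcurrentHashMap<>();

  // Roughly what getOrCreateOldestUnflushedStoreSequenceIdsOfRegion does:
  // no shared lock, just putIfAbsent on the outer concurrent map.
  static ConcurrentMap<String, Long> getOrCreate(String region) {
    ConcurrentMap<String, Long> families = oldestUnflushed.get(region);
    if (families != null) {
      return families;
    }
    ConcurrentMap<String, Long> created = new ConcurrentSkipListMap<>();
    ConcurrentMap<String, Long> existing = oldestUnflushed.putIfAbsent(region, created);
    return existing == null ? created : existing;
  }

  public static void main(String[] args) {
    String region = "d4cf39dc40ea79f5da4d0cf66d03cb1f";
    getOrCreate(region).put("cf", 100L);       // an earlier append, now being flushed

    // Flusher, first half of startCacheFlush(): remove the flushed family.
    ConcurrentMap<String, Long> families = oldestUnflushed.get(region);
    families.remove("cf");                     // sub-map is now empty

    // Appender, FSHLog.append() for an edit whose region read lock was already
    // released (step 6) but whose seqId sync is still pending (step 8).
    getOrCreate(region).putIfAbsent("cf", 101L); // re-adds the family

    // Flusher, second half: the emptiness check now fails, so the region entry
    // is kept alive with a seqId that no later flush will ever clear.
    if (families.isEmpty()) {
      oldestUnflushed.remove(region);
    }

    // The region is still present with {cf=101} even though everything was flushed.
    System.out.println("still tracked as unflushed: " + oldestUnflushed);
  }
}
{code}
That leftover entry is exactly what findRegionsToForceFlush() keeps reporting afterwards, which matches the repeated "Too many wals ... forcing flush of 1 regions(s)" messages in the description below.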
> Concurrency issue in WAL unflushed seqId tracking
> -------------------------------------------------
>
>                 Key: HBASE-16721
>                 URL: https://issues.apache.org/jira/browse/HBASE-16721
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 1.2.4
>
>
> I'm inspecting an interesting case where, in a production cluster, some regionservers end up accumulating hundreds of WAL files, even with force flushes going on due to max logs. This happened multiple times on the cluster, but not on other clusters. The cluster has the periodic memstore flusher disabled; however, this still does not explain why the force flush of regions due to the max-logs limit is not working. I think the periodic memstore flusher just masks the underlying problem, which is why we do not see this in other clusters.
> The problem starts like this:
> {code}
> 2016-09-21 17:49:18,272 INFO [regionserver//10.2.0.55:16020.logRoller]
> wal.FSHLog: Too many wals: logs=33, maxlogs=32; forcing flush of 1
> regions(s): d4cf39dc40ea79f5da4d0cf66d03cb1f
> 2016-09-21 17:49:18,273 WARN [regionserver//10.2.0.55:16020.logRoller]
> regionserver.LogRoller: Failed to schedule flush of
> d4cf39dc40ea79f5da4d0cf66d03cb1f, region=null, requester=null
> {code}
> then, it continues until the RS is restarted:
> {code}
> 2016-09-23 17:43:49,356 INFO [regionserver//10.2.0.55:16020.logRoller]
> wal.FSHLog: Too many wals: logs=721, maxlogs=32; forcing flush of 1
> regions(s): d4cf39dc40ea79f5da4d0cf66d03cb1f
> 2016-09-23 17:43:49,357 WARN [regionserver//10.2.0.55:16020.logRoller]
> regionserver.LogRoller: Failed to schedule flush of
> d4cf39dc40ea79f5da4d0cf66d03cb1f, region=null, requester=null
> {code}
> The problem is that region {{d4cf39dc40ea79f5da4d0cf66d03cb1f}} was split some time ago, and was able to flush its data and split without any problems. However, the FSHLog still thinks that there is some unflushed data for this region.