[ https://issues.apache.org/jira/browse/HBASE-16721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15527972#comment-15527972 ]

Enis Soztutar commented on HBASE-16721:
---------------------------------------

In branch-1 and earlier, FSHLog.findRegionsToForceFlush() is the method that 
decides which regions have to be force-flushed. This means that the region is 
somehow still in oldestUnflushedStoreSequenceIds, although it has already been 
closed and flushed.
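
For context, a minimal sketch of the selection logic, assuming the usual 
"oldest WAL" bookkeeping; the method and parameter names below are 
illustrative, not the actual branch-1 code:
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class ForceFlushSketch {
  // Hypothetical simplification of what FSHLog.findRegionsToForceFlush() decides:
  // a region is picked for a forced flush as long as any of its stores still has
  // an unflushed sequence id at or below the highest sequence id in the oldest
  // WAL. A stale entry for an already closed/flushed region is therefore picked
  // over and over, the flush request fails ("region=null, requester=null"), and
  // WAL files keep piling up.
  static List<byte[]> regionsBlockingOldestWal(
      Map<byte[], Map<byte[], Long>> oldestUnflushedStoreSequenceIds,
      long highestSequenceIdInOldestWal) {
    List<byte[]> regions = new ArrayList<>();
    for (Map.Entry<byte[], Map<byte[], Long>> e : oldestUnflushedStoreSequenceIds.entrySet()) {
      if (Collections.min(e.getValue().values()) <= highestSequenceIdInOldestWal) {
        regions.add(e.getKey());
      }
    }
    return regions;
  }
}
{code}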

I suspect the issue is a race between FSHLog.startCacheFlush() and 
FSHLog.append() over the updates to the oldestUnflushedStoreSequenceIdsOfRegion 
map.

FSHLog.append() is called from the ring buffer handler thread, and does this: 
{code}
    private void updateOldestUnflushedSequenceIds(byte[] encodedRegionName,
        Set<byte[]> familyNameSet, Long lRegionSequenceId) {
      ConcurrentMap<byte[], Long> oldestUnflushedStoreSequenceIdsOfRegion =
          getOrCreateOldestUnflushedStoreSequenceIdsOfRegion(encodedRegionName);
      for (byte[] familyName : familyNameSet) {
        oldestUnflushedStoreSequenceIdsOfRegion.putIfAbsent(familyName, lRegionSequenceId);
      }
    }
{code}

And FSHLog.startCacheFlush() does this: 
{code}
synchronized (regionSequenceIdLock) {
      ConcurrentMap<byte[], Long> oldestUnflushedStoreSequenceIdsOfRegion =
          oldestUnflushedStoreSequenceIds.get(encodedRegionName);
      if (oldestUnflushedStoreSequenceIdsOfRegion != null) {
        for (byte[] familyName: flushedFamilyNames) {
          Long seqId = oldestUnflushedStoreSequenceIdsOfRegion.remove(familyName);
          if (seqId != null) {
            oldStoreSeqNum.put(familyName, seqId);
          }
        }
        if (!oldStoreSeqNum.isEmpty()) {
          Map<byte[], Long> oldValue = this.lowestFlushingStoreSequenceIds.put(
              encodedRegionName, oldStoreSeqNum);
          assert oldValue == null: "Flushing map not cleaned up for "
              + Bytes.toString(encodedRegionName);
        }
        if (oldestUnflushedStoreSequenceIdsOfRegion.isEmpty()) {
          // Remove it otherwise it will be in oldestUnflushedStoreSequenceIds for ever
          // even if the region is already moved to other server.
          // Do not worry about data racing, we held write lock of region when calling
          // startCacheFlush, so no one can add value to the map we removed.
          oldestUnflushedStoreSequenceIds.remove(encodedRegionName);
        } else {
          oldestUnflushedSequenceId =
              Collections.min(oldestUnflushedStoreSequenceIdsOfRegion.values());
        }
      }
{code}

The actual problem is that the per-store entries are removed from the region's 
sub-map (which keeps the per-store unflushed sequence ids) here:
{code}
          Long seqId = oldestUnflushedStoreSequenceIdsOfRegion.remove(familyName);
{code}
but the actual region entry is removed some time later in: 
{code}
        if (oldestUnflushedStoreSequenceIdsOfRegion.isEmpty()) {
          oldestUnflushedStoreSequenceIds.remove(encodedRegionName);
{code}

However, this comment is not correct: 
{code}
          // Do not worry about data racing, we held write lock of region when calling
          // startCacheFlush, so no one can add value to the map we removed.
{code}
and it is likely that an append() request is still executing and will re-add 
the removed family to the map. Although startCacheFlush() happens under the 
HRegion.updatesLock().writeLock() and doMiniBatchMutation() acquires the read 
lock, we may release the read lock before the ring buffer handler actually 
performs the FSHLog.append(): we release the region lock in step 6, but only 
wait for the seqId assignment in step 8. 
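
To make the interleaving concrete, here is a minimal, self-contained sketch of 
the suspected race (plain Java, not HBase code; the latches just force the 
problematic ordering, which is possible in the real code because the 
putIfAbsent in append() is not inside the synchronized (regionSequenceIdLock) 
block shown above):
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;

public class UnflushedSeqIdRaceSketch {
  public static void main(String[] args) throws Exception {
    ConcurrentMap<String, ConcurrentMap<String, Long>> oldestUnflushed = new ConcurrentHashMap<>();
    ConcurrentMap<String, Long> regionMap = new ConcurrentHashMap<>();
    regionMap.put("cf", 100L);
    oldestUnflushed.put("region-A", regionMap);

    CountDownLatch familyRemoved = new CountDownLatch(1);
    CountDownLatch lateAppendDone = new CountDownLatch(1);

    Thread flusher = new Thread(() -> {
      // startCacheFlush(): pull the family entry out of the region's sub-map.
      regionMap.remove("cf");
      familyRemoved.countDown();
      try { lateAppendDone.await(); } catch (InterruptedException e) { return; }
      // By now the late append has re-added "cf", so the sub-map is not empty
      // and the region entry is never removed from the outer map.
      if (regionMap.isEmpty()) {
        oldestUnflushed.remove("region-A");
      }
    });

    Thread lateAppend = new Thread(() -> {
      try { familyRemoved.await(); } catch (InterruptedException e) { return; }
      // append() still running in the ring buffer handler: the region read lock
      // was already released (step 6), so nothing serializes this against the
      // flusher above.
      regionMap.putIfAbsent("cf", 101L);
      lateAppendDone.countDown();
    });

    flusher.start();
    lateAppend.start();
    flusher.join();
    lateAppend.join();

    // Prints {region-A={cf=101}}: a stale unflushed seqId for a region that has
    // already been flushed and closed, exactly what the log roller keeps
    // tripping over.
    System.out.println(oldestUnflushed);
  }
}
{code}
The same leftover entry can also reappear if the late append() runs after the 
region entry was already removed, since getOrCreateOldestUnflushedStoreSequenceIdsOfRegion() 
will, as its name suggests, simply re-create it.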


> Concurrency issue in WAL unflushed seqId tracking
> -------------------------------------------------
>
>                 Key: HBASE-16721
>                 URL: https://issues.apache.org/jira/browse/HBASE-16721
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 1.2.4
>
>
> I'm inspecting an interesting case where, in a production cluster, some 
> regionservers end up accumulating hundreds of WAL files, even with force 
> flushes going on due to max logs. This happened multiple times on the 
> cluster, but not on other clusters. The cluster has the periodic memstore 
> flusher disabled; however, this still does not explain why the force flush 
> of regions due to the max limit is not working. I think the periodic 
> memstore flusher just masks the underlying problem, which is why we do not 
> see this in other clusters. 
> The problem starts like this: 
> {code}
> 2016-09-21 17:49:18,272 INFO  [regionserver//10.2.0.55:16020.logRoller] 
> wal.FSHLog: Too many wals: logs=33, maxlogs=32; forcing flush of 1 
> regions(s): d4cf39dc40ea79f5da4d0cf66d03cb1f
> 2016-09-21 17:49:18,273 WARN  [regionserver//10.2.0.55:16020.logRoller] 
> regionserver.LogRoller: Failed to schedule flush of 
> d4cf39dc40ea79f5da4d0cf66d03cb1f, region=null, requester=null
> {code}
> then, it continues until the RS is restarted: 
> {code}
> 2016-09-23 17:43:49,356 INFO  [regionserver//10.2.0.55:16020.logRoller] 
> wal.FSHLog: Too many wals: logs=721, maxlogs=32; forcing flush of 1 
> regions(s): d4cf39dc40ea79f5da4d0cf66d03cb1f
> 2016-09-23 17:43:49,357 WARN  [regionserver//10.2.0.55:16020.logRoller] 
> regionserver.LogRoller: Failed to schedule flush of 
> d4cf39dc40ea79f5da4d0cf66d03cb1f, region=null, requester=null
> {code}
> The problem is that region {{d4cf39dc40ea79f5da4d0cf66d03cb1f}} was already 
> split some time ago, and was able to flush its data and split without any 
> problems. However, the FSHLog still thinks that there is some unflushed data 
> for this region. 


