Enis Soztutar created HBASE-16721:
-------------------------------------

             Summary: Concurrency issue in WAL unflushed seqId tracking
                 Key: HBASE-16721
                 URL: https://issues.apache.org/jira/browse/HBASE-16721
             Project: HBase
          Issue Type: Bug
            Reporter: Enis Soztutar
            Assignee: Enis Soztutar
             Fix For: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 1.2.4


I'm inspecting an interesting case where in a production cluster, some 
regionservers ends up accumulating hundreds of WAL files, even with force 
flushes going on due to max logs. This happened multiple times on the cluster, 
but not on other clusters. The cluster has periodic memstore flusher disabled, 
however, this still does not explain why the force flush of regions due to max 
limit is not working. I think the periodic memstore flusher just masks the 
underlying problem, which is why we do not see this in other clusters. 

The problem starts like this: 
{code}
2016-09-21 17:49:18,272 INFO  [regionserver//10.2.0.55:16020.logRoller] 
wal.FSHLog: Too many wals: logs=33, maxlogs=32; forcing flush of 1 regions(s): 
d4cf39dc40ea79f5da4d0cf66d03cb1f
2016-09-21 17:49:18,273 WARN  [regionserver//10.2.0.55:16020.logRoller] 
regionserver.LogRoller: Failed to schedule flush of 
d4cf39dc40ea79f5da4d0cf66d03cb1f, region=null, requester=null
{code}

then, it continues until the RS is restarted: 
{code}
2016-09-23 17:43:49,356 INFO  [regionserver//10.2.0.55:16020.logRoller] 
wal.FSHLog: Too many wals: logs=721, maxlogs=32; forcing flush of 1 regions(s): 
d4cf39dc40ea79f5da4d0cf66d03cb1f
2016-09-23 17:43:49,357 WARN  [regionserver//10.2.0.55:16020.logRoller] 
regionserver.LogRoller: Failed to schedule flush of 
d4cf39dc40ea79f5da4d0cf66d03cb1f, region=null, requester=null
{code}

The problem is that region {{d4cf39dc40ea79f5da4d0cf66d03cb1f}} is already 
split some time ago, and was able to flush its data and split without any 
problems. However, the FSHLog still thinks that there is some unflushed data 
for this region. 







--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to