Michael Stack created HBASE-23181:
-------------------------------------
Summary: Blocked WAL archive: "LogRoller: Failed to schedule flush
of 8ee433ad59526778c53cc85ed3762d0b, because it is not online on us"
Key: HBASE-23181
URL: https://issues.apache.org/jira/browse/HBASE-23181
Project: HBase
Issue Type: Bug
Reporter: Michael Stack
On a heavily loaded cluster, the WAL count keeps rising and we can get into a state
where we are not rolling logs off fast enough. In particular, there is an
interesting state at the extreme where we pick a region to flush because of 'Too
many WALs', but that region is actually not online. As the WAL count rises, we
keep picking a region-to-flush that is no longer on the server. This condition
blocks our ability to clear WALs; eventually WALs climb into the hundreds
and the RS goes zombie with a full Call queue that starts throwing
CallQueueTooLargeExceptions (bad if this server is the one carrying
hbase:meta).
Here is how it looks in the log:
{code}
# Here is region closing....
2019-10-16 23:10:55,897 INFO
org.apache.hadoop.hbase.regionserver.handler.UnassignRegionHandler: Closed
8ee433ad59526778c53cc85ed3762d0b
....
# Then soon after ...
2019-10-16 23:11:44,041 WARN org.apache.hadoop.hbase.regionserver.LogRoller:
Failed to schedule flush of 8ee433ad59526778c53cc85ed3762d0b, because it is not
online on us
2019-10-16 23:11:45,006 INFO
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL: Too many WALs;
count=45, max=32; forcing flush of 1 regions(s):
8ee433ad59526778c53cc85ed3762d0b
...
# Later...
2019-10-16 23:20:25,427 INFO
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL: Too many WALs;
count=542, max=32; forcing flush of 1 regions(s):
8ee433ad59526778c53cc85ed3762d0b
2019-10-16 23:20:25,427 WARN org.apache.hadoop.hbase.regionserver.LogRoller:
Failed to schedule flush of 8ee433ad59526778c53cc85ed3762d0b, because it is not
online on us
{code}
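To make the loop above concrete, here is a minimal, self-contained model (hypothetical code, not actual HBase classes) of the 'Too many WALs' flush selection. The class name, method names, and region names are all illustrative. Without a guard, the closed region is picked on every roll and nothing ever unblocks archiving; evicting the offline region's sequence-id accounting is one way out:

```java
import java.util.*;

// Hypothetical model of LogRoller's forced-flush selection; not HBase code.
public class WalFlushPicker {
  // region name -> oldest unflushed WAL sequence id for that region
  private final Map<String, Long> oldestUnflushed = new HashMap<>();
  private final Set<String> onlineRegions = new HashSet<>();

  void track(String region, long oldestSeqId, boolean online) {
    oldestUnflushed.put(region, oldestSeqId);
    if (online) onlineRegions.add(region);
  }

  // Naive selection: the region pinning the oldest WAL entry. This is what
  // keeps choosing the closed region in the logs above.
  String pickRegionToForceFlush() {
    return oldestUnflushed.entrySet().stream()
        .min(Map.Entry.comparingByValue())
        .map(Map.Entry::getKey)
        .orElse(null);
  }

  // With a guard: if the chosen region is no longer online, drop its
  // sequence-id accounting instead of retrying the flush forever, so the
  // WALs that only held its edits become archivable.
  String pickWithOfflineGuard() {
    String region;
    while ((region = pickRegionToForceFlush()) != null
        && !onlineRegions.contains(region)) {
      oldestUnflushed.remove(region);
    }
    return region;
  }

  public static void main(String[] args) {
    WalFlushPicker picker = new WalFlushPicker();
    picker.track("8ee433ad59526778c53cc85ed3762d0b", 100L, false); // closed
    picker.track("online-region", 200L, true);
    // Naive pick: the closed region, every time the roller runs.
    System.out.println(picker.pickRegionToForceFlush());
    // Guarded pick: closed region's accounting is evicted, an online region
    // (or nothing) is chosen, and archiving can proceed.
    System.out.println(picker.pickWithOfflineGuard());
  }
}
```

The point of the sketch is only that flush-selection must tolerate a region that closed between the roll and the flush request; the actual fix would live in the WAL sequence-id accounting.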
--
This message was sent by Atlassian Jira
(v8.3.4#803005)