Andrew Kyle Purtell created HBASE-24099:
-------------------------------------------

             Summary: Use a fair ReentrantReadWriteLock for the region lock used to guard closes
                 Key: HBASE-24099
                 URL: https://issues.apache.org/jira/browse/HBASE-24099
             Project: HBase
          Issue Type: Improvement
            Reporter: Andrew Kyle Purtell


Consider creating the region's ReentrantReadWriteLock with the fair locking 
policy. We have had a couple of production incidents where a regionserver 
stalled in shutdown for a very long time, leading to RIT (FAILED_CLOSE). 
The latest example is a 43 minute shutdown, of which ~40 minutes (2465280 ms) 
was spent waiting to acquire the write lock on the region in order to 
finish closing it.

{quote}
...

Finished memstore flush of ~66.92 MB/70167112, currentsize=0 B/0 for region 
XXXX. in 927ms, sequenceid=6091133815, compaction requested=false at 
1585175635349 (+60 ms)

Disabling writes for close at 1585178100629 (+2465280 ms)

{quote}

This time was spent in between the memstore flush and the task status change 
"Disabling writes for close at...". This is at HRegion.java:1481 in 1.3.6:

{code}
1480:   // block waiting for the lock for closing
1481:   lock.writeLock().lock(); // FindBugs: Complains UL_UNRELEASED_LOCK_EXCEPTION_PATH but seems fine
{code}
 
The close lock is operating in unfair mode. The table in question is under 
constant high query load. When the close request was received, there were 
active readers, and more readers kept arriving afterward, so the lock was 
under near-continuous contention. Although the clients would receive 
RegionServerStoppingException and other error notifications, they kept coming: 
because the region could not be reassigned, region (re-)location would find 
the region still hosted on the stuck server. Finally, after 40 minutes, the 
closing thread waiting for the write lock was, by chance, no longer starved.

The ReentrantReadWriteLock javadoc is clear about the possibility of starvation 
when continuously contended: "_When constructed as non-fair (the default), the 
order of entry to the read and write lock is unspecified, subject to reentrancy 
constraints. A nonfair lock that is continuously contended may indefinitely 
postpone one or more reader or writer threads, but will normally have higher 
throughput than a fair lock._"

We could try changing the acquisition semantics of this lock to fair. This is a 
one-line change at the point where we call the RW lock constructor. Then:

 "_When constructed as fair, threads contend for entry using an approximately 
arrival-order policy. When the currently held lock is released, either the 
longest-waiting single writer thread will be assigned the write lock, or if 
there is a group of reader threads waiting longer than all waiting writer 
threads, that group will be assigned the read lock._" 

This could be better. The close process will still have to wait until all 
readers and writers already waiting for acquisition either acquire and release 
or go away, but it won't be starved by future/incoming requests.
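
The queueing behavior described above can be illustrated with a small, self-contained sketch. It is not HBase code: the class name FairCloseLockDemo, the method newReaderBarges, and the 200 ms timeout are all made up for illustration. One thread stands in for an in-flight request holding the read lock, a "closer" thread queues for the write lock, and a late reader then tries to get the read lock. Under the fair policy the javadoc says such a reader should wait behind the queued writer.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FairCloseLockDemo {

    /**
     * Returns true if a reader arriving after the writer was queued still
     * managed to take the read lock (i.e. it barged ahead of the closer).
     * Hypothetical helper, for illustration only.
     */
    static boolean newReaderBarges(ReentrantReadWriteLock lock) throws InterruptedException {
        // Stand-in for an in-flight request that already holds the read lock.
        lock.readLock().lock();

        // Stand-in for the close path: blocks until current readers drain.
        Thread closer = new Thread(() -> {
            lock.writeLock().lock();
            lock.writeLock().unlock();
        });
        closer.start();

        // Wait until the closer is actually parked in the lock's wait queue.
        while (!lock.hasQueuedThread(closer)) {
            Thread.sleep(1);
        }

        // Stand-in for a request that arrives after the close was requested.
        boolean[] acquired = new boolean[1];
        Thread lateReader = new Thread(() -> {
            try {
                // The timed tryLock honors the fairness setting.
                if (lock.readLock().tryLock(200, TimeUnit.MILLISECONDS)) {
                    acquired[0] = true;
                    lock.readLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        lateReader.start();
        lateReader.join();

        // The original reader finishes; the closer can now take the write lock.
        lock.readLock().unlock();
        closer.join();
        return acquired[0];
    }

    public static void main(String[] args) throws InterruptedException {
        boolean barged = newReaderBarges(new ReentrantReadWriteLock(true));
        System.out.println("late reader got ahead of the closer: " + barged);
    }
}
```

This only shows the documented fair-mode ordering for a single late reader. It does not reproduce the production starvation, which depended on sustained load and timing.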

There could be a throughput loss in request handling, though, because this is 
the global reentrant RW lock for the region. 
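
As a rough sketch of the proposed change, the only difference is the boolean passed to the ReentrantReadWriteLock constructor. The class RegionLockSketch and the factory method newRegionLock are hypothetical names, not the actual HRegion code. Taking the policy as a parameter is just one way to keep the unfair behavior available if the throughput cost turns out to matter.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RegionLockSketch {

    // Hypothetical factory standing in for the place where the region's
    // close-guarding lock is constructed. fair=true selects the
    // approximately arrival-order policy; fair=false is the JDK default.
    static ReentrantReadWriteLock newRegionLock(boolean fair) {
        return new ReentrantReadWriteLock(fair);
    }

    public static void main(String[] args) {
        // The no-arg constructor is equivalent to passing false.
        ReentrantReadWriteLock current = new ReentrantReadWriteLock();
        ReentrantReadWriteLock proposed = newRegionLock(true);

        System.out.println("current lock is fair:  " + current.isFair());
        System.out.println("proposed lock is fair: " + proposed.isFair());
    }
}
```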



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
