Rajeshbabu Chintaguntla created HBASE-27509:
-----------------------------------------------

             Summary: Possible region gets stuck in CLOSING state
                 Key: HBASE-27509
                 URL: https://issues.apache.org/jira/browse/HBASE-27509
             Project: HBase
          Issue Type: Bug
          Components: Region Assignment
    Affects Versions: 2.3.4
            Reporter: Rajeshbabu Chintaguntla
            Assignee: Rajeshbabu Chintaguntla


There is a possible chance of region gets stuck in closing state could be 
because of race between the flush and close or some where the readlock acquired 
on the region is not getting released.
{noformat}
"MemStoreFlusher.1" #236 prio=5 os_prio=0 tid=0x00005639266a4000 nid=0x296e 
waiting on condition [0x00007fdc48a63000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00007fdf42dde850> (a 
java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
    at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
    at 
java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
    at 
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2397)
    at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:610)
    at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:579)
    at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:67)
    at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:359)

"MemStoreFlusher.0" #234 prio=5 os_prio=0 tid=0x00005639266a2800 nid=0x296d 
waiting on condition [0x00007fdc48b64000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00007fdf42dde850> (a 
java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
    at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
    at 
java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
    at 
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2397)
    at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:610)
    at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:579)
    at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:67)
    at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:359)
{noformat} 
{noformat}
"RS_CLOSE_REGION-regionserver/sl73tskrnsqln00107:16020-0" #6337 daemon prio=5 
os_prio=0 tid=0x00007fdc05448800 nid=0x15d1 waiting on condition 
[0x00007fdc1befd000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00007fdf42dde850> (a 
java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
    at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
    at 
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
    at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1662)
    at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1591)
    - locked <0x00007fdf42ddf358> (a java.lang.Object)
    at 
org.apache.hadoop.hbase.regionserver.handler.UnassignRegionHandler.process(UnassignRegionHandler.java:114)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
{noformat}

>From one of the region server logs flushed has started and replay edits of 
>flush added then close requested then it got stuck in the transition and no 
>further processing of the requests on the region.
{noformat}
  2022-11-23 05:51:00,503 INFO  [MemStoreFlusher.1] 
regionserver.DefaultStoreFlusher: Flushed memstore data size=232.14 MB at 
sequenceid=29163855 (bloomFilter=true), 
to=hdfs://OCEDEV/apps/hbase/data/data/default/hbase_perf_wl2_1000m/8f342bca97c115f8bce460a998e4afbc/.tmp/cf1/c20f27a7407643b18558331d95f7a67f
  2022-11-23 05:51:00,530 INFO  [MemStoreFlusher.1] regionserver.HStore: Added 
hdfs://OCEDEV/apps/hbase/data/data/default/hbase_perf_wl2_1000m/8f342bca97c115f8bce460a998e4afbc/cf1/c20f27a7407643b18558331d95f7a67f,
 entries=231413, sequenceid=29163855, filesize=233.7 M
  2022-11-23 05:51:00,536 INFO  [MemStoreFlusher.1] regionserver.HRegion: 
Finished flush of dataSize ~232.14 MB/243420836, heapSize ~256.00 MB/268439136, 
currentSize=719.06 KB/736320 for 8f342bca97c115f8bce460a998e4afbc in 2146ms, 
sequenceid=29163855, compaction requested=true
  2022-11-23 05:51:00,541 INFO  [MemStoreFlusher.1] regionserver.HRegion: 
Flushing d92c7546be62225859dd641aa88992ea 1/1 column families, dataSize=232.62 
MB heapSize=256.53 MB
  2022-11-23 05:51:00,645 INFO  [MemStoreFlusher.0] 
regionserver.DefaultStoreFlusher: Flushed memstore data size=232.15 MB at 
sequenceid=29037517 (bloomFilter=true), 
to=hdfs://OCEDEV/apps/hbase/data/data/default/hbase_perf_wl2_1000m/7beadd786ffc23edc074238c873f800b/.tmp/cf1/eb542141078e4f879128498d01517e98
  2022-11-23 05:51:00,705 INFO  [MemStoreFlusher.0] regionserver.HStore: Added 
hdfs://OCEDEV/apps/hbase/data/data/default/hbase_perf_wl2_1000m/7beadd786ffc23edc074238c873f800b/cf1/eb542141078e4f879128498d01517e98,
 entries=231402, sequenceid=29037517, filesize=233.7 M
  2022-11-23 05:51:00,706 INFO  [MemStoreFlusher.0] regionserver.HRegion: 
Finished flush of dataSize ~232.15 MB/243424206, heapSize ~256.01 MB/268442768, 
currentSize=935.82 KB/958277 for 7beadd786ffc23edc074238c873f800b in 2324ms, 
sequenceid=29037517, compaction requested=true
  2022-11-23 05:51:00,708 INFO  [MemStoreFlusher.0] regionserver.HRegion: 
Flushing 6c6694dce02a13f8109ecc3dd70009d5 1/1 column families, dataSize=232.68 
MB heapSize=256.60 MB
  2022-11-23 05:51:00,771 INFO  
[RS_CLOSE_REGION-regionserver/sl73tskrnsqln00107:16020-0] 
handler.UnassignRegionHandler: Close 8f342bca97c115f8bce460a998e4afbc
{noformat}
{noformat}
2022-11-23 07:37:06,887 WARN  
[RpcServer.default.FPBQ.Fifo.handler=28,queue=8,port=16020] 
regionserver.HRegion: Region is too busy to allow lock acquisition.
org.apache.hadoop.hbase.RegionTooBusyException: Failed to obtain lock; 
regionName=hbase_perf_wl2_1000m,user8104,1668039117539.8f342bca97c115f8bce460a998e4afbc.,
 server=sl73tskrnsqln00107.visa.com,16020,1669131710569
        at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:8726)
        at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:8705)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:8610)
        at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3161)
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2985)
        at 
org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45517)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:393)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
        at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
        at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)

{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to