[ https://issues.apache.org/jira/browse/HBASE-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vikas Vishwakarma resolved HBASE-13418.
---------------------------------------
    Resolution: Duplicate
      Assignee: Vikas Vishwakarma

Duplicate of HBASE-13592

> Regions getting stuck in PENDING_CLOSE state infinitely in high load HA
> scenarios
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-13418
>                 URL: https://issues.apache.org/jira/browse/HBASE-13418
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.10
>            Reporter: Vikas Vishwakarma
>            Assignee: Vikas Vishwakarma
>
> Under some heavy data-load conditions, when multiple RegionServers are going
> up and down (HA) or when we shut down/restart the entire HBase cluster, we
> observe that some regions get stuck in the PENDING_CLOSE state indefinitely.
> Going through the logs for one region stuck in PENDING_CLOSE, it looks like
> two memstore flushes were triggered for this region within a few
> milliseconds of each other (see the logs below), followed shortly afterwards
> by an unrecoverable exception while closing the region. I suspect this could
> be some kind of race condition, but need to investigate further.
> Logs:
> ================
> ......
> 2015-04-06 11:47:33,309 INFO [2,queue=0,port=60020]
> regionserver.HRegionServer - Close 884fd5819112370d9a9834895b0ec19c, via
> zk=yes, znode version=0, on
> blitzhbase01-dnds1-4-crd.eng.sfdc.net,60020,1428318111711
> 2015-04-06 11:47:33,309 DEBUG [-dnds3-4-crd:60020-0]
> handler.CloseRegionHandler - Processing close of
> RMHA_1,\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1428318937003.884fd5819112370d9a9834895b0ec19c.
> 2015-04-06 11:47:33,319 DEBUG [-dnds3-4-crd:60020-0] regionserver.HRegion -
> Closing
> RMHA_1,\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1428318937003.884fd5819112370d9a9834895b0ec19c.:
> disabling compactions & flushes
> 2015-04-06 11:47:33,319 INFO [-dnds3-4-crd:60020-0] regionserver.HRegion -
> Running close preflush of
> RMHA_1,\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1428318937003.884fd5819112370d9a9834895b0ec19c.
> 2015-04-06 11:47:33,319 INFO [-dnds3-4-crd:60020-0] regionserver.HRegion -
> Started memstore flush for
> RMHA_1,\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1428318937003.884fd5819112370d9a9834895b0ec19c.,
> current region memstore size 70.0 M
> 2015-04-06 11:47:33,327 DEBUG [-dnds3-4-crd:60020-0] regionserver.HRegion -
> Updates disabled for region
> RMHA_1,\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1428318937003.884fd5819112370d9a9834895b0ec19c.
> 2015-04-06 11:47:33,328 INFO [-dnds3-4-crd:60020-0] regionserver.HRegion -
> Started memstore flush for
> RMHA_1,\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1428318937003.884fd5819112370d9a9834895b0ec19c.,
> current region memstore size 70.0 M
> 2015-04-06 11:47:33,328 WARN [-dnds3-4-crd:60020-0] wal.FSHLog - Couldn't
> find oldest seqNum for the region we are about to flush:
> [884fd5819112370d9a9834895b0ec19c]
> 2015-04-06 11:47:33,328 WARN [-dnds3-4-crd:60020-0] regionserver.MemStore -
> Snapshot called again without clearing previous. Doing nothing. Another
> ongoing flush or did we fail last attempt?
> 2015-04-06 11:47:33,334 FATAL [-dnds3-4-crd:60020-0]
> regionserver.HRegionServer - ABORTING region server
> blitzhbase01-dnds3-4-crd.eng.sfdc.net,60020,1428318082860: Unrecoverable
> exception while closing region
> RMHA_1,\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1428318937003.884fd5819112370d9a9834895b0ec19c.,
> still finishing close

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)