[ 
https://issues.apache.org/jira/browse/HBASE-21325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16656610#comment-16656610
 ] 

Guanghao Zhang commented on HBASE-21325:
----------------------------------------

waitOnAllRegionsToClose takes 5 minutes in my UT, which is too long. So maybe 
an absolute exit time is needed.
{code:java}
2018-10-19 18:14:17,998 INFO  [RS:1;hao-OptiPlex-7050:35511] regionserver.HRegionServer(1403): Waiting on 3 regions to close
2018-10-19 18:14:17,999 DEBUG [RS:1;hao-OptiPlex-7050:35511] regionserver.HRegionServer(1407): Online Regions={df9e0450a6653aaff9224f5e41e03213=hbase:namespace,,1539943969916.df9e0450a6653aaff9224f5e41e03213., 1588230740=hbase:meta,,1.1588230740, 285d5c537ad7485b96d372ea0bbbf5df=SyncRep,,1539943991041.285d5c537ad7485b96d372ea0bbbf5df.}
2018-10-19 18:14:19,001 INFO  [RS:1;hao-OptiPlex-7050:35511] regionserver.HRegionServer(1403): Waiting on 1 regions to close
2018-10-19 18:14:19,001 DEBUG [RS:1;hao-OptiPlex-7050:35511] regionserver.HRegionServer(1407): Online Regions={285d5c537ad7485b96d372ea0bbbf5df=SyncRep,,1539943991041.285d5c537ad7485b96d372ea0bbbf5df.}
2018-10-19 18:19:18,488 INFO  [RS:1;hao-OptiPlex-7050:35511] regionserver.HRegionServer(1426): We were exiting though online regions are not empty, because some regions failed closing
2018-10-19 18:19:18,488 INFO  [RS:1;hao-OptiPlex-7050:35511] regionserver.HRegionServer(1101): stopping server hao-optiplex-7050,35511,1539943965172; all regions closed.
{code}
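
To make the "absolute exit time" idea concrete, here is a minimal standalone sketch of a bounded wait loop. It is not the actual HRegionServer code: the class name BoundedCloseWait, the parameters maxWaitMillis and pollMillis, and the plain Map standing in for the online regions are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; none of these names come from HBase itself.
public class BoundedCloseWait {

  /**
   * Polls until onlineRegions is empty or maxWaitMillis has elapsed.
   * Returns true if every region closed, false if the deadline was reached first.
   */
  static boolean waitOnAllRegionsToClose(Map<String, String> onlineRegions,
      long maxWaitMillis, long pollMillis) throws InterruptedException {
    // Monotonic clock, so wall-clock adjustments cannot stretch the wait.
    long deadline = System.nanoTime() + maxWaitMillis * 1_000_000L;
    while (!onlineRegions.isEmpty()) {
      long remainingNanos = deadline - System.nanoTime();
      if (remainingNanos <= 0) {
        // Give up: some regions failed to close within the allowed time.
        return false;
      }
      // Sleep for the poll interval, but never past the deadline.
      long sleepMillis = Math.min(pollMillis, Math.max(1L, remainingNanos / 1_000_000L));
      Thread.sleep(sleepMillis);
    }
    return true;
  }

  public static void main(String[] args) throws InterruptedException {
    Map<String, String> online = new ConcurrentHashMap<>();
    // A region that never closes, like SyncRep in the log above.
    online.put("285d5c537ad7485b96d372ea0bbbf5df", "SyncRep");
    boolean closed = waitOnAllRegionsToClose(online, 200L, 50L);
    System.out.println("all regions closed: " + closed);
  }
}
```

With a region that never leaves the map, the call returns false after roughly maxWaitMillis instead of blocking, so the caller can log the leftover regions and continue shutting down.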


> Add a max wait time for waitOnAllRegionsToClose
> -----------------------------------------------
>
>                 Key: HBASE-21325
>                 URL: https://issues.apache.org/jira/browse/HBASE-21325
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Duo Zhang
>            Assignee: Guanghao Zhang
>            Priority: Major
>
> When testing sync replication, I found that if I transition the remote 
> cluster to DA while the local cluster is still in A, the region server hangs 
> on shutdown. Because the fsOk flag only tests the local cluster (which is 
> reasonable), we enter waitOnAllRegionsToClose, and since the WAL is broken 
> (the remote WAL directory is gone), closing the regions never succeeds. This 
> leads to an infinite wait inside waitOnAllRegionsToClose.
> So I think we should have an upper bound for the wait time in the 
> waitOnAllRegionsToClose method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
