[ https://issues.apache.org/jira/browse/HBASE-21325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16656610#comment-16656610 ]
Guanghao Zhang commented on HBASE-21325:
----------------------------------------

waitOnAllRegionsToClose takes 5 minutes in my UT, which is too long. So maybe an absolute exit time is needed.

{code:java}
2018-10-19 18:14:17,998 INFO [RS:1;hao-OptiPlex-7050:35511] regionserver.HRegionServer(1403): Waiting on 3 regions to close
2018-10-19 18:14:17,999 DEBUG [RS:1;hao-OptiPlex-7050:35511] regionserver.HRegionServer(1407): Online Regions={df9e0450a6653aaff9224f5e41e03213=hbase:namespace,,1539943969916.df9e0450a6653aaff9224f5e41e03213., 1588230740=hbase:meta,,1.1588230740, 285d5c537ad7485b96d372ea0bbbf5df=SyncRep,,1539943991041.285d5c537ad7485b96d372ea0bbbf5df.}
2018-10-19 18:14:19,001 INFO [RS:1;hao-OptiPlex-7050:35511] regionserver.HRegionServer(1403): Waiting on 1 regions to close
2018-10-19 18:14:19,001 DEBUG [RS:1;hao-OptiPlex-7050:35511] regionserver.HRegionServer(1407): Online Regions={285d5c537ad7485b96d372ea0bbbf5df=SyncRep,,1539943991041.285d5c537ad7485b96d372ea0bbbf5df.}
2018-10-19 18:19:18,488 INFO [RS:1;hao-OptiPlex-7050:35511] regionserver.HRegionServer(1426): We were exiting though online regions are not empty, because some regions failed closing
2018-10-19 18:19:18,488 INFO [RS:1;hao-OptiPlex-7050:35511] regionserver.HRegionServer(1101): stopping server hao-optiplex-7050,35511,1539943965172; all regions closed.
{code}

> Add a max wait time for waitOnAllRegionsToClose
> -----------------------------------------------
>
>                 Key: HBASE-21325
>                 URL: https://issues.apache.org/jira/browse/HBASE-21325
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Duo Zhang
>            Assignee: Guanghao Zhang
>            Priority: Major
>
> When testing sync replication, I found that if I transit the remote cluster
> to DA while the local cluster is still in A, the region server hangs on
> shutdown. As the fsOk flag only tests the local cluster (which is
> reasonable), we enter waitOnAllRegionsToClose, and since the WAL is broken
> (the remote WAL directory is gone), closing the regions never succeeds. This
> leads to an infinite wait inside waitOnAllRegionsToClose.
> So I think we should have an upper bound on the wait time in the
> waitOnAllRegionsToClose method.
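
For illustration only, the upper bound proposed above could look roughly like the sketch below. It is not the actual HRegionServer code or the eventual patch: the class BoundedWait, the method waitUntil, and the timeoutMs / pollMs parameters are hypothetical names, and the "all regions closed" check is passed in as a plain BooleanSupplier. The idea is to compute an absolute deadline once and give up when it passes, instead of looping until the online regions map is empty.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of HBase: polls a condition with an absolute deadline.
final class BoundedWait {
  private BoundedWait() {}

  /**
   * Polls allClosed every pollMs milliseconds until it returns true or
   * timeoutMs milliseconds have elapsed.
   *
   * @return true if the condition was met, false if the wait timed out
   */
  static boolean waitUntil(BooleanSupplier allClosed, long timeoutMs, long pollMs) {
    // Absolute deadline based on the monotonic clock, computed once up front.
    final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!allClosed.getAsBoolean()) {
      long remaining = deadline - System.nanoTime();
      if (remaining <= 0) {
        // Upper bound reached: stop waiting even though the condition is still false.
        return false;
      }
      try {
        // Never sleep past the deadline.
        TimeUnit.NANOSECONDS.sleep(Math.min(remaining, TimeUnit.MILLISECONDS.toNanos(pollMs)));
      } catch (InterruptedException e) {
        // Restore the interrupt flag and report the current state to the caller.
        Thread.currentThread().interrupt();
        return allClosed.getAsBoolean();
      }
    }
    return true;
  }
}
```

A caller in a shutdown path might use it as BoundedWait.waitUntil(() -> onlineRegions.isEmpty(), maxWaitMs, 1000), where onlineRegions and maxWaitMs are again placeholders, and log the regions that are still online when it returns false before continuing the shutdown.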