[ https://issues.apache.org/jira/browse/HBASE-22081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Shelukhin updated HBASE-22081:
-------------------------------------
    Status: Patch Available  (was: Open)

Tiny patch; also removed an unused method.

> master shutdown: close RpcServer first thing, close procWAL as soon as viable, and delete znode the last thing
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-22081
>                 URL: https://issues.apache.org/jira/browse/HBASE-22081
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Major
>         Attachments: HBASE-22081.patch
>
>
> I had a master get stuck due to HBASE-22079 and noticed it was logging RS abort messages during shutdown.
> [~bahramch] found some issues where messages are processed by the old master during shutdown due to a race condition in the RS cache (it could also happen due to a network race).
> Previously I found a bug where an SCP created during master shutdown had incorrect state (because some structures had already been cleaned up).
> I think that until master fencing is implemented, we can at least make these issues much less likely by thinking about shutdown order.
> 1) First kill the RPC server so we don't receive any more messages. There's no need to receive messages when we are shutting down. Server heartbeats could be impacted, I guess, but I don't think they will be, because we currently only kill an RS on ZK timeout.
> 2) Then do whatever cleanup we think is needed that requires the proc WAL.
> 3) Then close the proc WAL so no errant threads can create more procs.
> 4) Then do whatever other cleanup is left.
> 5) Finally, delete the znode.
> Right now the znode is deleted somewhat early, I think, and RpcServer is closed very late.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
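
For illustration only, the ordering proposed in steps 1-5 of the issue description can be sketched as a fixed sequence of named steps run one after another. This is not the attached patch and does not use HBase APIs: the class name, the step labels, and the empty step bodies are hypothetical placeholders, and the choice to continue past a failing step (so that the znode delete still runs last) is an assumption, not something the issue states.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MasterShutdownOrderSketch {

    // Runs the steps in insertion order and returns the names of the steps attempted.
    // A failing step is reported and skipped so later steps still run; that policy
    // is an assumption made for this sketch.
    static List<String> runInOrder(Map<String, Runnable> steps) {
        List<String> attempted = new ArrayList<>();
        for (Map.Entry<String, Runnable> e : steps.entrySet()) {
            attempted.add(e.getKey());
            try {
                e.getValue().run();
            } catch (RuntimeException ex) {
                System.err.println("shutdown step failed: " + e.getKey() + ": " + ex);
            }
        }
        return attempted;
    }

    // The five steps from the description. Every body is an empty placeholder;
    // a real master would call its own components here.
    static Map<String, Runnable> proposedSteps() {
        Map<String, Runnable> steps = new LinkedHashMap<>(); // preserves insertion order
        steps.put("stop RPC server", () -> { });             // 1) no more incoming messages
        steps.put("cleanup that needs proc WAL", () -> { }); // 2)
        steps.put("close proc WAL", () -> { });              // 3) no new procs can be created
        steps.put("other cleanup", () -> { });               // 4)
        steps.put("delete znode", () -> { });                // 5) last, not early
        return steps;
    }

    public static void main(String[] args) {
        for (String name : runInOrder(proposedSteps())) {
            System.out.println(name);
        }
    }
}
```

Keeping the order in a single list makes it easy to see, and to review, that the RPC server goes first and the znode goes last.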