[ https://issues.apache.org/jira/browse/HBASE-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546633#comment-13546633 ]
chunhui shen commented on HBASE-3809:
-------------------------------------

I think it won't happen in trunk now, because:

1. We use different ExecutorServices to execute ServerShutdownHandler and MetaServerShutdownHandler.
2. In the process of MetaServerShutdownHandler:
{code}
if (isCarryingRoot() || isCarryingMeta() // -ROOT- or .META.
    || !services.getAssignmentManager().isFailoverCleanupDone()) {
  this.services.getServerManager().processDeadServer(serverName);
  return;
}
{code}

It means MetaServerShutdownHandler can always be executed, so this stuck scenario won't happen again.

> .META. may not come back online if > number of executors servers crash and
> one of those > number of executors was carrying meta
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3809
>                 URL: https://issues.apache.org/jira/browse/HBASE-3809
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Priority: Critical
>             Fix For: 0.96.0
>
>
> This is a duplicate of another issue but at the moment I cannot find the
> original.
> If you had a 700 node cluster and then you ran something on the cluster which
> killed 100 nodes, and .META. had been running on one of those downed nodes,
> well, you'll have all of your master executors processing ServerShutdowns and
> more than likely none of the currently processing executors will be servicing
> the shutdown of the server that was carrying .META.
> Well, for server shutdown to complete at the moment, an online .META. is
> required. So, in the above case, we'll be stuck. The current executors will
> not be able to clear to make space for the processing of the server carrying
> .META. because they need .META. to complete.
> We can make the master handlers have no bound so it will expand to accommodate
> all crashed servers -- so it'll have the one .META. in its queue -- or we can
> change it so shutdown handling doesn't require .META. to be on-line (it's used
> to figure the regions the server was carrying); we could use the master's
> in-memory picture of the cluster (But IIRC, there may be holes ....TBD)
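
For illustration only, the starvation described in the issue, and the effect of running the meta handler on its own executor, can be sketched in plain Java. This is not HBase code: the class and method names, the pool size of 2 and the 500 ms timeout are all assumptions made up for the sketch, and a latch stands in for .META. coming back online.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative only: not HBase code. All names here are hypothetical.
final class ExecutorStarvationSketch {

  // Returns true if the simulated "meta" handler gets to run while every
  // thread of the bounded server-shutdown pool is blocked waiting for meta.
  static boolean metaComesOnline(boolean separateMetaPool) {
    final int poolSize = 2; // assumed bound on the shutdown executor
    ExecutorService serverPool = Executors.newFixedThreadPool(poolSize);
    // Either share the bounded pool or give the meta handler its own thread.
    ExecutorService metaPool =
        separateMetaPool ? Executors.newSingleThreadExecutor() : serverPool;
    CountDownLatch metaOnline = new CountDownLatch(1);
    CountDownLatch handlersStarted = new CountDownLatch(poolSize);
    try {
      // Fill the bounded pool with ordinary shutdown handlers; each one
      // cannot finish until meta is online.
      for (int i = 0; i < poolSize; i++) {
        serverPool.execute(() -> {
          handlersStarted.countDown();
          try {
            metaOnline.await();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      // Wait until every pool thread is occupied by a blocked handler.
      handlersStarted.await();
      // Only now queue the handler that would bring meta back online.
      metaPool.execute(metaOnline::countDown);
      // If the meta handler is stuck behind the blocked handlers, this times out.
      return metaOnline.await(500, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      // Interrupt any handlers that are still blocked so the JVM can exit.
      serverPool.shutdownNow();
      metaPool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("shared pool:   meta online = " + metaComesOnline(false));
    System.out.println("separate pool: meta online = " + metaComesOnline(true));
  }
}
```

With the shared pool the meta task sits in the queue behind handlers that are themselves waiting for it, so the call returns false. With a separate executor the meta task runs right away and the blocked handlers are released.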