[ https://issues.apache.org/jira/browse/HBASE-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547378#comment-13547378 ]
Ted Yu commented on HBASE-3809:
-------------------------------

I was trying to find out which other ExecutorService is used to execute MetaServerShutdownHandler.

In MasterServices, there is only one method returning ExecutorService:
{code}
  public ExecutorService getExecutorService();
{code}
In HMaster, I only found one ExecutorService member variable:
{code}
  // Instance of the hbase executor service.
  ExecutorService executorService;
{code}

> .META. may not come back online if > number of executors servers crash and
> one of those > number of executors was carrying meta
> -------------------------------------------------------------------------
>
>                 Key: HBASE-3809
>                 URL: https://issues.apache.org/jira/browse/HBASE-3809
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: chunhui shen
>            Priority: Critical
>
> This is a duplicate of another issue but at the moment I cannot find the
> original.
> If you had a 700 node cluster and then you ran something on the cluster which
> killed 100 nodes, and .META. had been running on one of those downed nodes,
> well, you'll have all of your master executors processing ServerShutdowns and
> more than likely none of the currently processing executors will be servicing
> the shutdown of the server that was carrying .META.
> Well, for server shutdown to complete at the moment, an online .META. is
> required. So, in the above case, we'll be stuck. The current executors will
> not be able to clear to make space for the processing of the server carrying
> .META. because they need .META. to complete.
> We can make the master handlers have no bound so it will expand to accommodate
> all crashed servers -- so it'll have the one .META. in its queue -- or we can
> change it so shutdown handling doesn't require .META. to be on-line (it's used
> to figure the regions the server was carrying); we could use the master's
> in-memory picture of the cluster (But IIRC, there may be holes ....TBD)
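
The starvation described above can be reproduced outside HBase with a bounded thread pool. The following is a minimal sketch, not HBase code: the class name `MetaStarvationSketch`, the method `run`, and the latch names are all hypothetical, and a `CountDownLatch` merely stands in for ".META. is online". Every ordinary shutdown handler blocks until the latch opens, and the single task that opens it is submitted last. With one shared fixed-size pool that task never leaves the queue; with a dedicated pool for it (one possible remedy, assumed here for illustration) the handlers complete.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class MetaStarvationSketch {

    /**
     * Submits poolSize handlers that wait for "meta" plus one task that
     * brings "meta" online. Returns true if every handler finished within
     * one second, false if they were still blocked.
     */
    static boolean run(int poolSize, boolean dedicatedMetaPool) {
        ExecutorService general = Executors.newFixedThreadPool(poolSize);
        // Hypothetical remedy: give the meta task its own executor.
        ExecutorService metaPool =
                dedicatedMetaPool ? Executors.newSingleThreadExecutor() : general;

        CountDownLatch metaOnline = new CountDownLatch(1);   // stands in for .META. being online
        CountDownLatch handlersDone = new CountDownLatch(poolSize);

        // Ordinary shutdown handlers: each occupies a pool thread until meta is online.
        for (int i = 0; i < poolSize; i++) {
            general.execute(() -> {
                try {
                    metaOnline.await();
                    handlersDone.countDown();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // The handler for the server that carried meta is submitted last.
        // In the shared-pool case it waits in the queue behind blocked threads.
        metaPool.execute(metaOnline::countDown);

        boolean finished;
        try {
            finished = handlersDone.await(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            finished = false;
        }
        // Interrupt any handlers that are still blocked so the JVM can exit.
        general.shutdownNow();
        metaPool.shutdownNow();
        return finished;
    }

    public static void main(String[] args) {
        System.out.println("shared pool finished:    " + run(3, false));
        System.out.println("dedicated pool finished: " + run(3, true));
    }
}
```

With a pool of three threads, the shared-pool run should time out with all handlers still blocked, while the dedicated-pool run should finish. The one-second timeout is only there so the sketch can detect the stall; it is not a suggestion for how the master should behave.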