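[jira] [Commented] (HBASE-3809) .META. may not come back online if > number of executors servers crash and one of those > number of executors was carrying meta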

chunhui shen (JIRA) Mon, 07 Jan 2013 21:48:15 -0800

    [ https://issues.apache.org/jira/browse/HBASE-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546633#comment-13546633 ]


chunhui shen commented on HBASE-3809:
-------------------------------------

I think this won't happen in trunk now, because:
1. We use different ExecutorServices to execute ServerShutdownHandler and 
MetaServerShutdownHandler.
2. In the process of MetaServerShutdownHandler, we have:
{code}
if (isCarryingRoot() || isCarryingMeta() // -ROOT- or .META.
    || !services.getAssignmentManager().isFailoverCleanupDone()) {
  this.services.getServerManager().processDeadServer(serverName);
  return;
}
{code}

This means MetaServerShutdownHandler can always be executed, so this stuck 
scenario won't happen again.
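
To illustrate point 1, here is a minimal sketch of why a dedicated executor avoids the stuck scenario. This is not HBase code: the class SeparatePoolsSketch, the method run, and the latch standing in for ".META. is online" are all hypothetical names made up for this example, and the pool sizes are arbitrary.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not HBase code: ordinary shutdown handlers block until
// "meta" is online, and only the meta handler can bring it online.
class SeparatePoolsSketch {

    // Returns true if all ordinary handlers finished before the timeout.
    static boolean run(boolean separateMetaPool, int poolSize, int deadServers)
            throws InterruptedException {
        CountDownLatch metaOnline = new CountDownLatch(1);            // stand-in for ".META. is online"
        CountDownLatch handlersDone = new CountDownLatch(deadServers);
        ExecutorService general = Executors.newFixedThreadPool(poolSize);
        ExecutorService meta = separateMetaPool
                ? Executors.newFixedThreadPool(1)                     // dedicated executor
                : general;                                            // shared, bounded executor
        try {
            // Ordinary handlers are queued first; each one needs meta to complete.
            for (int i = 0; i < deadServers; i++) {
                general.execute(() -> {
                    try {
                        metaOnline.await();
                        handlersDone.countDown();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // The handler for the meta-carrying server is queued last.
            meta.execute(() -> metaOnline.countDown());
            return handlersDone.await(500, TimeUnit.MILLISECONDS);
        } finally {
            general.shutdownNow();
            if (meta != general) {
                meta.shutdownNow();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("shared pool completes:   " + run(false, 3, 10));
        System.out.println("separate pool completes: " + run(true, 3, 10));
    }
}
```

With a shared pool of 3 threads and 10 dead servers, every worker blocks waiting for meta while the meta handler sits behind them in the queue, so the run times out. With a dedicated executor, the meta handler runs right away and the ordinary handlers can drain.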
                
> .META. may not come back online if > number of executors servers crash and 
> one of those > number of executors was carrying meta
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3809
>                 URL: https://issues.apache.org/jira/browse/HBASE-3809
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Priority: Critical
>             Fix For: 0.96.0
>
>
> This is a duplicate of another issue but at the moment I cannot find the 
> original.
> If you had a 700 node cluster and then you ran something on the cluster which 
> killed 100 nodes, and .META. had been running on one of those downed nodes, 
> well, you'll have all of your master executors processing ServerShutdowns and 
> more than likely none of the currently processing executors will be servicing 
> the shutdown of the server that was carrying .META.
> Well, for server shutdown to complete at the moment, an online .META. is 
> required.  So, in the above case, we'll be stuck. The current executors will 
> not be able to clear to make space for the processing of the server carrying 
> .META. because they need .META. to complete.
> We can make the master handlers have no bound so it will expand to accommodate 
> all crashed servers -- so it'll have the one .META. in its queue -- or we can 
> change it so shutdown handling doesn't require .META. to be online (it's used 
> to figure out the regions the server was carrying); we could use the master's 
> in-memory picture of the cluster (but IIRC, there may be holes ... TBD).

