[ 
https://issues.apache.org/jira/browse/HBASE-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009849#comment-13009849
 ] 

stack commented on HBASE-3687:
------------------------------

bq. Shouldn't the RS not check in to the master with an RPC until it is 
available?

Check it out.  The RS reports in and then finishes its startup.  The master 
could come in in the meantime.  That's how it currently works.  We could 
refactor the start sequence but that'd be a bigger change.

bq. and weren't we just saying that we should not be putting in Thread.sleeps 

Yesterday's one, about the 1 second sleep taking longer?  Should it be set 
down for tests?  I could set this one down too.




> Bulk assign on startup should handle a ServerNotRunningException
> ----------------------------------------------------------------
>
>                 Key: HBASE-3687
>                 URL: https://issues.apache.org/jira/browse/HBASE-3687
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>             Fix For: 0.90.2
>
>         Attachments: 3687.txt
>
>
> On startup, we do bulk assign.  At the moment, if there is any problem 
> during bulk assign, we consider startup failed and the expectation is that 
> you need to retry (we need to make this better but that is not what this 
> issue is about).  One exception that we should handle is the case where a RS 
> is slow coming up and its RPC is not yet up and listening.  In this case it 
> will throw ServerNotRunningException.  We should retry at least this one 
> exception during bulk assign.
> We had this happen to us starting up a prod cluster.
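
The retry described in the issue could be sketched roughly as below.  This is 
not the code in 3687.txt, only an illustration under stated assumptions: 
ServerNotRunningException here is a local stand-in for the HBase class, and 
callWithRetries, maxAttempts and sleepMillis are hypothetical names.  The sleep 
between attempts is a parameter so that tests can set it down.  Needs Java 8 
or later.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class BulkAssignRetry {

    /** Local stand-in for the exception named in the issue; not the real HBase class. */
    public static class ServerNotRunningException extends IOException {
        public ServerNotRunningException(String msg) {
            super(msg);
        }
    }

    /**
     * Runs the call, retrying only when it throws ServerNotRunningException.
     * Any other exception propagates immediately, as does the last
     * ServerNotRunningException once maxAttempts is used up.
     */
    public static <T> T callWithRetries(Callable<T> call, int maxAttempts, long sleepMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (ServerNotRunningException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts, let startup fail as before
                }
                // RS rpc is not listening yet; wait a little and try again.
                Thread.sleep(sleepMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated region server whose rpc comes up on the third attempt.
        String result = callWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new ServerNotRunningException("rpc not up yet");
            }
            return "assigned";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Only the one exception is retried, so every other failure during bulk assign 
still fails startup as it does today.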

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
