[ https://issues.apache.org/jira/browse/HBASE-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002348#comment-13002348 ]

Jean-Daniel Cryans commented on HBASE-3580:
-------------------------------------------

In DeadServer.remove, don't decrement numprocessing; it has already been 
decremented well before that point.

The new DeadServer.isDeadServerComingBackAlive doesn't look right. It says the 
server name can be passed in either form (full name or host+port only), but 
then it tells HSI.isServer that it's passing hostAndPortOnly.
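A method that really accepts either form has to choose its comparison from the 
name it was actually given, instead of always claiming hostAndPortOnly. Here is a 
minimal Java sketch, assuming the full form is host,port,startcode and the 
host+port form is that same string with the startcode dropped. ServerNameForms 
and its methods are hypothetical names, not HBase API:

{code:java}
import java.util.Set;

/**
 * Hypothetical helper, not HBase's HServerInfo or DeadServer API.
 * Assumes the full form is "host,port,startcode" and the host+port form is "host,port".
 */
final class ServerNameForms {

  private ServerNameForms() {}

  /** True if the name carries a startcode, i.e. has three comma-separated parts. */
  static boolean isFullForm(String serverName) {
    return serverName.split(",", -1).length == 3;
  }

  /** Drops the trailing ",startcode" from a full server name (assumes one is present). */
  static String stripStartcode(String fullServerName) {
    return fullServerName.substring(0, fullServerName.lastIndexOf(','));
  }

  /**
   * Looks serverName up in a set of full server names, picking the
   * comparison from the form the caller actually passed.
   */
  static boolean containsServer(Set<String> fullNames, String serverName) {
    if (isFullForm(serverName)) {
      // Full form: only this exact instance (same startcode) matches.
      return fullNames.contains(serverName);
    }
    // Host+port form: any instance that ran on that host and port matches.
    for (String full : fullNames) {
      if (stripStartcode(full).equals(serverName)) {
        return true;
      }
    }
    return false;
  }
}
{code}

With this approach, a caller that passes a full name never accidentally matches an 
older instance on the same host and port.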

In ServerManager, I don't think you should use two methods here; it's more 
confusing than it needs to be:

{code}
+    if (!IsDeadServerComingBackAlive(serverName)) {
+      if (!this.deadservers.isDeadServer(serverName)) return;
{code}

BTW, method names shouldn't start with an upper-case letter 
(isDeadServerComingBackAlive, not IsDeadServerComingBackAlive).

Also this won't work:

{code}
this.deadservers.remove(serverName);
{code}

Since serverName is the full server name (host,port,startcode) and we've just 
checked that it's not dead, it definitely won't be in there!

What should be done instead is to check whether the full server name is in 
DeadServer; if it isn't, check whether its host+port form is in there, and if 
so, remove that entry.
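Roughly, that could look like the following Java sketch. DeadServerSketch and 
checkDeadAndRemovePreviousInstance are hypothetical names, not the actual 
DeadServer API. The host+port form is assumed to be the full name with its 
trailing ",startcode" removed:

{code:java}
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical stand-in for DeadServer, not the real HBase class.
 * Entries are full server names in the form "host,port,startcode".
 */
final class DeadServerSketch {
  private final Set<String> deadServers = new HashSet<>();

  void add(String fullServerName) {
    deadServers.add(fullServerName);
  }

  boolean isDeadServer(String fullServerName) {
    return deadServers.contains(fullServerName);
  }

  int size() {
    return deadServers.size();
  }

  /** "host,port,startcode" -> "host,port" (assumed format). */
  static String hostAndPort(String fullServerName) {
    int idx = fullServerName.lastIndexOf(',');
    return idx < 0 ? fullServerName : fullServerName.substring(0, idx);
  }

  /**
   * Single check for a server that is checking in.
   * Returns true if this exact instance is already dead (the caller should reject it).
   * Otherwise removes any older instance with the same host+port and returns false.
   */
  boolean checkDeadAndRemovePreviousInstance(String fullServerName) {
    if (deadServers.contains(fullServerName)) {
      return true;
    }
    String hp = hostAndPort(fullServerName);
    // A new startcode on the same host+port means the old instance is gone for good.
    deadServers.removeIf(dead -> hostAndPort(dead).equals(hp));
    return false;
  }
}
{code}

With both lookups behind one method, ServerManager only needs a single call 
rather than the two-method check quoted earlier.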

> Remove RS from DeadServer when new instance checks in
> -----------------------------------------------------
>
>                 Key: HBASE-3580
>                 URL: https://issues.apache.org/jira/browse/HBASE-3580
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.0
>            Reporter: Jean-Daniel Cryans
>             Fix For: 0.90.2
>
>         Attachments: HBASE-3580-Remove-RS-from-DeadServer-when-new-instance-checks-in.patch
>
>
> Keeping the servers in DeadServer until it reaches some maximum isn't super 
> friendly; it confuses even the best of our users:
> {quote}
> 09:27 < gbowyer> Hi all, I have apparently three dead RS in my cluster, I 
> cannot find references to them in HDFS or in ZK, how do I still report dead RS
> 09:27 < gbowyer> also the same nodes are reported as live region servers
> {quote}
> The subtle startcode difference can be hard to catch. This behavior also 
> differs from 0.20 (so old users get confused, like I did when debugging this 
> problem), and it differs from Hadoop's handling of dead DataNodes. It was 
> introduced in HBASE-3282.
> I think this should be improved by doing what Hadoop does: removing the RS 
> from DeadServer when a new instance with the same hostname+port checks in. 
> Stack says we should do it in ServerManager.checkIsDead.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
