[ 
https://issues.apache.org/jira/browse/CASSANDRA-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070726#comment-13070726
 ] 

Brandon Williams commented on CASSANDRA-2947:
---------------------------------------------

bq. The problem is that the failure detector never learns about node B - 
FD.report is never called for B.

This isn't quite right. The FD knows about B and is still calculating phi for 
it, but for some reason it is never reported:

{noformat}
TRACE 20:12:12,201 Performing status check ...
TRACE 20:12:12,201 PHI for /10.179.111.137 : 19.923693651793577
TRACE 20:12:12,201 PHI for /10.179.65.102 : 0.43267044519934433
{noformat}
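
For readers unfamiliar with the phi values in a trace like this, here is a minimal sketch of how a phi accrual failure detector might derive them. It assumes heartbeat intervals are exponentially distributed, so phi = -log10(exp(-elapsed / mean)). The class {{PhiSketch}}, its methods, and the window size are hypothetical names for illustration, not Cassandra's actual FailureDetector code, and the threshold of 8.0 in the usage note is an assumption based on the default {{phi_convict_threshold}}.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical phi accrual estimator; illustrative only, not Cassandra's class. */
public class PhiSketch {
    private final Deque<Long> intervalsMillis = new ArrayDeque<>();
    private final int windowSize;
    private long lastReportMillis = -1;

    public PhiSketch(int windowSize) {
        this.windowSize = windowSize;
    }

    /** Record a heartbeat arrival (the role FD.report plays in the discussion). */
    public void report(long nowMillis) {
        if (lastReportMillis >= 0) {
            if (intervalsMillis.size() == windowSize) {
                intervalsMillis.removeFirst(); // keep a sliding window of intervals
            }
            intervalsMillis.addLast(nowMillis - lastReportMillis);
        }
        lastReportMillis = nowMillis;
    }

    /** Suspicion level: grows linearly with the time since the last heartbeat. */
    public double phi(long nowMillis) {
        if (intervalsMillis.isEmpty()) {
            return 0.0; // no interval data yet, so nothing to suspect
        }
        double sum = 0;
        for (long interval : intervalsMillis) {
            sum += interval;
        }
        double mean = sum / intervalsMillis.size();
        if (mean <= 0) {
            return 0.0; // guard against division by zero in this sketch
        }
        double elapsed = nowMillis - lastReportMillis;
        // P(no heartbeat for 'elapsed') = exp(-elapsed / mean); phi = -log10(P)
        return elapsed / (mean * Math.log(10.0));
    }

    /** True when phi exceeds the conviction threshold. */
    public boolean shouldConvict(long nowMillis, double threshold) {
        return phi(nowMillis) > threshold;
    }
}
```

Under these assumptions, with heartbeats one second apart, {{shouldConvict(now, 8.0)}} stays false one second after the last heartbeat and becomes true once roughly 18.4 seconds have elapsed, since 8 * ln(10) is about 18.4. A phi near 19.9, as in the first trace line, would be well past that assumed threshold.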

> New nodes always think dead nodes are alive
> -------------------------------------------
>
>                 Key: CASSANDRA-2947
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2947
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.2
>            Reporter: Richard Low
>         Attachments: 2947.txt
>
>
> If a new node is brought up while another node is down, the new node will 
> think the down node is up forever.
> To reproduce:
> Take nodes A, B and C.
> 1. Bring up nodes A and B in a cluster
> 2. Take down B and wait for A to mark it as down
> 3. Bring up C with A as a seed
> 4. nodetool ring on C shows all 3 nodes as up and never marks B as down
> The problem is that the failure detector never learns about node B - 
> FD.report is never called for B.  This means requests are constantly routed 
> to B from C and time out, but they should fail with UnavailableException.
> The attached (hack) patch appears to fix it, but I expect the problem is 
> actually elsewhere in the gossip code.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
