[ https://issues.apache.org/jira/browse/CASSANDRA-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119622#comment-13119622 ]

Hudson commented on CASSANDRA-3273:
-----------------------------------

Integrated in Cassandra-0.8 #357 (See [https://builds.apache.org/job/Cassandra-0.8/357/])
Fix bug where the FailureDetector can take a very long time to mark a host down.
Patch by brandonwilliams, reviewed by Paul Cannon for CASSANDRA-3273

brandonwilliams : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1178563
Files : 
* /cassandra/branches/cassandra-0.8/CHANGES.txt
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/gms/FailureDetector.java
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/gms/Gossiper.java
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/gms/IFailureDetector.java
* /cassandra/branches/cassandra-1.0.0/CHANGES.txt
* /cassandra/branches/cassandra-1.0.0/src/java/org/apache/cassandra/gms/FailureDetector.java
* /cassandra/branches/cassandra-1.0.0/src/java/org/apache/cassandra/gms/Gossiper.java
* /cassandra/branches/cassandra-1.0.0/src/java/org/apache/cassandra/gms/IFailureDetector.java

                
> FailureDetector can take a very long time to mark a host down
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3273
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3273
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>             Fix For: 0.8.7, 1.0.0
>
>         Attachments: 3273.txt
>
>
> There are two ways to trigger this:
> * Bring a node up very briefly in a mixed-version cluster and then terminate 
> it
> * Bring a node up, terminate it for a very long time, then bring it back up 
> and take it down again
> In the first case, a very short arrival interval can be recorded, because 
> the versioning logic requires a reconnect, which can happen very quickly. 
> This can easily be solved by rejecting any interval shorter than a 
> reasonable bound, for instance the gossiper interval.
> The second case is harder to solve: an extremely large interval is recorded, 
> namely the time the node was left dead the first time. This skews the mean 
> of the intervals, so marking the node down the second time takes much longer 
> than it should.
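
The effect described above can be sketched in a few lines of Java. This is an illustration only, not the code from 3273.txt: the class name, both bounds, and the simplified formula phi = t / (mean * ln 10) are assumptions made for the example. The lower bound stands in for the gossiper interval suggested in the description. The upper bound is a hypothetical way to keep a long outage out of the window, since the description does not say how the second case is handled.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names and constants are hypothetical.
class IntervalWindowSketch {
    // Assumed bounds in milliseconds, not values taken from the patch.
    static final double MIN_INTERVAL_MS = 1000.0;   // stand-in for the gossiper interval
    static final double MAX_INTERVAL_MS = 10000.0;  // hypothetical cap on recorded intervals

    private final List<Double> intervals = new ArrayList<>();
    private final boolean bounded;

    IntervalWindowSketch(boolean bounded) {
        this.bounded = bounded;
    }

    // Record an arrival interval; when bounded, drop values outside the bounds.
    void add(double intervalMs) {
        if (bounded && (intervalMs < MIN_INTERVAL_MS || intervalMs > MAX_INTERVAL_MS)) {
            return;
        }
        intervals.add(intervalMs);
    }

    // Arithmetic mean of the recorded intervals (0 if none were recorded).
    double mean() {
        if (intervals.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (double v : intervals) {
            sum += v;
        }
        return sum / intervals.size();
    }

    // Silence (ms) needed before phi reaches the threshold,
    // assuming phi = t / (mean * ln 10).
    double timeToConvict(double phiThreshold) {
        return phiThreshold * mean() * Math.log(10);
    }

    public static void main(String[] args) {
        IntervalWindowSketch raw = new IntervalWindowSketch(false);
        IntervalWindowSketch capped = new IntervalWindowSketch(true);
        for (int i = 0; i < 9; i++) {
            raw.add(1000.0);
            capped.add(1000.0);
        }
        // One interval equal to an hour-long outage.
        raw.add(3600000.0);
        capped.add(3600000.0);
        System.out.println("raw mean (ms):    " + raw.mean());
        System.out.println("capped mean (ms): " + capped.mean());
        System.out.println("raw time to convict (ms):    " + raw.timeToConvict(8.0));
        System.out.println("capped time to convict (ms): " + capped.timeToConvict(8.0));
    }
}
```

With nine one-second intervals and a single hour-long one, the unbounded mean is dominated by the outlier, so the time needed to convict the node grows by the same factor. The bounded window ignores the outlier and the mean stays at the normal heartbeat interval.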

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
