[ https://issues.apache.org/jira/browse/CASSANDRA-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119622#comment-13119622 ]
Hudson commented on CASSANDRA-3273:
-----------------------------------

Integrated in Cassandra-0.8 #357 (See [https://builds.apache.org/job/Cassandra-0.8/357/])
    Fix bug where the FailureDetector can take a very long time to mark a host down.
Patch by brandonwilliams, reviewed by Paul Cannon for CASSANDRA-3273

brandonwilliams : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1178563
Files :
* /cassandra/branches/cassandra-0.8/CHANGES.txt
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/gms/FailureDetector.java
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/gms/Gossiper.java
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/gms/IFailureDetector.java
* /cassandra/branches/cassandra-1.0.0/CHANGES.txt
* /cassandra/branches/cassandra-1.0.0/src/java/org/apache/cassandra/gms/FailureDetector.java
* /cassandra/branches/cassandra-1.0.0/src/java/org/apache/cassandra/gms/Gossiper.java
* /cassandra/branches/cassandra-1.0.0/src/java/org/apache/cassandra/gms/IFailureDetector.java

> FailureDetector can take a very long time to mark a host down
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3273
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3273
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>             Fix For: 0.8.7, 1.0.0
>
>         Attachments: 3273.txt
>
>
> There are two ways to trigger this:
> * Bring a node up very briefly in a mixed-version cluster and then
>   terminate it
> * Bring a node up, terminate it for a very long time, then bring it back up
>   and take it down again
> In the first case, a very short arrival interval can be recorded because
> the versioning logic requires a reconnect, which can happen very quickly.
> This is easily solved by rejecting any interval shorter than a reasonable
> bound, for instance the gossiper interval.
> The second case is harder to solve: an extremely large interval is
> recorded, namely the time the node was left dead the first time. This
> skews the mean of the intervals, so marking the node down the second time
> takes much longer than it should.
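
For illustration only, the effect of both triggers on a mean-based detector can be sketched in a few lines of Java. This is not the code from 3273.txt: the class name BoundedArrivalWindow, the window capacity, and both bounds are hypothetical, and the upper bound in particular is just one possible mitigation for the second case, which the description above leaves open. The suspicion value is simplified to the time since the last heartbeat divided by the mean interval.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of an arrival window; not the CASSANDRA-3273 patch. */
final class BoundedArrivalWindow {
    private final Deque<Long> intervalsMs = new ArrayDeque<>();
    private final int capacity;
    private final long minIntervalMs; // assumed lower bound, e.g. the gossip interval
    private final long maxIntervalMs; // assumed upper bound, not taken from the issue
    private long lastArrivalMs = -1;

    BoundedArrivalWindow(int capacity, long minIntervalMs, long maxIntervalMs) {
        this.capacity = capacity;
        this.minIntervalMs = minIntervalMs;
        this.maxIntervalMs = maxIntervalMs;
    }

    /** Records a heartbeat; returns false only when the interval was rejected. */
    boolean report(long nowMs) {
        if (lastArrivalMs < 0) {          // first heartbeat: no interval yet
            lastArrivalMs = nowMs;
            return true;
        }
        long interval = nowMs - lastArrivalMs;
        lastArrivalMs = nowMs;            // always remember the latest arrival
        if (interval < minIntervalMs || interval > maxIntervalMs) {
            return false;                 // outlier: keep it out of the mean
        }
        if (intervalsMs.size() == capacity) {
            intervalsMs.removeFirst();    // drop the oldest sample
        }
        intervalsMs.addLast(interval);
        return true;
    }

    /** Mean of the accepted intervals, or 0 if none were accepted. */
    double mean() {
        if (intervalsMs.isEmpty()) {
            return 0.0;
        }
        long sum = 0;
        for (long i : intervalsMs) {
            sum += i;
        }
        return (double) sum / intervalsMs.size();
    }

    /** Simplified suspicion level: silence so far divided by the mean interval. */
    double phi(long nowMs) {
        double m = mean();
        return m > 0 ? (nowMs - lastArrivalMs) / m : 0.0;
    }

    public static void main(String[] args) {
        // Heartbeats: steady, one rapid reconnect, then an hour of downtime.
        long[] arrivals = {0, 1000, 2000, 2005, 3005, 3_603_005};
        BoundedArrivalWindow bounded = new BoundedArrivalWindow(1000, 1000, 60_000);
        BoundedArrivalWindow unbounded = new BoundedArrivalWindow(1000, 0, Long.MAX_VALUE);
        for (long t : arrivals) {
            bounded.report(t);
            unbounded.report(t);
        }
        long eightSecondsLater = 3_603_005 + 8000;
        System.out.println("bounded mean   = " + bounded.mean());
        System.out.println("unbounded mean = " + unbounded.mean());
        System.out.println("bounded phi    = " + bounded.phi(eightSecondsLater));
        System.out.println("unbounded phi  = " + unbounded.phi(eightSecondsLater));
    }
}
```

With the unbounded window, the single one-hour sample dominates the mean, so the suspicion level rises far more slowly after the node goes down again. With the bounded window, both the 5 ms reconnect and the one-hour gap are discarded and the mean stays at the normal heartbeat spacing.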