[ 
https://issues.apache.org/jira/browse/RATIS-428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Rodionov updated RATIS-428:
------------------------------------
    Description: 
Failure detectors are an important piece in designing robust distributed 
systems. Components must be expected to fail, and the rest of the system should 
either continue functioning properly (ideal) or at the very least degrade 
gracefully instead of crashing or becoming corrupted. Because of the unreliable 
nature of communication over networks, however, detecting that a node has 
failed is a nontrivial task. The *phi accrual failure detector* is a popular 
choice for solving this problem, as it provides a good balance of flexibility 
and adaptability to different network conditions. It is used successfully in 
several real-world distributed systems, such as Apache Cassandra and Akka 
clusters, and also has a Node.js implementation.

[Original 
paper|http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.80.7427&rep=rep1&type=pdf]

  was:
Failure detectors are an important piece in designing robust distributed 
systems. Components must be expected to fail, and the rest of the system should 
either continue functioning properly (ideal) or at the very least degrade 
gracefully instead of crashing or becoming corrupted. Because of the unreliable 
nature of communication over networks, however, detecting that a node has 
failed is a nontrivial task. The *phi accrual failure detector* is a popular 
choice for solving this problem, as it provides a good balance of flexibility 
and adaptability to different network conditions. It is used successfully in 
several real-world distributed systems, such as Apache Cassandra and Akka 
clusters, and also has a Node.js implementation.

[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.80.7427&rep=rep1&type=pdf|Original
 paper]


> Phi accrual failure detector
> ----------------------------
>
>                 Key: RATIS-428
>                 URL: https://issues.apache.org/jira/browse/RATIS-428
>             Project: Ratis
>          Issue Type: Improvement
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>            Priority: Major
>
> Failure detectors are an important piece in designing robust distributed 
> systems. Components must be expected to fail, and the rest of the system 
> should either continue functioning properly (ideal) or at the very least 
> degrade gracefully instead of crashing or becoming corrupted. Because of the 
> unreliable nature of communication over networks, however, detecting that a 
> node has failed is a nontrivial task. The *phi accrual failure detector* is a 
> popular choice for solving this problem, as it provides a good balance of 
> flexibility and adaptability to different network conditions. It is used 
> successfully in several real-world distributed systems, such as Apache 
> Cassandra and Akka clusters, and also has a Node.js implementation.
> [Original 
> paper|http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.80.7427&rep=rep1&type=pdf]
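
For readers unfamiliar with the technique, here is a minimal sketch of how an
accrual detector turns heartbeat timing into a continuous suspicion level
(phi) rather than a binary up/down verdict. It is not Ratis code: the class
name PhiAccrualSketch, its methods, the window size and the threshold of 8.0
are all hypothetical. It also assumes exponentially distributed heartbeat
intervals to keep the arithmetic short, whereas the original paper models the
intervals with a normal distribution.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of a phi accrual failure detector; not part of Ratis. */
final class PhiAccrualSketch {
    private final int windowSize;                 // max number of intervals kept
    private final Deque<Long> intervals = new ArrayDeque<>();
    private long intervalSum = 0L;                // running sum of the window
    private long lastHeartbeatMillis = -1L;       // -1 means "no heartbeat yet"

    PhiAccrualSketch(int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.windowSize = windowSize;
    }

    /** Records a heartbeat and adds its inter-arrival time to the window. */
    void heartbeat(long nowMillis) {
        if (lastHeartbeatMillis >= 0) {
            long interval = nowMillis - lastHeartbeatMillis;
            intervals.addLast(interval);
            intervalSum += interval;
            if (intervals.size() > windowSize) {
                intervalSum -= intervals.removeFirst();   // evict oldest sample
            }
        }
        lastHeartbeatMillis = nowMillis;
    }

    /**
     * Suspicion level: phi = -log10(P(next heartbeat arrives later than now)).
     * ASSUMPTION: exponential intervals, so P = exp(-elapsed / mean) and
     * phi = elapsed / (mean * ln 10).
     */
    double phi(long nowMillis) {
        if (intervals.isEmpty()) {
            return 0.0;                           // not enough history to judge
        }
        double mean = (double) intervalSum / intervals.size();
        double elapsed = Math.max(0L, nowMillis - lastHeartbeatMillis);
        if (mean <= 0.0) {
            return elapsed > 0.0 ? Double.POSITIVE_INFINITY : 0.0;
        }
        return elapsed / (mean * Math.log(10.0));
    }

    /** The caller picks the threshold; higher means slower but safer detection. */
    boolean isAvailable(long nowMillis, double threshold) {
        return phi(nowMillis) < threshold;
    }
}
```

Because the mean interval is re-estimated from a sliding window, the same
threshold adapts to slow and fast networks alike, which is the flexibility
the description refers to. A caller would invoke heartbeat(now) on every
received heartbeat and poll isAvailable(now, 8.0) when deciding whether to
suspect a peer.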



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
