[ https://issues.apache.org/jira/browse/CASSANDRA-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-6385:
--------------------------------------
    Attachment: 6385-v3.txt

Thinking about it more, I think the main problem is using too low an initial value to seed the Window. Interval / 2 is always smaller than the actual mean will be, and it will be increasingly too small as the cluster size grows.

Picking a nice large value there gives us the "large fudge to start that "decays" (by being averaged with real values) as we get more data" behavior that we want.

v3 attached.

> FD phi estimator initial conditions
> -----------------------------------
>
>                 Key: CASSANDRA-6385
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6385
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Quentin Conner
>             Fix For: 1.2.13, 2.0.3
>
>         Attachments: 6385-v2.txt, 6385-v3.txt, 6385.txt
>
>
> phi estimates are calculated for newly discovered nodes from an un-filled
> (new, uninitialized) deque.
> The inter-arrival time (elapsed time between gossip heartbeats) is stored in
> the o.a.c.gms.ArrivalWindow.arrivalIntervale deque for each received
> heartbeat, up to the maximum window size of 1000 samples.
> In the o.a.c.gms.FailureDetector.interpret() method, phi is calculated for
> the node using a statistical measure called variance. Like the mean, the
> variance of a population (a set of numbers or measurements) is not
> statistically relevant unless the population size is 30 or greater.
> When a new node is discovered, the calculated variance is higher than normal,
> which makes phi higher than normal and results in a false positive
> failure detection.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
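
To illustrate the seeding idea from the comment above, here is a minimal sketch. It is not the code in 6385-v3.txt. The class name SeededArrivalWindow, its constructor parameters, and the assumption that phi is proportional to elapsed time divided by the mean interval are all hypothetical, chosen only to show how a large first sample damps phi early on and then "decays" as real intervals are averaged in.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch of a bounded arrival window whose first sample is a
 * configurable seed. Not the o.a.c.gms.ArrivalWindow implementation.
 */
final class SeededArrivalWindow {
    // Assumption: phi = PHI_FACTOR * elapsed / mean (exponential arrival model).
    private static final double PHI_FACTOR = 1.0 / Math.log(10.0);

    private final Deque<Double> intervals = new ArrayDeque<>();
    private final int maxSize;
    private final double seedIntervalMs;
    private double sum = 0.0;
    private long lastArrivalMs = -1;

    SeededArrivalWindow(int maxSize, double seedIntervalMs) {
        this.maxSize = maxSize;
        this.seedIntervalMs = seedIntervalMs;
    }

    /** Record a heartbeat; the very first one contributes the seed value. */
    void add(long nowMs) {
        double interval = (lastArrivalMs >= 0) ? (nowMs - lastArrivalMs) : seedIntervalMs;
        if (intervals.size() == maxSize) {
            sum -= intervals.removeFirst(); // evict the oldest sample
        }
        intervals.addLast(interval);
        sum += interval;
        lastArrivalMs = nowMs;
    }

    /** Mean of the samples currently in the window (seed included until evicted). */
    double mean() {
        return intervals.isEmpty() ? 0.0 : sum / intervals.size();
    }

    /** Suspicion level at nowMs; 0.0 until at least one heartbeat was seen. */
    double phi(long nowMs) {
        if (intervals.isEmpty()) {
            return 0.0;
        }
        return PHI_FACTOR * (nowMs - lastArrivalMs) / mean();
    }
}
```

With a single heartbeat recorded, a small seed yields a much higher phi for the same silence than a large seed does. Each real interval added afterwards pulls the mean away from the seed toward the observed heartbeat rate, so the initial fudge matters less as the window fills.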