[
https://issues.apache.org/jira/browse/GOSSIP-74?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15924264#comment-15924264
]
Maxim Rusak edited comment on GOSSIP-74 at 3/14/17 2:18 PM:
------------------------------------------------------------
Edward Capriolo: I don't understand your point about 1).
If heartbeats arrive at 1, 2, 3, 4 seconds, the current solution will keep (1,2,3), but
we need (1,1,1).
This leads to a very high variance instead of 0 variance in this case, so
FailureDetector becomes truly indifferent to anything.
I showed this in the example: 100 heartbeats, 1 per second, lead to data
(1,2,3,...,99) while the real variance is 0. Then FailureDetector accepts a dead
machine for 2000 seconds!
was (Author: makrusak):
Edward Capriolo: I don't understand your point about 1).
If heartbeats arrive at 1, 2, 3, 4 seconds, the current solution will keep (1,2,3), but
we need (1,1,1).
This leads to a very high variance instead of 0 variance in this case, so
FailureDetector becomes truly indifferent to anything.
I showed this in the example: 100 heartbeats, 1 per second, lead to data
(1,2,3,...,99) while the real variance is 0.
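To make point 1) concrete, here is a minimal Python sketch of the 100-heartbeat example. It is not the project's code: the function name `recorded_samples` and the `update_latest` flag are hypothetical, and it only assumes that the detector records "arrival time minus latest heartbeat time" for every heartbeat.

```python
import statistics

def recorded_samples(arrivals, update_latest):
    """Return the samples a detector would record for the given arrival times.

    update_latest=False mimics the reported bug: the reference timestamp
    stays at the first heartbeat, so samples grow as 1, 2, 3, ...
    update_latest=True records true deltas between consecutive heartbeats.
    """
    latest = arrivals[0]
    samples = []
    for t in arrivals[1:]:
        samples.append(t - latest)
        if update_latest:
            latest = t
    return samples

arrivals = list(range(1, 101))  # 100 heartbeats, one per second

buggy = recorded_samples(arrivals, update_latest=False)
fixed = recorded_samples(arrivals, update_latest=True)

print(buggy[:3], buggy[-1], statistics.pvariance(buggy))
print(fixed[:3], fixed[-1], statistics.pvariance(fixed))
```

With the reference timestamp frozen, the samples are (1,2,3,...,99) and their variance is large; with the timestamp updated on every heartbeat, the samples are all 1 and the variance is 0.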
> Critical bugs in FailureDetector
> --------------------------------
>
> Key: GOSSIP-74
> URL: https://issues.apache.org/jira/browse/GOSSIP-74
> Project: Gossip
> Issue Type: Bug
> Reporter: Maxim Rusak
> Assignee: Maxim Rusak
>
> FailureDetector currently has (at least) 2 bugs (compared to the original paper):
> 1. latestHeartbeatMs is not updated on each heartbeat. So
> descriptiveStatistics consists not of deltas between heartbeats but of
> time periods since the first heartbeat.
> 2. When we create normalDistribution we pass the variance, not the standard
> deviation.
> Together they make FailureDetector totally indifferent due to extremely high deviation.
> Example: http://pastebin.com/xaeF52PP
> Here we send 100 heartbeats, one per second (for example), then check the
> state after 2000 seconds; compared to the threshold, the node is still considered alive.
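The effect of bug 2 can be sketched in Python as follows. This is an illustration under assumptions, not the FailureDetector code: it assumes the suspicion level is computed phi-accrual style as -log10 of the normal upper-tail probability, the helper name `phi` is hypothetical, and the samples are the inflated (1,2,...,99) values from the example in the description.

```python
import math
import statistics

def phi(elapsed, mean, sigma):
    """Suspicion level: -log10 of P(interval > elapsed) under Normal(mean, sigma)."""
    z = (elapsed - mean) / sigma
    # Upper tail of the normal distribution via erfc, which stays accurate for large z.
    tail = 0.5 * math.erfc(z / math.sqrt(2))
    return math.inf if tail == 0.0 else -math.log10(tail)

samples = list(range(1, 100))            # inflated samples from the example
mean = statistics.mean(samples)
variance = statistics.pvariance(samples)
stdev = statistics.pstdev(samples)

elapsed = 2000                           # seconds since the last heartbeat

phi_wrong = phi(elapsed, mean, variance)  # variance passed where a std. deviation is expected
phi_right = phi(elapsed, mean, stdev)     # standard deviation passed, as intended

print(phi_wrong, phi_right)
```

Passing the variance as the spread parameter keeps the suspicion level low even after 2000 seconds of silence, while passing the standard deviation drives it far above any plausible threshold. The actual threshold used by the project is not shown here.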
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)