[jira] [Commented] (IGNITE-25324) Increase scalecube failure detection timeouts

Roman Puchkovskiy (Jira) Fri, 30 May 2025 07:23:20 -0700


    [ 
https://issues.apache.org/jira/browse/IGNITE-25324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17955235#comment-17955235
 ]


Roman Puchkovskiy commented on IGNITE-25324:
--------------------------------------------

The suspicion timeout is calculated as multiplier * pingInterval * 
ceil(log2(clusterSize + 1)). The default multiplier is 5 and the default 
pingInterval is 1000 ms, so:
 * For 2-3 nodes the suspicion timeout is 10 seconds
 * For 4-7 nodes it is 15 seconds
 * For 8-15 nodes it is 20 seconds

In Apache Ignite 2, the default failure detection timeout is 30 seconds (for 
any cluster size). Given that a typical cluster size is around 7-10 nodes, it 
makes sense to raise pingInterval to 2 seconds, so that the suspicion timeout 
becomes 20 seconds for 2-3 nodes, 30 seconds for 4-7 nodes, and 40 seconds for 
8-15 nodes.
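
To double-check the arithmetic, here is a minimal Python sketch of the formula 
as stated above. The function and parameter names are hypothetical and are not 
taken from the ScaleCube or Ignite code; only the formula and the values 
(multiplier 5, pingInterval 1000 ms or 2000 ms) come from this comment.

```python
def suspicion_timeout_ms(cluster_size: int,
                         ping_interval_ms: int = 1000,
                         multiplier: int = 5) -> int:
    """Suspicion timeout per the formula quoted in this comment:
    multiplier * pingInterval * ceil(log2(clusterSize + 1)).

    All names here are illustrative assumptions, not library API.
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    # For n >= 1, ceil(log2(n + 1)) equals the bit length of n, which
    # avoids floating-point rounding near exact powers of two.
    log_factor = cluster_size.bit_length()
    return multiplier * ping_interval_ms * log_factor


if __name__ == "__main__":
    # Compare the current default (1000 ms) with the proposed 2000 ms.
    for size in (2, 3, 4, 7, 8, 15):
        current = suspicion_timeout_ms(size) // 1000
        proposed = suspicion_timeout_ms(size, ping_interval_ms=2000) // 1000
        print(f"{size:>2} nodes: {current} s now, {proposed} s proposed")
```

With pingInterval at 2000 ms, the 4-7 node range lands on the same 30 seconds 
that Ignite 2 uses by default.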

> Increase scalecube failure detection timeouts
> ---------------------------------------------
>
>                 Key: IGNITE-25324
>                 URL: https://issues.apache.org/jira/browse/IGNITE-25324
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Roman Puchkovskiy
>            Priority: Major
>              Labels: ignite-3
>
> Default timeouts seem to be too short as clusters sometimes fall apart under 
> load.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
