[jira] [Updated] (IGNITE-25700) Fix flaky test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage

Maksim Timonin (Jira) Tue, 17 Jun 2025 11:52:04 -0700


     [ 
https://issues.apache.org/jira/browse/IGNITE-25700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Maksim Timonin updated IGNITE-25700:
------------------------------------
    Description: 
Test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage sometimes 
fails. The test checks that 15-25 ping messages pass between two neighboring 
nodes during the failure detection timeout, but it sometimes fails because 
only 14 messages are observed.

The reason is TcpDiscoveryMetricsUpdateMessage, which is sent every 2000ms. 
Sending it resets the timer used for sending 
TcpDiscoveryConnectionCheckMessage.

Within the default failure detection timeout (10000ms) the metrics update 
message can be sent 6 times. Expecting about 20 check messages is reasonable, 
but the timer resets can drop up to 6 of those sends, leaving 14 messages.

Asserting on the number of messages is inherently unstable. We should relax 
the boundaries.
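
The arithmetic above can be sketched in a few lines of Java. This is only an illustration, not code from Ignite: the class and method names are hypothetical, and it assumes that each metrics update message suppresses at most one check message and that metrics updates can land on both edges of the window.

```java
// Hypothetical helper, not part of Apache Ignite.
class ConnectionCheckBounds {
    /**
     * Worst-case number of connection check messages in one window,
     * assuming every metrics update drops at most one check send.
     */
    static int worstCaseChecks(int nominalChecks, long windowMs, long metricsPeriodMs) {
        // Metrics sends at t = 0, period, 2 * period, ... up to and including windowMs.
        int metricsSends = (int) (windowMs / metricsPeriodMs) + 1;

        return nominalChecks - metricsSends;
    }

    public static void main(String[] args) {
        // 20 nominal checks, 10000ms timeout, metrics update every 2000ms.
        System.out.println(worstCaseChecks(20, 10_000L, 2_000L)); // prints 14
    }
}
```

Under these assumptions the lower bound of 15 is one message too strict, which matches the observed failures with 14 messages.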

  was:
Test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage sometimes 
fails. The test checks that during failure detection timeout there are 15-25 
ping messages  between two neighbors nodes. But sometimes test fails because 
there are only 14 messages.

Reason is TcpDiscoveryMetricsUpdateMessage that is sent every 2000ms. Sending 
of the message resets a timer used for sending the 
TcpDiscoveryConnectionCheckMessage message.

Then in the default failure detection period (10000ms) it can be sent 6 times. 
It's OK to await 20 times of the check message, but the reset can drop 6 sends, 
and we will get 14 messages.

Measuring amount of messages is very unstable practice.


> Fix flaky test 
> TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage
> -------------------------------------------------------------------------------
>
>                 Key: IGNITE-25700
>                 URL: https://issues.apache.org/jira/browse/IGNITE-25700
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Maksim Timonin
>            Assignee: Maksim Timonin
>            Priority: Major
>              Labels: IEP-132, ise
>             Fix For: 2.18
>
>
> Test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage 
> sometimes fails. The test checks that 15-25 ping messages pass between two 
> neighboring nodes during the failure detection timeout, but it sometimes 
> fails because only 14 messages are observed.
> The reason is TcpDiscoveryMetricsUpdateMessage, which is sent every 2000ms. 
> Sending it resets the timer used for sending 
> TcpDiscoveryConnectionCheckMessage.
> Within the default failure detection timeout (10000ms) the metrics update 
> message can be sent 6 times. Expecting about 20 check messages is 
> reasonable, but the timer resets can drop up to 6 of those sends, leaving 14 
> messages.
> Asserting on the number of messages is inherently unstable. We should relax 
> the boundaries.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
