[
https://issues.apache.org/jira/browse/IGNITE-25700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Maksim Timonin updated IGNITE-25700:
------------------------------------
Description:
Test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage sometimes
fails. The test checks that, during the failure detection timeout, 15-25 ping
messages are sent between two neighboring nodes. Sometimes the test fails
because only 14 messages are observed.
The reason is TcpDiscoveryMetricsUpdateMessage, which is sent every 2000ms.
Sending this message resets the timer used for sending
TcpDiscoveryConnectionCheckMessage.
Within the default failure detection period (10000ms), the metrics update
message can be sent up to 6 times. About 20 check messages are expected, but
the resets can drop up to 6 of those sends, leaving 14 messages.
Asserting on the number of messages is an inherently unstable practice. We
should relax the boundaries.
was:
Test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage sometimes
fails. The test checks that, during the failure detection timeout, 15-25 ping
messages are sent between two neighboring nodes. Sometimes the test fails
because only 14 messages are observed.
The reason is TcpDiscoveryMetricsUpdateMessage, which is sent every 2000ms.
Sending this message resets the timer used for sending
TcpDiscoveryConnectionCheckMessage.
Within the default failure detection period (10000ms), the metrics update
message can be sent up to 6 times. About 20 check messages are expected, but
the resets can drop up to 6 of those sends, leaving 14 messages.
Asserting on the number of messages is an inherently unstable practice.
> Fix flaky test
> TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage
> -------------------------------------------------------------------------------
>
> Key: IGNITE-25700
> URL: https://issues.apache.org/jira/browse/IGNITE-25700
> Project: Ignite
> Issue Type: Bug
> Reporter: Maksim Timonin
> Assignee: Maksim Timonin
> Priority: Major
> Labels: IEP-132, ise
> Fix For: 2.18
>
>
> Test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage
> sometimes fails. The test checks that, during the failure detection timeout,
> 15-25 ping messages are sent between two neighboring nodes. Sometimes the
> test fails because only 14 messages are observed.
> The reason is TcpDiscoveryMetricsUpdateMessage, which is sent every 2000ms.
> Sending this message resets the timer used for sending
> TcpDiscoveryConnectionCheckMessage.
> Within the default failure detection period (10000ms), the metrics update
> message can be sent up to 6 times. About 20 check messages are expected, but
> the resets can drop up to 6 of those sends, leaving 14 messages.
> Asserting on the number of messages is an inherently unstable practice. We
> should relax the boundaries.
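For illustration, the arithmetic in the description can be sketched as a small
standalone Java model. This is not Ignite code: the class and method names are
hypothetical, the 500ms check interval is an assumption inferred from "about 20
messages per 10000ms", and the model pessimistically assumes that every metrics
update costs exactly one check message.

```java
/** Rough model of the message counts from the description; not Ignite code. */
public final class ConnCheckEstimate {
    private ConnCheckEstimate() {}

    /** Check messages expected when nothing resets the timer. */
    public static long idealChecks(long timeoutMs, long checkIntervalMs) {
        return timeoutMs / checkIntervalMs;
    }

    /** Most metrics updates that fit in the window, counting both endpoints. */
    public static long maxMetricsUpdates(long timeoutMs, long metricsFreqMs) {
        return timeoutMs / metricsFreqMs + 1;
    }

    /** Pessimistic count: assumes every metrics update drops one check message. */
    public static long worstCaseChecks(long timeoutMs, long checkIntervalMs, long metricsFreqMs) {
        long ideal = idealChecks(timeoutMs, checkIntervalMs);
        long resets = maxMetricsUpdates(timeoutMs, metricsFreqMs);
        return Math.max(0, ideal - resets);
    }

    public static void main(String[] args) {
        long timeoutMs = 10_000;     // default failure detection period (from the issue)
        long metricsFreqMs = 2_000;  // metrics update period (from the issue)
        long checkIntervalMs = 500;  // ASSUMPTION: inferred from ~20 checks per 10000ms

        System.out.println("ideal=" + idealChecks(timeoutMs, checkIntervalMs)
            + " resets=" + maxMetricsUpdates(timeoutMs, metricsFreqMs)
            + " worstCase=" + worstCaseChecks(timeoutMs, checkIntervalMs, metricsFreqMs));
        // prints "ideal=20 resets=6 worstCase=14"
    }
}
```

Under these assumptions the worst case lands at 14, just below the current lower
bound of 15, which matches the observed failures. A relaxed lower bound would
need to sit at or below that worst-case value.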