[
https://issues.apache.org/jira/browse/IGNITE-25700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Maksim Timonin updated IGNITE-25700:
------------------------------------
Description:
Test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage sometimes
fails. The test expects 15-25 TcpDiscoveryConnectionCheckMessage messages during
the failure detection timeout (10000ms), but it sometimes fails because only 14
messages arrive.
TcpDiscoveryConnectionCheckMessage is sent on a timeout (500ms) if no other
messages were sent in that period. The test assumes that with no activity on
the cluster ~20 messages should be sent. In fact,
TcpDiscoveryMetricsUpdateMessage is also sent every 2000ms.
Within the failure detection period it can be sent 6 times, resetting the
timer 6 times. In the corner case only 20 - 6 = 14 check messages will be
sent.
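
The arithmetic above can be sketched as a small Java helper. This is only an
illustrative model, not code from Ignite: the class and method names are
hypothetical, and it assumes that each metrics update displaces exactly one
check message and that an update can land on both ends of the window.

```java
/** Hypothetical helper that models the message counts described above. */
final class CheckMessageEstimate {
    /** Check messages sent in the window if nothing else resets the timer. */
    static long idleCheckMessages(long failureDetectionTimeoutMs, long checkIntervalMs) {
        return failureDetectionTimeoutMs / checkIntervalMs;
    }

    /** Metrics updates that can fall into the window, counting both ends. */
    static long maxMetricsUpdates(long failureDetectionTimeoutMs, long metricsIntervalMs) {
        return failureDetectionTimeoutMs / metricsIntervalMs + 1;
    }

    /** Assumption: every metrics update cancels exactly one check message. */
    static long worstCaseCheckMessages(long failureDetectionTimeoutMs,
                                       long checkIntervalMs,
                                       long metricsIntervalMs) {
        return idleCheckMessages(failureDetectionTimeoutMs, checkIntervalMs)
            - maxMetricsUpdates(failureDetectionTimeoutMs, metricsIntervalMs);
    }

    public static void main(String[] args) {
        // Values taken from the description: 10000ms window, 500ms check
        // timeout, 2000ms metrics update period.
        long worst = worstCaseCheckMessages(10_000, 500, 2_000);
        System.out.println("Worst case check messages: " + worst);
    }
}
```

With the values from the description the estimate falls just below the lower
bound of 15 that the test asserts.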
was:
Test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage sometimes
fails. The test checks that during the failure detection timeout there are 15-25
ping messages between two neighboring nodes. But sometimes the test fails because
there are only 14 messages.
The reason is TcpDiscoveryMetricsUpdateMessage, which is sent every 2000ms.
Sending this message resets the timer used for sending
TcpDiscoveryConnectionCheckMessage.
In the default failure detection period (10000ms) it can be sent 6 times.
It's OK to expect 20 check messages, but the resets can drop 6 of them,
and we will get 14 messages.
Measuring the number of messages is a very unstable practice. We should make
the boundaries more moderate.
> Fix flaky test
> TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage
> -------------------------------------------------------------------------------
>
> Key: IGNITE-25700
> URL: https://issues.apache.org/jira/browse/IGNITE-25700
> Project: Ignite
> Issue Type: Bug
> Reporter: Maksim Timonin
> Assignee: Maksim Timonin
> Priority: Major
> Labels: IEP-132, ise
> Fix For: 2.18
>
>
> Test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage
> sometimes fails. The test expects 15-25 TcpDiscoveryConnectionCheckMessage
> messages during the failure detection timeout (10000ms), but it sometimes
> fails because only 14 messages arrive.
> TcpDiscoveryConnectionCheckMessage is sent on a timeout (500ms) if no other
> messages were sent in that period. The test assumes that with no activity on
> the cluster ~20 messages should be sent. In fact,
> TcpDiscoveryMetricsUpdateMessage is also sent every 2000ms.
> Within the failure detection period it can be sent 6 times, resetting
> the timer 6 times. In the corner case only 20 - 6 = 14 check messages
> will be sent.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)