[ 
https://issues.apache.org/jira/browse/IGNITE-25700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maksim Timonin updated IGNITE-25700:
------------------------------------
    Description: 
Test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage sometimes 
fails. The test expects 15-25 TcpDiscoveryConnectionCheckMessage messages 
during the failure detection timeout (10000ms), but it sometimes fails because 
only 14 messages are sent.

TcpDiscoveryConnectionCheckMessage is sent on a timeout (500ms) if no other 
messages were sent in that period. The test assumes that with no activity on 
the cluster ~20 messages should be sent. In fact, 
TcpDiscoveryMetricsUpdateMessage is also sent every 2000ms.

Within the failure detection period it can be sent 6 times, resetting the 
timer 6 times. In the corner case only 20 - 6 = 14 check messages will be 
sent.
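
The arithmetic above can be sketched as a back-of-the-envelope estimate. This 
is not Ignite code: the class and method names are hypothetical, and the 
"6 sends" figure is modelled under the assumption that a metrics update can 
land on both edges of the window (10000 / 2000 + 1) and that each such send 
suppresses exactly one check message.

```java
/** Rough estimate of connection check message counts. Hypothetical helper, not part of Ignite. */
class ConnCheckEstimate {
    /** Check messages sent in a fully idle window: one per check interval. */
    static long idealCount(long windowMs, long checkIntervalMs) {
        return windowMs / checkIntervalMs;
    }

    /** Upper bound on periodic sends in the window, assuming a send on both window edges. */
    static long maxPeriodicSends(long windowMs, long periodMs) {
        return windowMs / periodMs + 1;
    }

    /** Corner case: every periodic send resets the timer and costs one check message. */
    static long worstCaseCount(long windowMs, long checkIntervalMs, long periodMs) {
        return idealCount(windowMs, checkIntervalMs) - maxPeriodicSends(windowMs, periodMs);
    }

    public static void main(String[] args) {
        long failureDetectionTimeoutMs = 10_000; // window from the description
        long checkIntervalMs = 500;              // TcpDiscoveryConnectionCheckMessage timeout
        long metricsPeriodMs = 2_000;            // TcpDiscoveryMetricsUpdateMessage period

        System.out.println("ideal: " + idealCount(failureDetectionTimeoutMs, checkIntervalMs));
        System.out.println("metrics sends: " + maxPeriodicSends(failureDetectionTimeoutMs, metricsPeriodMs));
        System.out.println("worst case: "
            + worstCaseCount(failureDetectionTimeoutMs, checkIntervalMs, metricsPeriodMs));
    }
}
```

With the values from the description this gives 20 ideal messages, 6 metrics 
sends, and a worst case of 14, which is just below the test's lower bound of 
15.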

  was:
Test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage sometimes 
fails. The test checks that during the failure detection timeout there are 
15-25 ping messages between two neighboring nodes, but it sometimes fails 
because there are only 14 messages.

The reason is TcpDiscoveryMetricsUpdateMessage, which is sent every 2000ms. 
Sending this message resets the timer used for sending 
TcpDiscoveryConnectionCheckMessage.

In the default failure detection period (10000ms) the metrics update can be 
sent 6 times. Expecting 20 check messages is reasonable, but the resets can 
drop 6 sends, leaving 14 messages.

Counting messages is inherently unstable. We should relax the bounds.


> Fix flaky test 
> TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage
> -------------------------------------------------------------------------------
>
>                 Key: IGNITE-25700
>                 URL: https://issues.apache.org/jira/browse/IGNITE-25700
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Maksim Timonin
>            Assignee: Maksim Timonin
>            Priority: Major
>              Labels: IEP-132, ise
>             Fix For: 2.18
>
>
> Test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage 
> sometimes fails. The test expects 15-25 TcpDiscoveryConnectionCheckMessage 
> messages during the failure detection timeout (10000ms), but it sometimes 
> fails because only 14 messages are sent.
> TcpDiscoveryConnectionCheckMessage is sent on a timeout (500ms) if no other 
> messages were sent in that period. The test assumes that with no activity on 
> the cluster ~20 messages should be sent. In fact, 
> TcpDiscoveryMetricsUpdateMessage is also sent every 2000ms.
> Within the failure detection period it can be sent 6 times, resetting the 
> timer 6 times. In the corner case only 20 - 6 = 14 check messages will be 
> sent.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
