[ 
https://issues.apache.org/jira/browse/IGNITE-19255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Polovtcev updated IGNITE-19255:
-----------------------------------------
    Description: 
In IGNITE-19105 I changed some internal shenanigans of the 
MetaStorageManager (without affecting its API in any way). After that, nearly 
all unit tests in the {{distribution-zones}} module started to fail. It turns 
out this happened because the tests make extensive use of mocks that emulate 
the behavior of the Meta Storage. So I decided to replace the mocks with the 
{{StandaloneMetaStorageManager}} implementation, and all hell broke loose: 
many tests emulate the Meta Storage incorrectly, and a lot of races appeared 
because many methods became truly asynchronous.

This situation is very frustrating: a different component's internals were 
changed with no API changes, and a completely unrelated module is no longer 
able to pass its tests. Though I fixed most of the failures, some tests are 
still failing, and I'm going to try to describe what's wrong with them:

*{{DistributionZoneManagerScaleUpTest#testDataNodesPropagationAfterScaleUpTriggeredOnNewCluster}}*
 - this test covers a scenario where we start a node after the logical topology 
was updated. I don't know how realistic this scenario is, but the problem is 
that "data nodes" don't get populated with the logical topology nodes on 
{{distributionZoneManager}} start, because the {{scheduleTimers}} method, which 
gets invoked from the Meta Storage Watch, doesn't go inside the {{if 
(!addedNodes.isEmpty() && autoAdjustScaleUp != INFINITE_TIMER_VALUE)}} branch.

*{{DistributionZoneManagerScaleUpTest#testDataNodesPropagationForDefaultZoneAfterScaleUpTriggered}}*
 - same issue as above.

*{{DistributionZoneManagerScaleUpTest#testDataNodesPropagationForDefaultZoneAfterScaleDownTriggered}}*
 - same issue as above.
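
To make the branch condition above concrete, here is a minimal sketch of the check in isolation. The names {{addedNodes}}, {{autoAdjustScaleUp}} and {{INFINITE_TIMER_VALUE}} come from the condition quoted above; the class, the helper method and the constant's value are assumptions made for illustration and are not taken from the real {{scheduleTimers}} code. The ticket does not say which of the two operands is false in the failing tests.

```java
import java.util.Set;

public class ScaleUpBranchSketch {
    // Assumption: the real constant's value is not stated in this ticket.
    public static final int INFINITE_TIMER_VALUE = Integer.MAX_VALUE;

    /** Returns true only if the scale-up branch from the quoted condition would be entered. */
    public static boolean entersScaleUpBranch(Set<String> addedNodes, int autoAdjustScaleUp) {
        return !addedNodes.isEmpty() && autoAdjustScaleUp != INFINITE_TIMER_VALUE;
    }

    public static void main(String[] args) {
        // No added nodes: the branch is skipped regardless of the timer value.
        System.out.println(entersScaleUpBranch(Set.of(), 5));
        // Nodes were added, but the scale-up timer is "infinite": the branch is skipped.
        System.out.println(entersScaleUpBranch(Set.of("node-1"), INFINITE_TIMER_VALUE));
        // Nodes were added and the timer is finite: the branch is entered.
        System.out.println(entersScaleUpBranch(Set.of("node-1"), 5));
    }
}
```

If either operand is false when the Watch fires, the branch is skipped, which matches the symptom described above of data nodes not being populated.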

*{{DistributionZoneManagerScaleUpTest#testUpdateZoneScaleUpTriggersDataNodePropagation}}*
 - this test fails with the following assertion error: {_}Expected revision 
that is greater or equal to already seen meta storage events.{_} This is 
because {{TestConfigurationStorage}} does not use the same revision as the Meta 
Storage, therefore their revisions can't be compared directly. This test should 
either be converted to an integration test or use 
{{DistributedConfigurationStorage}} instead.

*{{DistributionZoneManagerScaleUpTest#testUpdateZoneScaleDownTriggersDataNodePropagation}}*
 - same issue as above.
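
The revision mismatch can be illustrated with a small sketch, assuming each storage simply numbers its own updates. {{RevisionCounter}}, {{isAcceptable}} and {{demo}} are hypothetical names, not Ignite classes or methods; the check only mimics the shape of the assertion quoted above.

```java
import java.util.concurrent.atomic.AtomicLong;

public class RevisionMismatchSketch {
    /** Hypothetical stand-in for a storage that numbers its own updates. */
    static final class RevisionCounter {
        private final AtomicLong revision = new AtomicLong();

        long write() {
            return revision.incrementAndGet();
        }
    }

    /** Mimics a check of the form "the incoming revision must not be behind what was already seen". */
    public static boolean isAcceptable(long lastSeenRevision, long incomingRevision) {
        return incomingRevision >= lastSeenRevision;
    }

    /** Compares a revision from one counter against the last revision seen from the other. */
    public static boolean demo() {
        RevisionCounter metaStorage = new RevisionCounter();
        RevisionCounter testConfigurationStorage = new RevisionCounter();

        // The Meta Storage has already processed five updates.
        long lastSeenMetaStorageRevision = 0;
        for (int i = 0; i < 5; i++) {
            lastSeenMetaStorageRevision = metaStorage.write();
        }

        // The test configuration storage counts independently and is only at revision 1.
        long configurationRevision = testConfigurationStorage.write();

        return isAcceptable(lastSeenMetaStorageRevision, configurationRevision);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "false"
    }
}
```

Because the two counters advance independently, a perfectly valid configuration revision can look stale when it is compared against the Meta Storage revision.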

*{{DistributionZoneManagerScaleUpTest#testDropZoneDoNotPropagateDataNodesAfterScaleUp}}*
 - this test is flaky, because notifications from the test configuration 
storage and from Meta Storage Watches are not related to each other (unlike the 
real-life Distributed Configuration Storage, which is built on top of Watches), 
so notifications from the configuration storage and the Meta Storage can arrive 
in an undetermined order.

*{{DistributionZoneManagerScaleUpTest#testDropZoneDoNotPropagateDataNodesAfterScaleDown}}*
 - same issue as above.
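
The ordering problem can be sketched with plain threads, assuming nothing about the real listener code. All names below are hypothetical. Delivering both notifications through one single-threaded channel preserves submission order, while delivering them from two unrelated threads allows either order.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class NotificationOrderSketch {
    /** Both notifications go through one single-threaded executor, so they arrive in submission order. */
    public static List<String> deliverThroughSharedChannel() {
        List<String> received = new CopyOnWriteArrayList<>();
        ExecutorService sharedChannel = Executors.newSingleThreadExecutor();
        sharedChannel.submit(() -> received.add("meta-storage-update"));
        sharedChannel.submit(() -> received.add("configuration-update"));
        sharedChannel.shutdown();
        try {
            sharedChannel.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received;
    }

    /** Each notification comes from its own thread, so either arrival order is possible. */
    public static List<String> deliverIndependently() {
        List<String> received = new CopyOnWriteArrayList<>();
        Thread configuration = new Thread(() -> received.add("configuration-update"));
        Thread metaStorage = new Thread(() -> received.add("meta-storage-update"));
        configuration.start();
        metaStorage.start();
        try {
            configuration.join();
            metaStorage.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(deliverThroughSharedChannel()); // always the submission order
        System.out.println(deliverIndependently());        // order may differ between runs
    }
}
```

A test that asserts on a specific arrival order is only stable in the first setup, which is why tests relying on the independent test configuration storage can be flaky.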

*{{DistributionZoneManagerWatchListenerTest#testDataNodesOfDefaultZoneUpdatedOnWatchListenerEvent}}*
 - this test is flaky, probably due to some races between Watch and 
Configuration Listener execution (sometimes a retry on {{invoke}} happens and 
{{Mockito#verify}} fails).

 

*New tests* from [https://github.com/gridgain/apache-ignite-3/tree/ignite-18756]

*DistributionZoneAwaitDataNodesTest#testRemoveZoneWhileAwaitingDataNodes* - 
this test must remove the zone after MetastorageTopologyListener updates the 
topVerTracker and before MetastorageDataNodesListener updates 
scaleUpRevisionTracker/scaleDownRevisionTracker. Currently this is impossible 
to do with StandaloneMetaStorageManager.
*DistributionZoneAwaitDataNodesTest#testScaleUpScaleDownAreChangedWhileAwaitingDataNodes*
 - same issue as above but here we need to update scaleUp and scaleDown instead 
of removing the zone.



> Fix broken unit tests in distribution-zones module
> --------------------------------------------------
>
>                 Key: IGNITE-19255
>                 URL: https://issues.apache.org/jira/browse/IGNITE-19255
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Aleksandr Polovtcev
>            Assignee: Mirza Aliev
>            Priority: Blocker
>              Labels: ignite-3
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
