[jira] [Commented] (IGNITE-21194) StorageException in ItIgniteNodeRestartTest#destroyObsoleteStoragesOnRestart

Kirill Gusakov (Jira) Fri, 12 Jan 2024 07:53:04 -0800


    [ 
https://issues.apache.org/jira/browse/IGNITE-21194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806109#comment-17806109
 ]


Kirill Gusakov commented on IGNITE-21194:
-----------------------------------------

Some more details:
 * Two changes of stable assignments to the same value are indeed produced.
 * As mentioned, the second one is produced by 
RebalanceRaftGroupEventsListener.onLeaderElected.
 * The tricky point is that onLeaderElected will not start a new rebalance 
if the previous leader wrote the stable+pending pair before dying, because it 
checks that the pending assignments are not null and starts a new rebalance 
only in that case.

So it looks like the only case in which the described behaviour is possible 
is:
 * leader1 finishes the rebalance, fires the metastore write call (which 
cleans the pendings and updates stable) and dies immediately because the node 
stop process has started. But the metastore call is still in flight.
 * The new leader2 is elected and sees that the pendings are not empty. So it 
starts the rebalance from pending to stable, but the raft group is already in 
this configuration, so the rebalance completes immediately and leader2 wants 
to push the stable update.
 * At this moment the in-flight stable+pendings metastore update is applied 
(the first notification about the stable update is triggered).
 * leader2 pushes its own update of pendings+stable (the second notification 
about the stable update is triggered).
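
To make the ordering easier to follow, here is a toy replay of the four steps 
above. It is only a sketch: MetaStorage, finishRebalance, replay and the node 
names are hypothetical stand-ins, not Ignite APIs, and the 
"skipUnchangedStable" guard is just one possible mitigation in the spirit of 
the suggestion in the issue description, not the actual fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Set;

/** Toy model of the race. All names are hypothetical and are not Ignite APIs. */
final class StableUpdateRace {
    /** Minimal stand-in for the stable/pending keys of one partition. */
    static final class MetaStorage {
        Set<String> stable;
        Set<String> pending;
        final List<Set<String>> stableNotifications = new ArrayList<>();
        final boolean skipUnchangedStable;

        MetaStorage(Set<String> stable, Set<String> pending, boolean skipUnchangedStable) {
            this.stable = stable;
            this.pending = pending;
            this.skipUnchangedStable = skipUnchangedStable;
        }

        /** Writes stable and clears pending, as done at the end of a rebalance. */
        void finishRebalance(Set<String> newStable) {
            boolean unchanged = Objects.equals(stable, newStable);
            stable = newStable;
            pending = null;
            if (skipUnchangedStable && unchanged) {
                return; // Guard (assumption): same assignments, do not notify again.
            }
            stableNotifications.add(newStable);
        }
    }

    /** Replays the four steps and returns how many stable notifications fired. */
    static int replay(boolean skipUnchangedStable) {
        // Placeholder node names, chosen only for illustration.
        Set<String> target = Set.of("node-b");
        MetaStorage ms = new MetaStorage(Set.of("node-a", "node-b"), target, skipUnchangedStable);

        // Step 1: leader1 finishes the rebalance; its write is sent but not yet applied.
        Runnable inFlightWrite = () -> ms.finishRebalance(target);

        // Step 2: leader2 is elected and still sees non-null pending assignments.
        boolean leader2StartsRebalance = ms.pending != null;

        // Step 3: the in-flight write from leader1 is applied (first notification).
        inFlightWrite.run();

        // Step 4: leader2's rebalance completes at once and it writes the same values.
        if (leader2StartsRebalance) {
            ms.finishRebalance(target);
        }
        return ms.stableNotifications.size();
    }

    public static void main(String[] args) {
        System.out.println("without guard: " + replay(false));
        System.out.println("with guard: " + replay(true));
    }
}
```

Without the guard the replay records two stable notifications for the same 
assignments; with it, the second write is still applied but no second 
notification is fired. Whether such a check belongs on the writer side or in 
the listener is left open here.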

> StorageException in ItIgniteNodeRestartTest#destroyObsoleteStoragesOnRestart
> ----------------------------------------------------------------------------
>
>                 Key: IGNITE-21194
>                 URL: https://issues.apache.org/jira/browse/IGNITE-21194
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Denis Chudov
>            Priority: Major
>              Labels: ignite-3
>         Attachments: full.log
>
>
> Test passes successfully, but there are exceptions in logs.
> The scenario of this test includes altering the distribution zone. But the 
> subsequent notification about stable assignments at the end of rebalance 
> happens 2 times on the same node, with the same assignments. As a result, 
> redundant partitions are stopped and the storages are deleted on the first 
> event handling, and they are not found on the second one, which causes 
> exceptions.
> Seems that the second stable assignments change is triggered by the rebalance 
> raft configuration listener ( 
> RebalanceRaftGroupEventsListener#doOnNewPeersConfigurationApplied ) which is 
> triggered on the configuration changed by the new leader election:
> {code:java}
> [2024-01-05T19:18:36,891][INFO 
> ][%iinrt_dosor_1%rebalance-scheduler-0][RebalanceRaftGroupEventsListener] New 
> leader elected. Going to apply new configuration [tablePartitionId=6_part_0, 
> peers=[iinrt_dosor_1], learners=[]]{code}
> Probably we should check that the new set of peers differs from the others to 
> make some rebalance related updates to meta storage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
