[ https://issues.apache.org/jira/browse/IGNITE-21588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Philipp Shergalis reassigned IGNITE-21588:
------------------------------------------

    Assignee: Kirill Tkalenko  (was: Philipp Shergalis)

> CMG commands idempotency is broken
> ----------------------------------
>
>                 Key: IGNITE-21588
>                 URL: https://issues.apache.org/jira/browse/IGNITE-21588
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Ivan Bessonov
>            Assignee: Kirill Tkalenko
>            Priority: Major
>              Labels: ignite-3
>
> When handling commands like {{JoinReadyCommand}} and {{NodesLeaveCommand}} we
> do the following:
>  * Read local state with {{readLogicalTopology()}}.
>  * Modify state according to the command.
>  * *Increase version*.
>  * Write new state with {{saveSnapshotToStorage(snapshot)}}.
> The problem lies in how the state is read and written: it is local, and the
> version value is not replicated.
> What happens when we restart the node:
>  * It starts without a local storage snapshot, with appliedIndex == 0, which
> is a *state in the past*.
>  * We apply commands that were already applied before the restart.
>  * We apply these commands to the locally saved topology snapshot.
>  * This logical topology snapshot has a *state in the future* when compared
> to appliedIndex == 0.
>  * As a result, when we re-apply some commands, we *increase the version* one
> more time, thus breaking data consistency between nodes.
> This would have been fine if we only used this version locally. But
> distribution zones rely on the consistency of the version between all nodes
> in the cluster. This might break DZ data nodes handling if any of the cluster
> nodes restarts.
> How to fix:
>  * Either drop the storage if there's no storage snapshot, which will restore
> consistency,
>  * or never start the CMG group from a snapshot, but rather start it from the
> latest storage data.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
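
The failure mode described in the issue, and the first proposed fix (dropping local storage when there is no storage snapshot), can be illustrated with a minimal Java sketch. The `Topology` and `Command` types and the `restartAndReplay` helper are hypothetical simplifications, not Ignite classes; only the modify / increase version sequence and the appliedIndex == 0 replay are taken from the description.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, simplified model of the problem; none of these types are Ignite classes. */
final class CmgIdempotencySketch {
    /** Locally stored logical topology: node names plus a version counter. */
    static final class Topology {
        long version;
        final Set<String> nodes = new TreeSet<>();
    }

    /** One replicated command: a node joining (join == true) or leaving the topology. */
    static final class Command {
        final String node;
        final boolean join;

        Command(String node, boolean join) {
            this.node = node;
            this.join = join;
        }
    }

    /** Mirrors the steps in the issue: modify the local state, then increase the version. */
    static void apply(Topology topology, Command command) {
        if (command.join) {
            topology.nodes.add(command.node);
        } else {
            topology.nodes.remove(command.node);
        }
        topology.version++; // Incremented even when the command is a replay.
    }

    /**
     * Simulates a start with appliedIndex == 0: the whole log is replayed on top of
     * whatever topology is in local storage, unless that storage is dropped first.
     */
    static Topology restartAndReplay(Topology stored, List<Command> log, boolean dropStorage) {
        Topology topology = dropStorage ? new Topology() : stored;
        for (Command command : log) {
            apply(topology, command);
        }
        return topology;
    }

    public static void main(String[] args) {
        List<Command> log = List.of(
                new Command("A", true), new Command("B", true), new Command("B", false));

        // First, normal run: the log is applied once to empty storage.
        Topology stored = restartAndReplay(new Topology(), log, true);
        System.out.println("version before restart: " + stored.version);

        // Restart without a snapshot: the same log is replayed onto the stale storage.
        Topology replayed = restartAndReplay(stored, log, false);
        System.out.println("version after replay onto local storage: " + replayed.version);

        // Same restart, but local storage is dropped before the replay.
        Topology fixed = restartAndReplay(replayed, log, true);
        System.out.println("version after dropping storage first: " + fixed.version);
    }
}
```

In this model, replaying three commands onto already-updated storage leaves the restarted node at version 6 while a node that never restarted stays at 3; dropping the storage before the replay brings the restarted node back to 3.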