[ https://issues.apache.org/jira/browse/IGNITE-22819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexander Lapin updated IGNITE-22819:
-------------------------------------
Epic Link: IGNITE-21389

Metastorage revisions inconsistency
-----------------------------------

Key: IGNITE-22819
URL: https://issues.apache.org/jira/browse/IGNITE-22819
Project: Ignite
Issue Type: Bug
Reporter: Ivan Bessonov
Assignee: Alexander Lapin
Priority: Major
Labels: ignite-3
Time Spent: 0.5h
Remaining Estimate: 0h

The following situation can occur:
{code:java}
[2024-07-24T09:29:17,220][INFO ][%itcskvt_n_0%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler] Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:220 +0300, logical=0, composite=112840052389969920], cleanupCompletionTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:220 +0300, logical=1, composite=112840052389969921], removedEntriesCount=0, cacheSize=240].
...
[2024-07-24T09:29:17,257][INFO ][%itcskvt_n_2%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler] Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=0, composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=1, composite=112840052392394753], removedEntriesCount=3, cacheSize=237].
[2024-07-24T09:29:17,257][INFO ][%itcskvt_n_1%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler] Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=0, composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=1, composite=112840052392394753], removedEntriesCount=3, cacheSize=237].
{code}
Note that {{removedEntriesCount}} is 0 on the leader ({{itcskvt_n_0}}) and 3 on the followers ({{itcskvt_n_1}}, {{itcskvt_n_2}}) because their clocks differ. The leader logs a {{cleanupTimestamp}} of 09:29:17:220, while the followers log 09:29:17:257.
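The following small simulation shows how a clock-dependent cleanup can desynchronize revisions. It is a hypothetical, heavily simplified Java model, not Ignite code. {{Replica}}, {{put}}, {{invokeIfRevisionEquals}}, {{replay}} and the {{alwaysCallRemoveAll}} flag are invented for illustration. The model's {{evictIdempotentCommandsCache}} only mimics the behavior described in this issue.

The model makes these assumptions:
* every {{removeAll}} call bumps the local revision by exactly one, even for an empty key list;
* a cached command expires once its timestamp is older than the node's local time minus a TTL;
* a conditional update checks the revision at which a key was last written.

The timestamps and TTL are made up so that the leader evicts 0 entries and the follower evicts 3, as in the log.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RevisionDivergenceSketch {

    /** Hypothetical, simplified model of one metastorage replica (not an Ignite API). */
    static final class Replica {
        /** Idempotent command cache: apply timestamp (ms) -> command id. */
        final TreeMap<Long, String> idempotentCache = new TreeMap<>();
        /** Revision at which each regular key was last written. */
        final Map<String, Long> keyRevisions = new HashMap<>();
        /** Local revision: bumped by every storage write, never replicated. */
        long revision = 0;

        /** Models the cleanup; the cutoff comes from this node's own clock. Returns the evicted count. */
        int evictIdempotentCommandsCache(long localNowMs, long ttlMs, boolean alwaysCallRemoveAll) {
            List<Long> expired = new ArrayList<>(idempotentCache.headMap(localNowMs - ttlMs).keySet());
            // Skipping removeAll for an empty list is what makes revisions diverge in this model.
            if (!expired.isEmpty() || alwaysCallRemoveAll) {
                removeAll(expired);
            }
            return expired.size();
        }

        /** Assumption: one removeAll call = one new local revision, regardless of key count. */
        void removeAll(List<Long> keys) {
            keys.forEach(idempotentCache::remove);
            revision++;
        }

        /** A regular write: the key is stamped with the next local revision. */
        void put(String key) {
            revision++;
            keyRevisions.put(key, revision);
        }

        /** Conditional update: succeeds only if the key's revision equals expectedRevision. */
        boolean invokeIfRevisionEquals(String key, long expectedRevision) {
            if (keyRevisions.getOrDefault(key, 0L) != expectedRevision) {
                return false;
            }
            put(key);
            return true;
        }
    }

    /** Replays the same commands on a leader and a follower; returns {leaderOk, followerOk}. */
    static boolean[] replay(boolean alwaysCallRemoveAll) {
        Replica leader = new Replica();
        Replica follower = new Replica();
        for (Replica r : List.of(leader, follower)) {
            for (long ts : new long[] {30, 40, 50, 300}) {
                r.idempotentCache.put(ts, "cmd-" + ts);
            }
        }
        // Same replicated cleanup command, but each node reads its own clock (TTL = 200 ms).
        leader.evictIdempotentCommandsCache(220, 200, alwaysCallRemoveAll);   // cutoff 20: nothing expired
        follower.evictIdempotentCommandsCache(257, 200, alwaysCallRemoveAll); // cutoff 57: 3 entries expired

        // A replicated write, then a conditional update built from the leader's view of the key.
        leader.put("config");
        follower.put("config");
        long expected = leader.keyRevisions.get("config");
        return new boolean[] {
            leader.invokeIfRevisionEquals("config", expected),
            follower.invokeIfRevisionEquals("config", expected)
        };
    }

    public static void main(String[] args) {
        boolean[] skipEmpty = replay(false);
        boolean[] always = replay(true);
        System.out.println("skip empty removeAll: leader=" + skipEmpty[0] + ", follower=" + skipEmpty[1]);
        System.out.println("always call removeAll: leader=" + always[0] + ", follower=" + always[1]);
    }
}
```

In this model, {{main}} prints {{leader=true, follower=false}} when an empty {{removeAll}} is skipped. This corresponds to the skipped-update scenario, and it prints {{leader=true, follower=true}} when {{removeAll}} is always called. Always calling {{removeAll}} only realigns the revisions, however. The leader's cache still holds the three expired entries, so the cache contents themselves stay different across nodes.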
As a result, {{evictIdempotentCommandsCache}} behaves differently on different nodes for the same Raft commands.
The real problem is that, depending on the node, it may or may not call {{storage.removeAll(commandIdStorageKeys, safeTime)}}, which increases the local revision.

The revision is always local; it is never replicated. A revision mismatch causes conditions in conditional updates and invokes to evaluate differently on different nodes. A simple example of such an issue is a configuration update that is skipped on one or several nodes in the cluster.

What can we do about it:
* add an alternative to {{removeAll}} that doesn't increase the local revision
* call {{removeAll}} even if the list is empty
* never invalidate the cache locally; instead, replicate cache invalidation with a dedicated command
* there's a TODO that says "clear this during compaction". That's a bad option: it would lead to either frequent compactions or huge memory overhead

-- 
This message was sent by Atlassian Jira (v8.20.10#820010)