[ 
https://issues.apache.org/jira/browse/IGNITE-22819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vyacheslav Koptilin updated IGNITE-22819:
-----------------------------------------
    Fix Version/s: 3.0.0-beta2

> Metastorage revisions inconsistency
> -----------------------------------
>
>                 Key: IGNITE-22819
>                 URL: https://issues.apache.org/jira/browse/IGNITE-22819
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Ivan Bessonov
>            Assignee: Alexander Lapin
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.0.0-beta2
>
>          Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> The following situation might happen:
> {code:java}
> [2024-07-24T09:29:17,220][INFO 
> ][%itcskvt_n_0%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler]
>  Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp 
> [physical=2024-07-24 09:29:17:220 +0300, logical=0, 
> composite=112840052389969920], cleanupCompletionTimestamp=HybridTimestamp 
> [physical=2024-07-24 09:29:17:220 +0300, logical=1, 
> composite=112840052389969921], removedEntriesCount=0, cacheSize=240].
> ... 
> [2024-07-24T09:29:17,257][INFO 
> ][%itcskvt_n_2%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler]
>  Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp 
> [physical=2024-07-24 09:29:17:257 +0300, logical=0, 
> composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp 
> [physical=2024-07-24 09:29:17:257 +0300, logical=1, 
> composite=112840052392394753], removedEntriesCount=3, cacheSize=237].
> [2024-07-24T09:29:17,257][INFO 
> ][%itcskvt_n_1%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler]
>  Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp 
> [physical=2024-07-24 09:29:17:257 +0300, logical=0, 
> composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp 
> [physical=2024-07-24 09:29:17:257 +0300, logical=1, 
> composite=112840052392394753], removedEntriesCount=3, cacheSize=237].{code}
> Note that {{removedEntriesCount}} is 0 on the leader and 3 on the followers, 
> because their clocks differ. {{evictIdempotentCommandsCache}} 
> behaves differently on different nodes for the same raft commands.
> The real problem here is that it might (or might not) call 
> {{storage.removeAll(commandIdStorageKeys, safeTime)}}, which would 
> increase the local revision.
>  
> The revision is always local; it is never replicated. A revision mismatch leads to 
> different evaluation of conditions in conditional updates and invokes. A simple 
> example of such an issue would be a configuration update that is skipped on one or 
> several nodes in the cluster.
>  
> What can we do about it:
>  * add an alternative to {{removeAll}} that doesn't increase the local revision
>  * call {{removeAll}} even if the list is empty
>  * never invalidate the cache locally, but rather replicate cache invalidation 
> with a special command
>  * there's a TODO that says "clear this during compaction". That's a bad 
> option: it would lead to either frequent compactions or huge memory overhead



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
