[ 
https://issues.apache.org/jira/browse/IGNITE-16306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Polovtcev reassigned IGNITE-16306:
--------------------------------------------

    Assignee: Aleksandr Polovtcev

> [POC] In-Memory storage integration
> -----------------------------------
>
>                 Key: IGNITE-16306
>                 URL: https://issues.apache.org/jira/browse/IGNITE-16306
>             Project: Ignite
>          Issue Type: Improvement
>    Affects Versions: 3.0.0-alpha3
>            Reporter: Ivan Bessonov
>            Assignee: Aleksandr Polovtcev
>            Priority: Major
>              Labels: iep-74, ignite-3
>
> h3. Goals
> We need an in-memory storage, similar to the one in Ignite 2. This storage must
> reuse the common replication infrastructure; in other words, it must be
> integrated into the Raft state machine (STM) and support transactions.
> The Raft protocol requires some persistent state: metadata, log, and snapshots.
> The simplest solution is to write the Raft persistent state to disk (this is
> already implemented for
> org.apache.ignite.internal.storage.basic.ConcurrentHashMapPartitionStorage).
> The drawback is that this is not a fully in-memory solution, so it does not
> differ much from a database cache.
> Alternatively, we can go the pure in-memory way and keep all Raft state in a
> volatile store.
> h3. Raft metadata
> Must not be persisted for a pure in-memory cluster, because the state is
> always lost on restart anyway.
> Note: a node must always be removed from the Raft group when it is removed
> from the baseline by auto-adjust, and should rejoin as a new node (in-memory
> storage always works with auto-adjust, similarly to Ignite 2). *Out of scope.*
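For illustration, a minimal sketch of volatile Raft metadata, assuming it consists of what Raft normally persists besides the log: the current term and the vote cast in that term. `VolatileRaftMeta` and its methods are hypothetical names, not the JRaft `RaftMetaStorage` API. Because nothing is written to disk, a restarted node comes back with term 0 and no vote, which is consistent with the note above that such a node should join the group as a new one.

```java
import java.util.Objects;

/**
 * Hypothetical volatile Raft metadata holder: the current term and the vote
 * cast in that term live only on the heap and are never written to disk.
 */
final class VolatileRaftMeta {
    private long term;       // current term; 0 on a fresh (or restarted) node
    private String votedFor; // candidate voted for in the current term, or null

    synchronized long getTerm() {
        return term;
    }

    synchronized String getVotedFor() {
        return votedFor;
    }

    /** Advances to a newer term; the old vote is discarded because it belonged to the old term. */
    synchronized void setTerm(long newTerm) {
        if (newTerm < term) {
            throw new IllegalArgumentException("Term must not decrease: " + newTerm + " < " + term);
        }
        if (newTerm > term) {
            term = newTerm;
            votedFor = null;
        }
    }

    /** Records a vote; returns false if a different candidate was already chosen in this term. */
    synchronized boolean voteFor(String candidate) {
        Objects.requireNonNull(candidate, "candidate");
        if (votedFor != null && !votedFor.equals(candidate)) {
            return false;
        }
        votedFor = candidate;
        return true;
    }
}
```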
> h3. Log store
> A working in-memory implementation already exists (currently used in tests):
> org.apache.ignite.raft.jraft.storage.impl.LocalLogStorage
> Note: generally speaking, the log is only required for "historical
> rebalancing" after a snapshot rebalance. It will not be needed at all once it
> becomes possible to apply a snapshot and concurrent updates at the same time,
> for example when a solution like MVCC is implemented.
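To show why a volatile log store is straightforward, here is a minimal sketch of an in-memory Raft log: a list of entries addressed by index, plus the two truncations Raft needs (dropping a prefix once it is covered by a snapshot, and dropping a conflicting suffix). This is not the code of `LocalLogStorage`; `InMemoryLogStore` and `LogEntry` are hypothetical names, and command payloads are assumed to be opaque byte arrays.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical log entry: Raft index, term and an opaque command payload. */
record LogEntry(long index, long term, byte[] command) {}

/** Hypothetical volatile Raft log; all data lives on the heap only. */
final class InMemoryLogStore {
    private final List<LogEntry> entries = new ArrayList<>();
    private long firstIndex = 1; // Raft index of entries.get(0); logs start at index 1

    synchronized long firstLogIndex() {
        return firstIndex;
    }

    /** Index of the last entry, or firstIndex - 1 if the log is empty. */
    synchronized long lastLogIndex() {
        return firstIndex + entries.size() - 1;
    }

    /** Appends an entry; indexes must be contiguous. */
    synchronized void append(LogEntry entry) {
        long expected = lastLogIndex() + 1;
        if (entry.index() != expected) {
            throw new IllegalArgumentException("Expected index " + expected + ", got " + entry.index());
        }
        entries.add(entry);
    }

    /** Returns the entry at the index, or null if it was truncated away or never written. */
    synchronized LogEntry get(long index) {
        if (index < firstIndex || index > lastLogIndex()) {
            return null;
        }
        return entries.get((int) (index - firstIndex));
    }

    /** Drops all entries before firstIndexKept, e.g. once they are covered by a snapshot. */
    synchronized void truncatePrefix(long firstIndexKept) {
        if (firstIndexKept <= firstIndex) {
            return;
        }
        int toDrop = (int) Math.min(firstIndexKept - firstIndex, entries.size());
        entries.subList(0, toDrop).clear();
        firstIndex = firstIndexKept;
    }

    /** Drops all entries after lastIndexKept, e.g. when a new leader overwrites a conflicting tail. */
    synchronized void truncateSuffix(long lastIndexKept) {
        if (lastIndexKept >= lastLogIndex()) {
            return;
        }
        int keep = (int) Math.max(0, lastIndexKept - firstIndex + 1);
        entries.subList(keep, entries.size()).clear();
    }
}
```

Once a snapshot covering index N is taken, `truncatePrefix(N + 1)` releases the memory held by the older entries, which is what keeps a purely volatile log bounded.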
> h3. Snapshots
> Can be implemented over any KV store extended with some kind of copy-on-write
> support. Not implemented yet. More details below.
> h3. COW buffer
> To create an in-memory snapshot, the snapshot data is written to a separate 
> in-memory buffer. The buffer is populated from the state machine update 
> thread either by the update operations or by a snapshot advance mini-task 
> which is submitted to the state machine update thread as needed.
> To maintain a snapshot, the state machine needs to keep a snapshot iterator
> boundary key. If the key being updated is less than or equal to the boundary
> key, no additional action is needed, because the snapshot iterator has
> already processed this key. If the key being updated is greater than the
> boundary key, the old version of the key is eagerly put into the snapshot
> buffer and the key is marked with the snapshot ID (so that the key is skipped
> during further iteration). The snapshot advance mini-task iterates over the
> next batch of keys starting from the boundary key and puts into the snapshot
> buffer only the keys that are not yet marked with the snapshot ID.
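The boundary-key scheme above can be sketched as follows, assuming a sorted `TreeMap` as the underlying KV store, `String` keys and values, and snapshot IDs taken from a monotonically increasing counter. `CowSnapshotStore`, `startSnapshot`, `advance` and `snapshot` are hypothetical names, not Ignite APIs; buffer offloading and streaming are left out.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

/**
 * Hypothetical KV store with copy-on-write snapshots driven by a boundary key.
 * put/remove/advance are assumed to run on the single state machine update thread;
 * only the snapshot buffer may be read concurrently (e.g. by a snapshot sender).
 */
final class CowSnapshotStore {
    /** Stored value plus the ID of the snapshot that has already captured this key. */
    private record Versioned(String value, long snapshotMark) {}

    private static final long NO_MARK = 0;

    private final NavigableMap<String, Versioned> data = new TreeMap<>();

    private long snapshotId = NO_MARK; // IDs only grow, so marks from older snapshots never match
    private boolean snapshotActive;
    private String boundary;           // last key processed by the snapshot iterator; null = none yet
    private volatile Map<String, String> buffer = new ConcurrentSkipListMap<>();

    String get(String key) {
        Versioned v = data.get(key);
        return v == null ? null : v.value();
    }

    void put(String key, String value) {
        captureOldVersion(key, data.get(key));
        // Keys ahead of the boundary are marked so that the iterator skips them later.
        long mark = snapshotActive && isAheadOfBoundary(key) ? snapshotId : NO_MARK;
        data.put(key, new Versioned(value, mark));
    }

    void remove(String key) {
        captureOldVersion(key, data.remove(key));
    }

    void startSnapshot() {
        if (snapshotActive) {
            throw new IllegalStateException("Snapshot " + snapshotId + " is still in progress");
        }
        snapshotId++;
        snapshotActive = true;
        boundary = null;
        buffer = new ConcurrentSkipListMap<>();
    }

    /** Snapshot advance mini-task: processes up to batchSize keys; returns true once the snapshot is complete. */
    boolean advance(int batchSize) {
        if (!snapshotActive) {
            return true;
        }
        NavigableMap<String, Versioned> rest = boundary == null ? data : data.tailMap(boundary, false);
        Iterator<Map.Entry<String, Versioned>> it = rest.entrySet().iterator();
        for (int n = 0; n < batchSize && it.hasNext(); n++) {
            Map.Entry<String, Versioned> e = it.next();
            if (e.getValue().snapshotMark() != snapshotId) {
                buffer.put(e.getKey(), e.getValue().value()); // not captured by an update yet
            }
            boundary = e.getKey();
        }
        if (!it.hasNext()) {
            snapshotActive = false;
        }
        return !snapshotActive;
    }

    /** Copy of the snapshot buffer; equals the state as of startSnapshot() once advance() returned true. */
    Map<String, String> snapshot() {
        return new TreeMap<>(buffer);
    }

    private boolean isAheadOfBoundary(String key) {
        return boundary == null || key.compareTo(boundary) > 0;
    }

    /** Eagerly copies the old version of a key that the iterator has not reached yet. */
    private void captureOldVersion(String key, Versioned old) {
        if (snapshotActive && isAheadOfBoundary(key) && old != null && old.snapshotMark() != snapshotId) {
            buffer.put(key, old.value());
        }
    }
}
```

For example, with keys `a` to `d` stored, `advance(2)` moves the boundary to `b`. A subsequent update of `c` copies its old value into the buffer and marks the key, so the next `advance` call skips it, and the finished buffer still holds the values as of `startSnapshot()`. Since stale marks carry older snapshot IDs, they never need to be cleared.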
> This approach has similar memory requirements to the first alternative, but
> does not require modifying the storage tree to store multiple versions of
> the same key. It does, however, allow transparent offloading of the snapshot
> buffer to disk, which can reduce memory requirements. It is also simpler to
> implement, because the code is essentially single-threaded and only requires
> synchronization for the in-memory buffer.
> The downside is that snapshot advance tasks will increase the tail latency of
> state machine update operations.
> It can be implemented on top of any KV store.
> Note: we should consider the possibility of streaming the snapshot instead of 
> storing it in memory until it is completed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
