[ https://issues.apache.org/jira/browse/IGNITE-16305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan Bessonov resolved IGNITE-16305. ------------------------------------ Resolution: Duplicate > [POC] In-Memory storage integration > ----------------------------------- > > Key: IGNITE-16305 > URL: https://issues.apache.org/jira/browse/IGNITE-16305 > Project: Ignite > Issue Type: Task > Components: persistence > Affects Versions: 3.0.0-alpha3 > Reporter: Ivan Bessonov > Priority: Major > Labels: iep-74, ignite-3 > > Goals > We need an in-memory store, similar to Ignite-2. This store must reuse common > replication infrastructure, in other words, be integrated into raft STM and > support transactions. > The raft protocol implies some persistent state: metadata, logs, snapshot. > Simplest solution - write a raft persistent state on disk (this is already > implemented for > org.apache.ignite.internal.storage.basic.ConcurrentHashMapPartitionStorage). > Drawback - not fully in-memory solution, doesn't much differ from a database > cache > We can go the pure in-memory way - keep all raft state in a volatile store. > h3. Raft metadata > Must not be persisted for a pure in-memory cluster, because the state is > always lost on restart. > Note: a node must always be removed from the raft group when it’s removed > from baseline by auto adjust and should join as new (in-memory always works > with auto-adjust similarly to Ignite 2). *Out of scope.* > h3. Log store > Has working in-memory implementation (currently used in tests): > org.apache.ignite.raft.jraft.storage.impl.LocalLogStorage > Note: generally speaking, log is only required for "historical rebalancing" > after the snapshot rebalance. It won't be needed at all once it is possible > to apply snapshot and concurrent updates at the same time, for example when a > solution like mvcc is implemented. > h3. Snapshots > Can be implemented over any kv store extended with some kind of Copy-On-Write > support. Not implemented currently. More details below. > h3. COW buffer > To create an in-memory snapshot, the snapshot data is written to a separate > in-memory buffer. The buffer is populated from the state machine update > thread either by the update operations or by a snapshot advance mini-task > which is submitted to the state machine update thread as needed. > To maintain a snapshot, the state machine needs to keep an snapshot iterator > boundary key. If a key being updated is smaller or equal than the boundary > key, there is no need in any additional action because the snapshot iterator > has already processed this key. If a key being updated is larger than the > boundary key, the old version of the key is eagerly put to the snapshot > buffer and the key is marked with snapshot ID (so that the key is skipped > during further iteration). Snapshot advance mini-task iterates over a next > batch of the keys starting from the boundary key and puts to the snapshot > buffer only keys that are not yet marked by the snapshot ID. > This approach has similar memory requirements to the first alternative, but > does not require to modify the storage tree so that it can store multiple > versions of the same key. This approach, however, allows for transparent > snapshot buffer offloading to disk which can reduce memory requirements. It > is also simpler in implementation because the code is essentially > single-threaded and only requires synchronization for the in-memory buffer. > The downside is that snapshot advance tasks will increase tail latency of > state machine update operations. > Can be implemented on top of any kv store. > Note: we should consider the possibility of streaming the snapshot instead of > storing it in memory until it is completed. -- This message was sent by Atlassian Jira (v8.20.1#820001)