[ https://issues.apache.org/jira/browse/IGNITE-21135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maksim Timonin updated IGNITE-21135:
------------------------------------
    Description:
It should be guaranteed that `CdcManager` handles all WAL records in the same order as they were written to the WAL.

On Ignite start, `CdcManager` handles restored WAL records with `CdcManager#afterBinaryMemoryRestore`. WAL records written at runtime are handled with the `CdcManager#collect` method. For the guarantee to hold, `#afterBinaryMemoryRestore` must finish before any new WAL records are written.

Currently this guarantee is broken because the `MemoryRecoveryRecord` is written before `CdcManager#afterBinaryMemoryRestore` finishes.

Scenario:

An Ignite node starts and restores binary memory in `GridCacheDatabaseSharedManager#restoreBinaryMemory`. In this method:
# `FileWriteAheadLogManager` resumes logging and starts the `wal-segment-syncer` thread.
# `MemoryRecoveryRecord` is written.
# The `DatabaseLifecycleListener#afterBinaryMemoryRestore` callbacks are called.
# Implementations of `CdcManager` must extend `DatabaseLifecycleListener`, so the `CdcManager#afterBinaryMemoryRestore` method is invoked.
# The background `wal-segment-syncer` thread invokes the `CdcManager#collect` method.
# `CdcManager#afterBinaryMemoryRestore` finishes handling historical records only after that.

To fix this issue:
# `CdcManager#afterBinaryMemoryRestore` must be invoked before writing the `MemoryRecoveryRecord`.
# `CdcManager` should then no longer extend `DatabaseLifecycleListener`, and the `#afterBinaryMemoryRestore` method must be invoked directly on the `CdcManager` (see how it's done for in-memory caches in `IgniteCacheDatabaseSharedManager#startMemoryRestore`).

  was:
It should be guaranteed that `CdcManager` handles all WAL records in the same order as they were written to the WAL.

Currently this guarantee is broken due to contention between the methods `#afterBinaryMemoryRestore` and `#collect`.

Scenario of the contention, in which `MemoryRecoveryRecord` is handled before prior records:

An Ignite node starts and restores binary memory in `GridCacheDatabaseSharedManager#restoreBinaryMemory`. In this method:
# `FileWriteAheadLogManager` resumes logging and starts the `wal-segment-syncer` thread.
# `MemoryRecoveryRecord` is written.
# The `DatabaseLifecycleListener#afterBinaryMemoryRestore` callbacks are called.
# Implementations of `CdcManager` must extend `DatabaseLifecycleListener`, so the `CdcManager#afterBinaryMemoryRestore` method is invoked.
# The background `wal-segment-syncer` thread invokes the `CdcManager#collect` method.
# `CdcManager#afterBinaryMemoryRestore` finishes handling historical records only after that.

To fix this issue:
# `CdcManager#afterBinaryMemoryRestore` must be invoked before writing the `MemoryRecoveryRecord`.
# `CdcManager` should then no longer extend `DatabaseLifecycleListener`, and the `#afterBinaryMemoryRestore` method must be invoked directly on the `CdcManager` (see how it's done for in-memory caches in `IgniteCacheDatabaseSharedManager#startMemoryRestore`).


> CdcManager might collect WAL data before restoring it's state
> -------------------------------------------------------------
>
>                 Key: IGNITE-21135
>                 URL: https://issues.apache.org/jira/browse/IGNITE-21135
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Maksim Timonin
>            Assignee: Maksim Timonin
>            Priority: Major
>              Labels: ise
>             Fix For: 2.17
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> It should be guaranteed that `CdcManager` handles all WAL records in the same
> order as they were written to the WAL.
> On Ignite start, `CdcManager` handles restored WAL records with
> `CdcManager#afterBinaryMemoryRestore`. WAL records written at runtime are
> handled with the `CdcManager#collect` method. For the guarantee to hold,
> `#afterBinaryMemoryRestore` must finish before any new WAL records are
> written.
> Currently this guarantee is broken because the `MemoryRecoveryRecord` is
> written before `CdcManager#afterBinaryMemoryRestore` finishes.
> Scenario:
> An Ignite node starts and restores binary memory in
> `GridCacheDatabaseSharedManager#restoreBinaryMemory`. In this method:
> # `FileWriteAheadLogManager` resumes logging and starts the
> `wal-segment-syncer` thread.
> # `MemoryRecoveryRecord` is written.
> # The `DatabaseLifecycleListener#afterBinaryMemoryRestore` callbacks are
> called.
> # Implementations of `CdcManager` must extend `DatabaseLifecycleListener`,
> so the `CdcManager#afterBinaryMemoryRestore` method is invoked.
> # The background `wal-segment-syncer` thread invokes the
> `CdcManager#collect` method.
> # `CdcManager#afterBinaryMemoryRestore` finishes handling historical records
> only after that.
> To fix this issue:
> # `CdcManager#afterBinaryMemoryRestore` must be invoked before writing the
> `MemoryRecoveryRecord`.
> # `CdcManager` should then no longer extend `DatabaseLifecycleListener`, and
> the `#afterBinaryMemoryRestore` method must be invoked directly on the
> `CdcManager` (see how it's done for in-memory caches in
> `IgniteCacheDatabaseSharedManager#startMemoryRestore`).
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
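
The ordering problem described in this issue can be illustrated with a minimal, single-threaded sketch. Nothing here is Ignite code: `RecordingCdc`, its simplified method signatures, and the record names `rec-1` and `rec-2` are hypothetical stand-ins, and the real interleaving depends on when the `wal-segment-syncer` thread runs. The sketch only contrasts one possible interleaving under the current call order with the order proposed in the fix.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model only; it does not use or reproduce any Ignite API. */
final class CdcOrderingSketch {
    /** Hypothetical stand-in for a CdcManager that remembers the order in which it saw records. */
    static final class RecordingCdc {
        final List<String> seen = new ArrayList<>();

        /** Simplified: the real callback has a different signature. */
        void afterBinaryMemoryRestore(List<String> historical) {
            seen.addAll(historical);
        }

        /** Simplified: the real method receives raw WAL data, not a String. */
        void collect(String record) {
            seen.add(record);
        }
    }

    /** Historical records that were already in the WAL before the restart (made-up names). */
    static final List<String> HISTORICAL = List.of("rec-1", "rec-2");

    /** One possible interleaving today: the new record is collected before the restore callback finishes. */
    static List<String> currentOrder() {
        RecordingCdc cdc = new RecordingCdc();
        cdc.collect("MemoryRecoveryRecord");      // written first, then handed over by the syncer
        cdc.afterBinaryMemoryRestore(HISTORICAL); // historical records are handled too late
        return cdc.seen;
    }

    /** Proposed order: finish the restore callback, only then write the new record. */
    static List<String> proposedOrder() {
        RecordingCdc cdc = new RecordingCdc();
        cdc.afterBinaryMemoryRestore(HISTORICAL); // invoked directly, before any new WAL write
        cdc.collect("MemoryRecoveryRecord");
        return cdc.seen;
    }

    public static void main(String[] args) {
        System.out.println(currentOrder());  // [MemoryRecoveryRecord, rec-1, rec-2]
        System.out.println(proposedOrder()); // [rec-1, rec-2, MemoryRecoveryRecord]
    }
}
```

In the first list the `MemoryRecoveryRecord` precedes records that were written to the WAL earlier, which is the violation the issue describes. In the second list the handled order matches the WAL order.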