[ https://issues.apache.org/jira/browse/IGNITE-21135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maksim Timonin updated IGNITE-21135:
------------------------------------
    Description:
`CdcManager` must guarantee that its `#afterBinaryMemoryRestore` method finishes before the first call of `#collect`.

In practice there is a contention between these two methods. Scenario of the contention:

The Ignite node starts and restores binary memory in `GridCacheDatabaseSharedManager#restoreBinaryMemory`. In this method:
# `FileWriteAheadLogManager` resumes logging and starts the `wal-segment-syncer` thread.
# `MemoryRecoveryRecord` is written.
# The `DatabaseLifecycleListener#afterBinaryMemoryRestore` callbacks are called.
# Because implementations of `CdcManager` must extend `DatabaseLifecycleListener`, `CdcManager#afterBinaryMemoryRestore` is invoked as one of these callbacks.
# The background `wal-segment-syncer` thread invokes `CdcManager#collect`.
# `CdcManager#afterBinaryMemoryRestore` finishes only after that.

To fix this issue:
# `CdcManager#afterBinaryMemoryRestore` must be invoked before the `MemoryRecoveryRecord` is written.
# `CdcManager` should then no longer extend `DatabaseLifecycleListener`; the `#afterBinaryMemoryRestore` method must be invoked directly on the `CdcManager` (see how it is done for in-memory caches in `IgniteCacheDatabaseSharedManager#startMemoryRestore`).
# The interface can be simplified: only the WALPointer should be specified.

  was:
`CdcManager` must guarantee that `afterBinaryMemoryRestore()` finishes before the first call of `collect()`.

Scenario of the contention:
# The Ignite node starts and restores binary memory in GridCacheDatabaseSharedManager#restoreBinaryMemory.
# FileWriteAheadLogManager resumes logging and starts the "wal-segment-syncer" thread.
# MemoryRecoveryRecord is written.
# The CdcManager#afterBinaryMemoryRestore method is invoked in the Ignite start thread.
# Note that the `collect()` method is called from the background system thread `wal-segment-syncer`.
In `CdcManager` there is a contention between `collect()` and `afterBinaryMemoryRestore()` because Ignite writes the `MemoryRecoveryRecord`. `afterBinaryMemoryRestore()` should be invoked BEFORE any `collect()`.


> CdcManager might collect WAL data before restoring it's state
> -------------------------------------------------------------
>
>                 Key: IGNITE-21135
>                 URL: https://issues.apache.org/jira/browse/IGNITE-21135
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Maksim Timonin
>            Assignee: Maksim Timonin
>            Priority: Major
>             Fix For: 2.17
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> `CdcManager` must guarantee that its `#afterBinaryMemoryRestore` method
> finishes before the first call of `#collect`.
> In practice there is a contention between these two methods. Scenario of the
> contention:
> The Ignite node starts and restores binary memory in
> `GridCacheDatabaseSharedManager#restoreBinaryMemory`. In this method:
> # `FileWriteAheadLogManager` resumes logging and starts the
> `wal-segment-syncer` thread.
> # `MemoryRecoveryRecord` is written.
> # The `DatabaseLifecycleListener#afterBinaryMemoryRestore` callbacks are called.
> # Because implementations of `CdcManager` must extend
> `DatabaseLifecycleListener`, `CdcManager#afterBinaryMemoryRestore` is invoked
> as one of these callbacks.
> # The background `wal-segment-syncer` thread invokes `CdcManager#collect`.
> # `CdcManager#afterBinaryMemoryRestore` finishes only after that.
> To fix this issue:
> # `CdcManager#afterBinaryMemoryRestore` must be invoked before the
> `MemoryRecoveryRecord` is written.
> # `CdcManager` should then no longer extend `DatabaseLifecycleListener`; the
> `#afterBinaryMemoryRestore` method must be invoked directly on the
> `CdcManager` (see how it is done for in-memory caches in
> `IgniteCacheDatabaseSharedManager#startMemoryRestore`).
> # The interface can be simplified: only the WALPointer should be specified.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
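
The ordering problem and the first proposed fix step can be illustrated with a small, self-contained Java sketch. This is not Ignite code: `SimpleCdcManager`, `RecordingCdcManager`, `ToyWal` and `startNode` are hypothetical stand-ins, and the toy WAL simply hands every written record to a single background thread named `wal-segment-syncer`, which then calls `collect`. Under those assumptions, invoking `afterBinaryMemoryRestore` before the first record is written guarantees that it completes before any `collect` call, whereas the reverse order leaves the two methods racing.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CdcOrderingSketch {
    /** Hypothetical, simplified stand-in for CdcManager (not the real Ignite interface). */
    interface SimpleCdcManager {
        void afterBinaryMemoryRestore();
        void collect(String walRecord);
    }

    /** Records the order in which the callbacks start and finish. */
    static class RecordingCdcManager implements SimpleCdcManager {
        final List<String> events = new CopyOnWriteArrayList<>();

        @Override public void afterBinaryMemoryRestore() {
            events.add("restore:start");
            try {
                Thread.sleep(50); // Simulate slow state restoration.
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            events.add("restore:done");
        }

        @Override public void collect(String walRecord) {
            events.add("collect:" + walRecord);
        }
    }

    /** Toy WAL: every logged record is passed to collect() on a background thread. */
    static class ToyWal {
        private final SimpleCdcManager cdc;
        private final ExecutorService syncer =
            Executors.newSingleThreadExecutor(r -> new Thread(r, "wal-segment-syncer"));

        ToyWal(SimpleCdcManager cdc) {
            this.cdc = cdc;
        }

        void log(String record) {
            syncer.submit(() -> cdc.collect(record));
        }

        void shutdown() {
            syncer.shutdown();
            try {
                syncer.awaitTermination(5, TimeUnit.SECONDS);
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    /**
     * Simulates node start.
     * restoreBeforeRecord = true  : proposed order, restore first, then write the record.
     * restoreBeforeRecord = false : order described in the issue, record first, then restore.
     */
    public static List<String> startNode(boolean restoreBeforeRecord) {
        RecordingCdcManager cdc = new RecordingCdcManager();
        ToyWal wal = new ToyWal(cdc);

        if (restoreBeforeRecord) {
            cdc.afterBinaryMemoryRestore();   // Completes before anything reaches the WAL.
            wal.log("MemoryRecoveryRecord");
        }
        else {
            wal.log("MemoryRecoveryRecord");  // collect() may now run concurrently with restore.
            cdc.afterBinaryMemoryRestore();
        }

        wal.shutdown();
        return cdc.events;
    }

    public static void main(String[] args) {
        System.out.println("restore before record: " + startNode(true));
        System.out.println("record before restore: " + startNode(false));
    }
}
```

With `startNode(true)` the `restore:done` event always precedes the `collect:` event, because the record is submitted to the background thread only after the restore call has returned. With `startNode(false)` the position of the `collect:` event depends on thread scheduling, which is the contention the issue describes.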