Hello, everyone! Currently, the DataStorageConfiguration#maxWalArchiveSize property does not work the way users expect. The WAL archive can easily grow beyond this limit and fill the disk, which leads to errors and a node crash. I propose fixing this behavior so that the WAL archive cannot overflow.
The suggestion is not to add a segment to the archive if doing so could exceed DataStorageConfiguration#maxWalArchiveSize, and instead to wait until space becomes available. However, this can cause a deadlock: take checkpointReadLock -> write to WAL -> need to roll over the WAL segment -> need to clean the WAL archive -> need to complete a checkpoint (impossible because checkpointReadLock is held). To avoid such situations, I suggest adding a custom heuristic: do not grant IgniteCacheDatabaseSharedManager#checkpointReadLock if only a few (default 1) segments are left. This heuristic will not completely prevent archive overflow situations, though. Therefore, I suggest failing the node via the failure handler (FH) when a deadlock is detected, since the outcome would be the same as if there were no disk space left.
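
To make the heuristic easier to discuss, here is a minimal sketch of how the check could look. It is only a model, not Ignite code: the class WalArchiveGate and all of its methods are hypothetical names invented for illustration, and it assumes that "few segments left" means the number of segments that still fit under the limit is less than or equal to the threshold.

```java
/**
 * Hypothetical model of the proposed heuristic. None of these names exist in Ignite;
 * maxArchiveSize only stands in for DataStorageConfiguration#maxWalArchiveSize.
 */
final class WalArchiveGate {
    /** Upper bound for the archive, in bytes. */
    private final long maxArchiveSize;

    /** Size of a single WAL segment, in bytes. */
    private final long segmentSize;

    /** Reserve threshold: the "few (default 1) segments left" from the proposal. */
    private final int minFreeSegments;

    /** Bytes currently occupied by archived segments. */
    private long archiveSize;

    WalArchiveGate(long maxArchiveSize, long segmentSize, int minFreeSegments) {
        if (segmentSize <= 0 || maxArchiveSize < segmentSize || minFreeSegments < 0)
            throw new IllegalArgumentException("Invalid archive settings");

        this.maxArchiveSize = maxArchiveSize;
        this.segmentSize = segmentSize;
        this.minFreeSegments = minFreeSegments;
    }

    /** Number of whole segments that still fit under the limit. */
    synchronized long freeSegments() {
        return (maxArchiveSize - archiveSize) / segmentSize;
    }

    /**
     * Heuristic: refuse the checkpoint read lock while the reserve is exhausted,
     * so a writer cannot hold the lock and then block on a rollover.
     */
    synchronized boolean canTakeCheckpointReadLock() {
        return freeSegments() > minFreeSegments;
    }

    /** Archives one segment only if the limit is not exceeded; otherwise the caller must wait. */
    synchronized boolean tryArchiveSegment() {
        if (archiveSize + segmentSize > maxArchiveSize)
            return false;

        archiveSize += segmentSize;

        return true;
    }

    /** Called after a checkpoint allows old segments to be removed from the archive. */
    synchronized void onSegmentsCleaned(int segments) {
        archiveSize = Math.max(0, archiveSize - segments * segmentSize);
    }
}
```

In this model, a thread for which canTakeCheckpointReadLock() returns false would wait for the next cleanup instead of taking the lock. A thread that already holds the lock when tryArchiveSegment() returns false is the case the heuristic cannot cover, and that is where the failure handler would be invoked.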
