Hard limit WAL archive size

ткаленко кирилл Tue, 26 Jan 2021 07:48:56 -0800

Hello, everyone!

Currently, the DataStorageConfiguration#maxWalArchiveSize property does not work 
as users expect. The WAL archive can easily grow beyond this limit and fill the 
disk, which leads to errors and a node crash. I propose fixing this behavior so 
that the WAL archive cannot overflow.


I suggest not adding a segment to the archive if doing so would exceed 
DataStorageConfiguration#maxWalArchiveSize, and instead waiting until enough 
space becomes available.
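
To make the idea concrete, here is a minimal sketch of such a wait, not the 
actual Ignite code. The class WalArchiveGate and its methods are hypothetical 
names I made up for illustration, and the timeout is my assumption so that the 
example cannot block forever.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical gate: a segment may enter the archive only if the limit is respected. */
final class WalArchiveGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition spaceFreed = lock.newCondition();
    private final long maxArchiveSize; // plays the role of maxWalArchiveSize, in bytes
    private long currentSize;          // bytes currently accounted to the archive

    WalArchiveGate(long maxArchiveSize) {
        this.maxArchiveSize = maxArchiveSize;
    }

    /**
     * Waits up to timeoutMs for room for one segment.
     * Returns true if the space was reserved, false on timeout or interruption.
     */
    boolean reserve(long segmentSize, long timeoutMs) {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        lock.lock();
        try {
            // Do not add the segment while it would push the archive over the limit.
            while (currentSize + segmentSize > maxArchiveSize) {
                if (nanos <= 0L)
                    return false;
                nanos = spaceFreed.awaitNanos(nanos);
            }
            currentSize += segmentSize;
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag for the caller
            return false;
        } finally {
            lock.unlock();
        }
    }

    /** Called after archive cleanup has removed the given number of bytes. */
    void release(long bytes) {
        lock.lock();
        try {
            currentSize = Math.max(0L, currentSize - bytes);
            spaceFreed.signalAll(); // wake up archivers waiting for space
        } finally {
            lock.unlock();
        }
    }

    long currentSize() {
        lock.lock();
        try {
            return currentSize;
        } finally {
            lock.unlock();
        }
    }
}
```

In this sketch the archiver would call reserve() before moving a segment and 
the cleanup would call release() after deleting old segments.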

With this approach, a deadlock is possible:
Take checkpointReadLock -> write to WAL -> need to roll over the WAL segment -> 
need to clean the WAL archive -> need to complete a checkpoint (impossible 
because checkpointReadLock is held).

To avoid such situations, I suggest adding a heuristic: do not grant 
IgniteCacheDatabaseSharedManager#checkpointReadLock if only a few (by default, 
1) segments are left.
But this will not let us completely avoid archive overflow situations. 
Therefore, I suggest failing the node via the failure handler (FH) when a 
deadlock is detected, since the outcome would be the same if there were no disk 
space left.
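
A rough sketch of the decision logic I have in mind, under my own assumptions: 
CheckpointLockGuard, Decision and both parameters of decide() are hypothetical 
names, "few segments left" is read as "free segments <= threshold", and a 
deadlock is approximated as "no free segments while a lock holder is waiting 
for archive space". The real detection may need to look different.

```java
/** Hypothetical guard that decides whether checkpointReadLock may be granted. */
final class CheckpointLockGuard {
    /** Possible outcomes for a lock request. */
    enum Decision { GRANT, DENY, FAIL_NODE }

    /** Threshold of free archive segments; 1 by default in the proposal. */
    private final int minFreeSegments;

    CheckpointLockGuard(int minFreeSegments) {
        this.minFreeSegments = minFreeSegments;
    }

    /**
     * @param freeSegments number of segments that still fit under the archive limit
     * @param holderWaitsForSpace true if a thread already holding the read lock
     *                            is blocked waiting for archive space
     */
    Decision decide(int freeSegments, boolean holderWaitsForSpace) {
        // Archive is full and a lock holder is stuck: the checkpoint cannot finish,
        // so cleanup cannot run. Treat it as a deadlock and hand over to the FH.
        if (freeSegments <= 0 && holderWaitsForSpace)
            return Decision.FAIL_NODE;

        // Few segments left: refuse the lock so the checkpoint can complete
        // and the archive can be cleaned.
        if (freeSegments <= minFreeSegments)
            return Decision.DENY;

        return Decision.GRANT;
    }
}
```

With the default threshold of 1, decide(5, false) grants the lock, 
decide(1, false) denies it, and decide(0, true) asks to fail the node.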
