Hello!
This is unclear to me: nothing you described explains why the node works 
improperly or why the FH could possibly fail this node. Can you explain?
 
>Hello, everyone!
>
>Currently, property DataStorageConfiguration#maxWalArchiveSize is not working 
>as expected by users. We can easily go beyond this limit and overflow the 
>disk, which will lead to errors and a crash of the node. I propose to fix this 
>behavior and not let WAL archive overflow.
>
>It is suggested not to add segments to the archive if we can exceed the 
>DataStorageConfiguration#maxWalArchiveSize and wait until space becomes 
>available for this.
>
>Thus, we may have a deadlock:
>Get checkpointReadLock -> write to WAL -> need to roll over WAL segment -> 
>need to clean WAL archive -> need to complete checkpoint (impossible because 
>checkpointReadLock is taken).
>
>To avoid such situations, I suggest adding a custom heuristic: do not grant 
>IgniteCacheDatabaseSharedManager#checkpointReadLock if there are few (default 
>1) segments left.
>But this will not allow us to completely avoid archive overflow situations. 
>Therefore, I suggest failing the node via the FH when a deadlock is detected, 
>since the outcome would be the same if there were no disk space left.
 
 
 
 
