Re: Hard limit WAL archive size
Hi! No, we basically have a problem with the growth of the WAL archive.

26.01.2021, 19:06, "Vishwas Bm":
> Hi,
>
> Is this related to the issue seen with IGNITE-13912?
>
> I had hit IGNITE-13912 when I was using the Ignite 2.9 release.
> I have yet to try my use case with the fix provided as part of IGNITE-13912.
>
> Regards,
> Vishwas
>
> On Tue, 26 Jan 2021, 21:18, ткаленко кирилл wrote:
>> Hello, everyone!
>>
>> Currently, property DataStorageConfiguration#maxWalArchiveSize is not
>> working as expected by users. [...]
Re: Hard limit WAL archive size
As for me, the correct approach is to trigger a checkpoint when we get too close to the WAL archive size limit. The main purpose of this mechanism is to provide durability, so we should aim neither to fail the node nor to delete data voluntarily, but to prevent possible data loss.

Tue, 26 Jan 2021 at 19:13, Zhenya Stanilovsky:
> Hello!
> This is unclear to me; what you described gives no information about why
> the node works improperly or why the FH could possibly fail this node.
> Can you explain?
>
>> Hello, everyone!
>>
>> Currently, property DataStorageConfiguration#maxWalArchiveSize is not
>> working as expected by users. [...]

-- Sincerely yours, Ivan Daschinskiy
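The checkpoint-trigger idea suggested above could be sketched with a small, self-contained model. Note this is not Ignite code: the class name, the trigger ratio, and the callbacks are all hypothetical, and real checkpoint scheduling in Ignite is far more involved.

```java
/**
 * Hypothetical sketch: instead of failing the node when the WAL archive
 * fills up, request a checkpoint once the archive gets close to the
 * configured limit, so that old segments can be truncated in time.
 */
public class WalArchiveWatcher {
    private final long maxArchiveSize; // configured hard limit, in bytes
    private final double triggerRatio; // e.g. 0.8 -> request checkpoint at 80%

    private long archiveSize;          // current archive size, in bytes

    public WalArchiveWatcher(long maxArchiveSize, double triggerRatio) {
        this.maxArchiveSize = maxArchiveSize;
        this.triggerRatio = triggerRatio;
    }

    /**
     * Called after a segment of {@code segmentSize} bytes is archived.
     * @return {@code true} if a checkpoint should be requested now.
     */
    public boolean onSegmentArchived(long segmentSize) {
        archiveSize += segmentSize;

        return archiveSize >= (long)(maxArchiveSize * triggerRatio);
    }

    /** Called after a completed checkpoint truncates {@code freed} bytes. */
    public void onArchiveTruncated(long freed) {
        archiveSize = Math.max(0, archiveSize - freed);
    }

    public long archiveSize() {
        return archiveSize;
    }
}
```

For example, with a 100-byte limit and a 0.8 ratio, the third 30-byte segment crosses the 80-byte threshold and a checkpoint would be requested before the limit itself is hit.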
Re: Hard limit WAL archive size
Hello! This is unclear to me; what you described gives no information about why the node works improperly or why the FH could possibly fail this node. Can you explain?

> Hello, everyone!
>
> Currently, property DataStorageConfiguration#maxWalArchiveSize is not
> working as expected by users. [...]
Re: Hard limit WAL archive size
Hi,

Is this related to the issue seen with IGNITE-13912?

I had hit IGNITE-13912 when I was using the Ignite 2.9 release.
I have yet to try my use case with the fix provided as part of IGNITE-13912.

Regards,
Vishwas

On Tue, 26 Jan 2021, 21:18, ткаленко кирилл wrote:
> Hello, everyone!
>
> Currently, property DataStorageConfiguration#maxWalArchiveSize is not
> working as expected by users. [...]
Hard limit WAL archive size
Hello, everyone!

Currently, the property DataStorageConfiguration#maxWalArchiveSize does not work as users expect. We can easily go beyond this limit and overflow the disk, which leads to errors and a crash of the node. I propose to fix this behavior and not let the WAL archive overflow.

The suggestion is not to add segments to the archive if doing so could exceed DataStorageConfiguration#maxWalArchiveSize, and to wait until space becomes available.

This, however, can produce a deadlock:
take checkpointReadLock -> write to WAL -> need to roll over the WAL segment -> need to clean the WAL archive -> need to complete a checkpoint (impossible because checkpointReadLock is held).

To avoid such situations, I suggest adding a heuristic: do not grant IgniteCacheDatabaseSharedManager#checkpointReadLock if only a few (by default, 1) segments are left.

This still does not completely rule out archive overflow situations. Therefore, I suggest failing the node via the failure handler (FH) when such a deadlock is detected, since the outcome would be the same if no disk space were left.
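The two parts of this proposal (the archiver waits for free space instead of exceeding the limit, and the checkpoint read lock is refused when only a few segments remain) could be modeled with a small, self-contained sketch. This is not Ignite code; the class, its method names, and the segment-based accounting are hypothetical simplifications.

```java
/**
 * Hypothetical sketch of the proposed hard limit: archiving blocks at the
 * limit, and the checkpoint read lock is refused near the limit so the
 * checkpointer can still run and free archive space (avoiding the deadlock
 * described above).
 */
public class WalArchiveLimiter {
    private final int maxSegments;      // archive capacity, in segments
    private final int reservedSegments; // default 1 in the proposal

    private int archived;               // segments currently in the archive

    public WalArchiveLimiter(int maxSegments, int reservedSegments) {
        this.maxSegments = maxSegments;
        this.reservedSegments = reservedSegments;
    }

    /** True if one more segment fits under the limit. */
    public boolean canArchive() {
        return archived < maxSegments;
    }

    /** Archives one segment; callers must wait while {@code !canArchive()}. */
    public void archive() {
        if (!canArchive())
            throw new IllegalStateException("WAL archive limit reached");

        archived++;
    }

    /** The heuristic: refuse the checkpoint read lock near the limit. */
    public boolean checkpointReadLockAvailable() {
        return maxSegments - archived > reservedSegments;
    }

    /** A completed checkpoint lets old archive segments be truncated. */
    public void truncate(int segments) {
        archived = Math.max(0, archived - segments);
    }
}
```

With a 3-segment archive and 1 reserved segment, new checkpoint read locks stop being granted once only one free segment remains, which leaves the checkpointer room to complete and truncate the archive before the archiver has to block.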