Re: Exceeding the DataStorageConfiguration#getMaxWalArchiveSize due to historical rebalance

Ilya Kasnacheev Tue, 04 May 2021 08:43:18 -0700

Hello!

Maybe we could use a mechanism here similar (or identical) to
checkpoint-based write throttling?

That way we would throttle on both the checkpoint page buffer and the WAL
archive limit.
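
To make the throttling idea a bit more concrete, here is a rough sketch of how
a per-write pause could grow as the WAL archive fills up. Everything in it is
an assumption for illustration only: the class name, the 0.75 start ratio, the
10 ms maximum pause and the linear ramp are hypothetical, and this is not how
Ignite's checkpoint write throttling is implemented.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch only; not part of Apache Ignite. */
final class WalArchiveThrottleSketch {
    /** Fill ratio at which throttling starts (assumed value). */
    static final double START_RATIO = 0.75;

    /** Longest pause applied to a single write (assumed value). */
    static final long MAX_PARK_NANOS = TimeUnit.MILLISECONDS.toNanos(10);

    private WalArchiveThrottleSketch() {
        // Static helper, no instances.
    }

    /**
     * @param archiveSize Current WAL archive size in bytes.
     * @param maxArchiveSize Configured limit in bytes.
     * @return How long a writer thread should pause before its next update.
     */
    static long parkNanos(long archiveSize, long maxArchiveSize) {
        // This sketch treats a non-positive limit as "no limit".
        if (maxArchiveSize <= 0)
            return 0;

        double fill = (double)archiveSize / maxArchiveSize;

        if (fill <= START_RATIO)
            return 0;

        // Grows linearly from 0 at START_RATIO to MAX_PARK_NANOS at 100% fill,
        // and stays at the maximum if the limit is already exceeded.
        double pressure = Math.min(1.0, (fill - START_RATIO) / (1.0 - START_RATIO));

        return (long)(pressure * MAX_PARK_NANOS);
    }
}
```

A writer thread could then call something like
LockSupport.parkNanos(WalArchiveThrottleSketch.parkNanos(size, max)) before
each update, so that load slows down gradually instead of the node running out
of disk space.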

Regards,
-- 
Ilya Kasnacheev


Tue, 4 May 2021 at 11:29, ткаленко кирилл <tkalkir...@yandex.ru>:

> Hello everybody!
>
> At the moment, if historical rebalance is going to be used for some
> partitions, we reserve segments in the WAL archive (i.e. we do not allow
> the WAL archive to be cleaned) until the rebalance of all cache groups is
> over.
>
> If the cluster is under load during the rebalance, the WAL archive size may
> significantly exceed the limit set in
> DataStorageConfiguration#getMaxWalArchiveSize until the process is
> complete. This may cause problems for users, and nodes may crash with the
> "No space left on device" error.
>
> We have a system property, IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE
> (0.5 by default), which sets the threshold (multiplied by
> getMaxWalArchiveSize) down to which the WAL archive will be cleaned, i.e.
> it sets the size of the WAL archive that will always be kept on the node.
> I propose to replace this system property with
> DataStorageConfiguration#getWalArchiveSize, in bytes; the default is
> (getMaxWalArchiveSize * 0.5), as it is now.
>
> Main proposal:
> When DataStorageConfiguration#getMaxWalArchiveSize is reached, cancel the
> existing reservations of WAL segments and do not grant new ones until the
> archive shrinks to DataStorageConfiguration#getWalArchiveSize. In this
> case, if there is no segment for historical rebalance, we will
> automatically switch to full rebalance.
>
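
For reference, the two-threshold behaviour in the main proposal could be
sketched roughly as below. This is only an illustration under assumptions: the
class, method and enum names are hypothetical, getWalArchiveSize does not
exist yet (it is the setting proposed above), and real reservation handling in
Ignite involves much more than a single flag.

```java
/** Hypothetical sketch of the proposed policy; not Apache Ignite code. */
final class WalReservationPolicySketch {
    /** Rebalance type chosen for a partition. */
    enum RebalanceMode { HISTORICAL, FULL }

    /** Upper limit, as in DataStorageConfiguration#getMaxWalArchiveSize. */
    private final long maxWalArchiveSize;

    /** Lower threshold, the proposed DataStorageConfiguration#getWalArchiveSize. */
    private final long walArchiveSize;

    /** Whether WAL segments may currently be reserved. */
    private boolean reservationsAllowed = true;

    /** Uses the proposed default for the lower threshold: max * 0.5. */
    WalReservationPolicySketch(long maxWalArchiveSize) {
        this(maxWalArchiveSize, (long)(maxWalArchiveSize * 0.5));
    }

    WalReservationPolicySketch(long maxWalArchiveSize, long walArchiveSize) {
        if (walArchiveSize > maxWalArchiveSize)
            throw new IllegalArgumentException("Lower threshold exceeds the limit.");

        this.maxWalArchiveSize = maxWalArchiveSize;
        this.walArchiveSize = walArchiveSize;
    }

    /**
     * Call when the archive size changes.
     *
     * @return {@code true} if reservations are allowed after this change.
     */
    boolean onArchiveSizeChanged(long archiveSize) {
        if (archiveSize >= maxWalArchiveSize)
            reservationsAllowed = false; // Limit reached: cancel and refuse reservations.
        else if (archiveSize <= walArchiveSize)
            reservationsAllowed = true; // Cleaned down to the threshold: allow again.

        // Between the two thresholds the previous state is kept.
        return reservationsAllowed;
    }

    /** Falls back to full rebalance when no segment can be reserved. */
    RebalanceMode chooseRebalance(boolean segmentAvailable) {
        return reservationsAllowed && segmentAvailable
            ? RebalanceMode.HISTORICAL
            : RebalanceMode.FULL;
    }
}
```

Keeping the previous state between the two thresholds is a design choice of
this sketch: it avoids flipping between "allowed" and "refused" on every small
change in archive size.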
