Hi Ilya!

Then we can greatly reduce the user load on the cluster until the rebalance is
over, which can be critical for the user.

04.05.2021, 18:43, "Ilya Kasnacheev" <ilya.kasnach...@gmail.com>:
> Hello!
>
> Maybe we can have a mechanic here similar (or equal) to checkpoint based
> write throttling?
>
> So we will be throttling for both checkpoint page buffer and WAL limit.
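>
> Something roughly in this spirit, just to illustrate the idea: an
> exponential back-off on writes, loosely modelled on the checkpoint buffer
> throttling. This is only a sketch, not the actual implementation, and the
> names and thresholds are invented:
>
>     import java.util.concurrent.locks.LockSupport;
>
>     /** Parks writer threads progressively longer as the WAL archive fills up. */
>     class WalArchiveThrottleSketch {
>         private static final long BASE_PARK_NANOS = 4_000;
>         private static final double BACKOFF_RATIO = 1.05;
>
>         /** Number of consecutive throttled writes. */
>         private int counter;
>
>         /** @param fillRatio Current archive size divided by the max archive size. */
>         void onWrite(double fillRatio) {
>             if (fillRatio < 0.9) {
>                 counter = 0; // Archive still has room, no throttling.
>
>                 return;
>             }
>
>             // Exponentially growing park time, as in checkpoint-based throttling.
>             LockSupport.parkNanos((long)(BASE_PARK_NANOS * Math.pow(BACKOFF_RATIO, counter++)));
>         }
>     }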
>
> Regards,
> --
> Ilya Kasnacheev
>
> Tue, May 4, 2021 at 11:29, ткаленко кирилл <tkalkir...@yandex.ru>:
>
>>  Hello everybody!
>>
>>  At the moment, if there are partitions that will be rebalanced via the
>>  historical rebalance, we reserve segments in the WAL archive (i.e. we do
>>  not allow the WAL archive to be cleaned) until the rebalance of all
>>  cache groups is over.
>>
>>  If the cluster is under load during the rebalance, the WAL archive size
>>  may significantly exceed the limit set by
>>  DataStorageConfiguration#getMaxWalArchiveSize until the process is
>>  complete. This may lead to user issues, and nodes may crash with a "No
>>  space left on device" error.
>>
>>  We have a system property, IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE
>>  (0.5 by default), which sets the threshold (multiplied by
>>  getMaxWalArchiveSize) at which and down to which the WAL archive is
>>  cleared, i.e. the size of the WAL archive that will always remain on the
>>  node. I propose to replace this system property with a new
>>  DataStorageConfiguration#getWalArchiveSize property expressed in bytes,
>>  with the default being (getMaxWalArchiveSize * 0.5) as it is now.
>>
>>  Main proposal:
>>  When DataStorageConfiguration#getMaxWalArchiveSize is reached, cancel
>>  existing WAL segment reservations and do not grant new ones until the
>>  archive shrinks back to DataStorageConfiguration#getWalArchiveSize. In
>>  this case, if the segments needed for a historical rebalance are no
>>  longer available, we will automatically switch to full rebalance.
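>>
>>  To make the intent concrete, here is a simplified sketch of the proposed
>>  logic. It is illustrative only; all names are invented and this is not
>>  the actual Ignite code:
>>
>>      /** Models the proposed WAL archive size control. */
>>      class WalArchiveSizeSketch {
>>          private final long maxWalArchiveSize; // getMaxWalArchiveSize
>>          private final long walArchiveSize;    // proposed getWalArchiveSize
>>
>>          private long curSize;
>>          private boolean reservationsAllowed = true;
>>
>>          WalArchiveSizeSketch(long max, long target) {
>>              maxWalArchiveSize = max;
>>              walArchiveSize = target;
>>          }
>>
>>          /** Called when a segment of the given size is added to the archive. */
>>          void onSegmentArchived(long segmentBytes) {
>>              curSize += segmentBytes;
>>
>>              // Once the hard limit is reached, cancel existing reservations
>>              // and refuse new ones so that the archive can be cleaned.
>>              if (curSize >= maxWalArchiveSize)
>>                  reservationsAllowed = false;
>>          }
>>
>>          /** Called after the cleaner has removed old segments. */
>>          void onArchiveCleaned(long newSize) {
>>              curSize = newSize;
>>
>>              // Allow reservations again once we are back at the target size.
>>              if (curSize <= walArchiveSize)
>>                  reservationsAllowed = true;
>>          }
>>
>>          /** @return false if the caller has to fall back to full rebalance. */
>>          boolean reserveForHistoricalRebalance() {
>>              return reservationsAllowed;
>>          }
>>      }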
