Re: Exceeding the DataStorageConfiguration#getMaxWalArchiveSize due to historical rebalance

Maxim Muzafarov Wed, 05 May 2021 09:25:40 -0700

Hello, Kirill

+1 for this change. However, users already have too many configuration
settings to deal with when configuring an Ignite cluster. It is better to
keep the options we already have and fix the behaviour of the
rebalance process as you suggested.


On Tue, 4 May 2021 at 19:01, ткаленко кирилл <tkalkir...@yandex.ru> wrote:
>
> Hi Ilya!
>
> Then we would greatly reduce the user load on the cluster until the rebalance
> is over, which can be critical for the user.
>
> 04.05.2021, 18:43, "Ilya Kasnacheev" <ilya.kasnach...@gmail.com>:
> > Hello!
> >
> > Maybe we can have a mechanism here similar (or identical) to checkpoint-based
> > write throttling?
> >
> > So we would throttle writes on both the checkpoint page buffer and the WAL limit.
> >
> > Regards,
> > --
> > Ilya Kasnacheev
> >
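A rough sketch of what WAL-archive-based throttling could look like, assuming a simple model. WalArchiveThrottle, parkNanos, beforeWrite and maxParkNanos are hypothetical names, and the linear ramp between a lower bound (for example the proposed getWalArchiveSize) and getMaxWalArchiveSize is an assumption chosen for illustration; it is not how Ignite's checkpoint write throttling is actually computed.

```java
import java.util.concurrent.locks.LockSupport;

/**
 * Hypothetical WAL-archive write throttle (illustration only, not Ignite code).
 * Below lowerBound writers run freely; between the bounds the park time grows
 * linearly; at or above upperBound each write is parked for maxParkNanos.
 */
class WalArchiveThrottle {
    private final long lowerBound;   // e.g. the proposed walArchiveSize
    private final long upperBound;   // e.g. maxWalArchiveSize
    private final long maxParkNanos; // longest delay applied to a single write

    WalArchiveThrottle(long lowerBound, long upperBound, long maxParkNanos) {
        if (lowerBound >= upperBound)
            throw new IllegalArgumentException("lowerBound must be below upperBound");
        this.lowerBound = lowerBound;
        this.upperBound = upperBound;
        this.maxParkNanos = maxParkNanos;
    }

    /** Park time for the given archive size (linear ramp, an assumption). */
    long parkNanos(long archiveSize) {
        if (archiveSize <= lowerBound)
            return 0L;
        if (archiveSize >= upperBound)
            return maxParkNanos;
        double fill = (double) (archiveSize - lowerBound) / (upperBound - lowerBound);
        return (long) (fill * maxParkNanos);
    }

    /** Called by a writer thread before appending a WAL record. */
    void beforeWrite(long archiveSize) {
        long nanos = parkNanos(archiveSize);
        if (nanos > 0)
            LockSupport.parkNanos(nanos);
    }
}
```

The trade-off is visible in parkNanos: every nanosecond a writer is parked is user load slowed down for as long as the rebalance keeps the archive pinned.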
> > On Tue, 4 May 2021 at 11:29, ткаленко кирилл <tkalkir...@yandex.ru>:
> >
> >>  Hello everybody!
> >>
> >>  At the moment, if some partitions are to be rebalanced historically, we
> >>  reserve segments in the WAL archive (i.e. we do not allow the WAL archive
> >>  to be cleaned) until the rebalance for all cache groups is over.
> >>
> >>  If a cluster is under load during the rebalance, the WAL archive size may
> >>  significantly exceed the limit set in
> >>  DataStorageConfiguration#getMaxWalArchiveSize until the process is
> >>  complete. This may cause problems for users: nodes may crash with the
> >>  "No space left on device" error.
> >>
> >>  We have a system property, IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE
> >>  (0.5 by default), which, multiplied by getMaxWalArchiveSize, sets the
> >>  threshold down to which the WAL archive is cleaned, i.e. the size of the
> >>  WAL archive that always remains on the node. I propose replacing this
> >>  system property with DataStorageConfiguration#getWalArchiveSize, specified
> >>  in bytes, with a default of getMaxWalArchiveSize * 0.5, as it is now.
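To make the retention rule concrete, here is a minimal Java model of it under stated assumptions: the archive is a queue of segment sizes, cleanup starts only once the total exceeds maxWalArchiveSize, and it deletes the oldest segments until the total is at or below the proposed walArchiveSize. WalArchiveCleaner and its methods are hypothetical and ignore reservations entirely.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of WAL archive cleanup with the proposed walArchiveSize (illustration only). */
class WalArchiveCleaner {
    private final long maxWalArchiveSize;
    private final long walArchiveSize;
    // Sizes of archived segments in bytes, oldest first.
    private final Deque<Long> segments = new ArrayDeque<>();
    private long totalSize;

    /** Uses the proposed default: walArchiveSize = maxWalArchiveSize * 0.5. */
    WalArchiveCleaner(long maxWalArchiveSize) {
        this(maxWalArchiveSize, (long) (maxWalArchiveSize * 0.5));
    }

    WalArchiveCleaner(long maxWalArchiveSize, long walArchiveSize) {
        if (walArchiveSize > maxWalArchiveSize)
            throw new IllegalArgumentException("walArchiveSize must not exceed maxWalArchiveSize");
        this.maxWalArchiveSize = maxWalArchiveSize;
        this.walArchiveSize = walArchiveSize;
    }

    /** Adds a newly archived segment and cleans up if the hard limit is exceeded. */
    void archive(long segmentSize) {
        segments.addLast(segmentSize);
        totalSize += segmentSize;
        if (totalSize > maxWalArchiveSize) {
            // Delete the oldest segments until at most walArchiveSize bytes remain.
            while (!segments.isEmpty() && totalSize > walArchiveSize)
                totalSize -= segments.removeFirst();
        }
    }

    long walArchiveSize() { return walArchiveSize; }
    long totalSize() { return totalSize; }
    int segmentCount() { return segments.size(); }
}
```

With maxWalArchiveSize = 1000 and 100-byte segments, the eleventh segment pushes the archive to 1100 bytes, and cleanup deletes the six oldest segments, leaving five (500 bytes).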
> >>
> >>  Main proposal:
> >>  When DataStorageConfiguration#getMaxWalArchiveSize is reached, cancel
> >>  the WAL segment reservations and do not grant new ones until the archive
> >>  has been cleaned down to DataStorageConfiguration#getWalArchiveSize. In
> >>  this case, if there is no segment left for historical rebalance, we will
> >>  automatically switch to full rebalance.
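The main proposal can be sketched as a small state machine. This is a hypothetical model, not Ignite code: WalReservationPolicy, RebalanceMode and the method names are made up, and a reservation is reduced to a per-cache-group flag. Reaching maxWalArchiveSize cancels every reservation and blocks new ones; the block is lifted only once cleanup has brought the archive down to walArchiveSize, and any group that cannot reserve (or has lost its reservation) falls back to full rebalance.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the proposed WAL reservation rule (illustration only). */
class WalReservationPolicy {
    enum RebalanceMode { HISTORICAL, FULL }

    private final long maxWalArchiveSize;
    private final long walArchiveSize;
    // Cache groups currently holding a WAL reservation for historical rebalance.
    private final Set<String> reserved = new HashSet<>();
    // True from the moment the hard limit is hit until cleanup reaches walArchiveSize.
    private boolean blocked;

    WalReservationPolicy(long maxWalArchiveSize, long walArchiveSize) {
        this.maxWalArchiveSize = maxWalArchiveSize;
        this.walArchiveSize = walArchiveSize;
    }

    /** Called whenever the archive size changes (segment archived or deleted). */
    void onArchiveSizeChanged(long archiveSize) {
        if (archiveSize >= maxWalArchiveSize) {
            // Hard limit reached: cancel all reservations so cleanup may delete segments.
            blocked = true;
            reserved.clear();
        } else if (blocked && archiveSize <= walArchiveSize) {
            // Cleanup has shrunk the archive enough: allow reservations again.
            blocked = false;
        }
    }

    /** Chooses the rebalance mode for a cache group, reserving WAL history when allowed. */
    RebalanceMode startRebalance(String cacheGroup) {
        if (blocked)
            return RebalanceMode.FULL;
        reserved.add(cacheGroup);
        return RebalanceMode.HISTORICAL;
    }

    /** False means the reservation was cancelled and the group must switch to full rebalance. */
    boolean hasReservation(String cacheGroup) {
        return reserved.contains(cacheGroup);
    }

    void finishRebalance(String cacheGroup) {
        reserved.remove(cacheGroup);
    }
}
```

Keeping reservations blocked until the archive is back at walArchiveSize, rather than just below maxWalArchiveSize, gives the rule hysteresis, so reservations do not flap on and off around the hard limit.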
