What I mean by degradation when archive size < min is that, for example, historical rebalance is available for a smaller timespan than expected by the system design. It may not be an issue of course, especially for a new cluster. If "degradation" is the wrong word we can call it "non-steady state" :) In any case, I think we're on the same page.
> On 11 May 2021, at 13:18, Andrey Gura <ag...@apache.org> wrote:
>
> Stan
>
>> If archive size is less than min or more than max then the system
>> functionality can degrade (e.g. historical rebalance may not work as
>> expected).
>
> Why does the condition "archive size is less than min" lead to system
> degradation? Actually, the described case is a normal situation for
> brand new clusters.
>
> I'm okay with the proposed minWalArchiveSize property. Looks like
> a relatively understandable property.
>
> On Sun, May 9, 2021 at 7:12 PM Stanislav Lukyanov
> <stanlukya...@gmail.com> wrote:
>>
>> Discussed this with Kirill verbally.
>>
>> Kirill showed me that having a time-based min threshold doesn't quite work.
>> It doesn't work because we no longer know how much WAL we should remove if
>> we reach getMaxWalArchiveSize.
>>
>> For example, say we have minWalArchiveTimespan=2 hours and
>> maxWalArchiveSize=2GB.
>> Say, under normal load on a stable topology, 2 hours of WAL use 1 GB of space.
>> Now, say we're doing historical rebalance and reserve the WAL archive.
>> The WAL archive starts growing and soon it occupies 2 GB.
>> Now what?
>> We're supposed to give up WAL reservations and start aggressively removing
>> WAL archive.
>> But it is not clear when we can stop removing WAL archive - since the last 2
>> hours of WAL are larger than our maxWalArchiveSize,
>> there is no meaningful point the system can use as a "minimum" WAL size.
>>
>> I understand the description above is a bit messy but I believe that whoever
>> is interested in this will understand it
>> after drawing this on paper.
>>
>>
>> I'm giving up on my latest suggestion about time-based minimum. Let's keep
>> it simple.
>>
>> I suggest the minWalArchiveSize and maxWalArchiveSize properties as the
>> solution,
>> with the behavior as initially described by Kirill.
>>
>> Stan
>>
>>
>>> On 7 May 2021, at 15:09, ткаленко кирилл <tkalkir...@yandex.ru> wrote:
>>>
>>> Stas hello!
>>>
>>> I didn't quite get your last idea.
>>> What will we do if we reach getMaxWalArchiveSize? Shall we not delete the
>>> segment until minWalArchiveTimespan?
>>>
>>> 06.05.2021, 20:00, "Stanislav Lukyanov" <stanlukya...@gmail.com>:
>>>> An interesting suggestion I heard today.
>>>>
>>>> The minWalArchiveSize property might actually be minWalArchiveTimespan -
>>>> i.e. be a number of seconds instead of a number of bytes!
>>>>
>>>> I think this makes perfect sense from the user point of view.
>>>> "I want to have WAL archive for at least N hours but I have a limit of M
>>>> gigabytes to store it".
>>>>
>>>> Do we have checkpoint timestamp stored anywhere? (cp start markers?)
>>>> Perhaps we can actually implement this?
>>>>
>>>> Thanks,
>>>> Stan
>>>>
>>>>> On 6 May 2021, at 14:13, Stanislav Lukyanov <stanlukya...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> +1 to cancel WAL reservation on reaching getMaxWalArchiveSize
>>>>> +1 to add a public property to replace
>>>>> IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE
>>>>>
>>>>> I don't like the name getWalArchiveSize - I think it's a bit confusing
>>>>> (is it the current size? the minimal size? the target size?)
>>>>> I suggest naming the property getMinWalArchiveSize. I think that this is
>>>>> exactly what it is - the minimal size of the archive that we want to have.
>>>>> The archive size at all times should be between min and max.
>>>>> If archive size is less than min or more than max then the system
>>>>> functionality can degrade (e.g. historical rebalance may not work as
>>>>> expected).
>>>>> I think these rules are intuitively understood from the "min" and "max"
>>>>> names.
>>>>>
>>>>> Ilya's suggestion about throttling is great although I'd do this in a
>>>>> different ticket.
>>>>>
>>>>> Thanks,
>>>>> Stan
>>>>>
>>>>>> On 5 May 2021, at 19:25, Maxim Muzafarov <mmu...@apache.org> wrote:
>>>>>>
>>>>>> Hello, Kirill
>>>>>>
>>>>>> +1 for this change, however, there are too many configuration settings
>>>>>> that exist for the user to configure Ignite cluster. It is better to
>>>>>> keep the options that we already have and fix the behaviour of the
>>>>>> rebalance process as you suggested.
>>>>>>
>>>>>> On Tue, 4 May 2021 at 19:01, ткаленко кирилл <tkalkir...@yandex.ru>
>>>>>> wrote:
>>>>>>> Hi Ilya!
>>>>>>>
>>>>>>> Then we can greatly reduce the user load on the cluster until the
>>>>>>> rebalance is over. Which can be critical for the user.
>>>>>>>
>>>>>>> 04.05.2021, 18:43, "Ilya Kasnacheev" <ilya.kasnach...@gmail.com>:
>>>>>>>> Hello!
>>>>>>>>
>>>>>>>> Maybe we can have a mechanic here similar (or equal) to
>>>>>>>> checkpoint-based write throttling?
>>>>>>>>
>>>>>>>> So we will be throttling for both checkpoint page buffer and WAL limit.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> --
>>>>>>>> Ilya Kasnacheev
>>>>>>>>
>>>>>>>> Tue, 4 May 2021 at 11:29, ткаленко кирилл <tkalkir...@yandex.ru>:
>>>>>>>>
>>>>>>>>> Hello everybody!
>>>>>>>>>
>>>>>>>>> At the moment, if there are partitions for the rebalance for which the
>>>>>>>>> historical rebalance will be used, then we reserve segments in the WAL
>>>>>>>>> archive (we do not allow cleaning the WAL archive) until the
>>>>>>>>> rebalance for all cache groups is over.
>>>>>>>>>
>>>>>>>>> If a cluster is under load during the rebalance, WAL archive size may
>>>>>>>>> significantly exceed limits set in
>>>>>>>>> DataStorageConfiguration#getMaxWalArchiveSize until the process is
>>>>>>>>> complete. This may lead to user issues and nodes may crash with the
>>>>>>>>> "No space left on device" error.
>>>>>>>>>
>>>>>>>>> We have a system property
>>>>>>>>> IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE, by
>>>>>>>>> default 0.5, which sets the threshold (multiplied by
>>>>>>>>> getMaxWalArchiveSize) down to which the WAL archive is cleared,
>>>>>>>>> i.e. it sets the size of the WAL archive that will always be
>>>>>>>>> on the node. I propose to replace this system property with
>>>>>>>>> DataStorageConfiguration#getWalArchiveSize in bytes; the default is
>>>>>>>>> (getMaxWalArchiveSize * 0.5) as it is now.
>>>>>>>>>
>>>>>>>>> Main proposal:
>>>>>>>>> When the DataStorageConfiguration#getMaxWalArchiveSize is reached,
>>>>>>>>> cancel and do not give the reservation of the WAL segments until
>>>>>>>>> we reach DataStorageConfiguration#getWalArchiveSize. In this case,
>>>>>>>>> if there is no segment for historical rebalance, we will
>>>>>>>>> automatically switch to full rebalance.
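
For readers following the thread, the min/max behavior agreed on above can be modeled roughly as follows. This is only an illustrative sketch, not Ignite code: the class WalArchiveSketch and all of its methods are hypothetical names, sizes are plain numbers in whatever unit the caller picks, and it assumes cleanup removes the oldest segments first and stops before the archive would drop below the minimum.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of the min/max WAL archive rule from this thread; not Ignite code. */
final class WalArchiveSketch {
    private final long minSize;   // stands in for the proposed minWalArchiveSize
    private final long maxSize;   // stands in for maxWalArchiveSize
    private final Deque<Long> segments = new ArrayDeque<>(); // segment sizes, oldest first
    private long totalSize;
    private boolean reserved;     // true while historical rebalance holds the archive

    WalArchiveSketch(long minSize, long maxSize) {
        if (minSize < 0 || minSize > maxSize)
            throw new IllegalArgumentException("expected 0 <= min <= max");
        this.minSize = minSize;
        this.maxSize = maxSize;
    }

    void reserve() { reserved = true; }

    boolean isReserved() { return reserved; }

    long totalSize() { return totalSize; }

    /** Archives one segment, then applies the cleanup rule. Returns the number of segments removed. */
    int addSegment(long size) {
        segments.addLast(size);
        totalSize += size;

        if (totalSize < maxSize)
            return 0; // below max: nothing is removed and any reservation is kept

        // Max reached: give up the reservation (the rebalance would fall back to full).
        reserved = false;

        // Assumption: drop oldest segments while at least minSize remains afterwards.
        int removed = 0;
        while (!segments.isEmpty() && totalSize - segments.peekFirst() >= minSize) {
            totalSize -= segments.removeFirst();
            removed++;
        }
        return removed;
    }
}
```

With min = 1024 and max = 2048 (think megabytes, matching the 1 GB / 2 GB example above) and 256-unit segments, the archive grows untouched up to 1792, and the segment that brings it to 2048 releases the reservation and trims the archive back to 1024. A time-based minimum has no such well-defined stopping point, which is the problem Stan describes.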