I created the first task based on this discussion: IGNITE-14923.
13.05.2021, 18:37, "Stanislav Lukyanov" <stanlukya...@gmail.com>:
> What I mean by degradation when archive size < min is that, for example,
> historical rebalance is available for a smaller timespan than expected by the
> system design.
> It may not be an issue of course, especially for a new cluster. If
> "degradation" is the wrong word we can call it "non-steady state" :)
> In any case, I think we're on the same page.
>
>> On 11 May 2021, at 13:18, Andrey Gura <ag...@apache.org> wrote:
>>
>> Stan
>>
>>> If archive size is less than min or more than max then the system
>>> functionality can degrade (e.g. historical rebalance may not work as
>>> expected).
>>
>> Why does the condition "archive size is less than min" lead to system
>> degradation? Actually, the described case is a normal situation for
>> brand new clusters.
>>
>> I'm okay with the proposed minWalArchiveSize property. It looks like
>> a relatively understandable property.
>>
>> On Sun, May 9, 2021 at 7:12 PM Stanislav Lukyanov
>> <stanlukya...@gmail.com> wrote:
>>> Discussed this with Kirill verbally.
>>>
>>> Kirill showed me that having the min threshold doesn't quite work.
>>> It doesn't work because we no longer know how much WAL we should remove
>>> if we reach getMaxWalArchiveSize.
>>>
>>> For example, say we have minWalArchiveTimespan=2 hours and
>>> maxWalArchiveSize=2GB.
>>> Say, under normal load on stable topology 2 hours of WAL use 1 GB of space.
>>> Now, say we're doing historical rebalance and reserve the WAL archive.
>>> The WAL archive starts growing and soon it occupies 2 GB.
>>> Now what?
>>> We're supposed to give up WAL reservations and start aggressively removing
>>> WAL archive.
>>> But it is not clear when we can stop removing WAL archive: since the last
>>> 2 hours of WAL are larger than our maxWalArchiveSize,
>>> there is no meaningful point the system can use as a "minimum" WAL size.
>>>
>>> I understand the description above is a bit messy but I believe that
>>> whoever is interested in this will understand it
>>> after drawing this on paper.
>>>
>>> I'm giving up on my latest suggestion about a time-based minimum. Let's
>>> keep it simple.
>>>
>>> I suggest the minWalArchiveSize and maxWalArchiveSize properties as the
>>> solution,
>>> with the behavior as initially described by Kirill.
>>>
>>> Stan
>>>
>>>> On 7 May 2021, at 15:09, ткаленко кирилл <tkalkir...@yandex.ru> wrote:
>>>>
>>>> Stas hello!
>>>>
>>>> I didn't quite get your last idea.
>>>> What will we do if we reach getMaxWalArchiveSize? Shall we not delete the
>>>> segment until minWalArchiveTimespan?
>>>>
>>>> 06.05.2021, 20:00, "Stanislav Lukyanov" <stanlukya...@gmail.com>:
>>>>> An interesting suggestion I heard today.
>>>>>
>>>>> The minWalArchiveSize property might actually be minWalArchiveTimespan -
>>>>> i.e. be a number of seconds instead of a number of bytes!
>>>>>
>>>>> I think this makes perfect sense from the user point of view.
>>>>> "I want to have WAL archive for at least N hours but I have a limit of M
>>>>> gigabytes to store it".
>>>>>
>>>>> Do we have the checkpoint timestamp stored anywhere? (cp start markers?)
>>>>> Perhaps we can actually implement this?
>>>>>
>>>>> Thanks,
>>>>> Stan
>>>>>
>>>>>> On 6 May 2021, at 14:13, Stanislav Lukyanov <stanlukya...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> +1 to cancel WAL reservation on reaching getMaxWalArchiveSize
>>>>>> +1 to add a public property to replace
>>>>>> IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE
>>>>>>
>>>>>> I don't like the name getWalArchiveSize - I think it's a bit confusing
>>>>>> (is it the current size? the minimal size? the target size?)
>>>>>> I suggest naming the property getMinWalArchiveSize. I think that this
>>>>>> is exactly what it is - the minimal size of the archive that we want to
>>>>>> have.
>>>>>> The archive size at all times should be between min and max.
>>>>>> If archive size is less than min or more than max then the system
>>>>>> functionality can degrade (e.g. historical rebalance may not work as
>>>>>> expected).
>>>>>> I think these rules are intuitively understood from the "min" and "max"
>>>>>> names.
>>>>>>
>>>>>> Ilya's suggestion about throttling is great although I'd do this in a
>>>>>> different ticket.
>>>>>>
>>>>>> Thanks,
>>>>>> Stan
>>>>>>
>>>>>>> On 5 May 2021, at 19:25, Maxim Muzafarov <mmu...@apache.org> wrote:
>>>>>>>
>>>>>>> Hello, Kirill
>>>>>>>
>>>>>>> +1 for this change. However, there are already too many configuration
>>>>>>> settings for the user to configure an Ignite cluster. It is better to
>>>>>>> keep the options that we already have and fix the behaviour of the
>>>>>>> rebalance process as you suggested.
>>>>>>>
>>>>>>> On Tue, 4 May 2021 at 19:01, ткаленко кирилл <tkalkir...@yandex.ru>
>>>>>>> wrote:
>>>>>>>> Hi Ilya!
>>>>>>>>
>>>>>>>> Then we may greatly reduce the user load on the cluster until the
>>>>>>>> rebalance is over, which can be critical for the user.
>>>>>>>>
>>>>>>>> 04.05.2021, 18:43, "Ilya Kasnacheev" <ilya.kasnach...@gmail.com>:
>>>>>>>>> Hello!
>>>>>>>>>
>>>>>>>>> Maybe we can have a mechanism here similar (or equal) to
>>>>>>>>> checkpoint-based write throttling?
>>>>>>>>>
>>>>>>>>> So we will be throttling for both checkpoint page buffer and WAL
>>>>>>>>> limit.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> --
>>>>>>>>> Ilya Kasnacheev
>>>>>>>>>
>>>>>>>>> Tue, 4 May 2021 at 11:29, ткаленко кирилл <tkalkir...@yandex.ru>:
>>>>>>>>>
>>>>>>>>>> Hello everybody!
>>>>>>>>>>
>>>>>>>>>> At the moment, if there are partitions for which historical
>>>>>>>>>> rebalance will be used, we reserve segments in the WAL archive
>>>>>>>>>> (we do not allow cleaning the WAL archive) until the rebalance for
>>>>>>>>>> all cache groups is over.
>>>>>>>>>>
>>>>>>>>>> If a cluster is under load during the rebalance, the WAL archive
>>>>>>>>>> size may significantly exceed the limit set in
>>>>>>>>>> DataStorageConfiguration#getMaxWalArchiveSize until the process is
>>>>>>>>>> complete. This may lead to user issues, and nodes may crash with
>>>>>>>>>> the "No space left on device" error.
>>>>>>>>>>
>>>>>>>>>> We have a system property
>>>>>>>>>> IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE (0.5 by default),
>>>>>>>>>> which sets the threshold (multiplied by getMaxWalArchiveSize)
>>>>>>>>>> down to which the WAL archive will be cleared, i.e. it sets the
>>>>>>>>>> size of the WAL archive that will always be kept on the node.
>>>>>>>>>> I propose to replace this system property with
>>>>>>>>>> DataStorageConfiguration#getWalArchiveSize in bytes; the default is
>>>>>>>>>> (getMaxWalArchiveSize * 0.5), as it is now.
>>>>>>>>>>
>>>>>>>>>> Main proposal:
>>>>>>>>>> When DataStorageConfiguration#getMaxWalArchiveSize is reached,
>>>>>>>>>> cancel the reservation of WAL segments and do not grant new
>>>>>>>>>> reservations until we reach
>>>>>>>>>> DataStorageConfiguration#getWalArchiveSize. In this case, if there
>>>>>>>>>> is no segment for historical rebalance, we will automatically
>>>>>>>>>> switch to full rebalance.
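
To make the min/max behavior from the main proposal easier to picture, here is a minimal Java sketch. It is not Ignite code: the class `WalArchivePolicySketch` and all of its methods are hypothetical names invented for illustration, segments are tracked only by their sizes, and cleanup is assumed to run synchronously whenever a segment is archived. The default minimum mirrors the current 0.5 default of IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical illustration of the proposed min/max WAL archive policy. */
public class WalArchivePolicySketch {
    private final long minSize;
    private final long maxSize;

    /** Sizes of archived segments in bytes, oldest first. */
    private final Deque<Long> segmentSizes = new ArrayDeque<>();

    private long totalSize;

    /** True while a historical rebalance holds a reservation. */
    private boolean reserved;

    public WalArchivePolicySketch(long minSize, long maxSize) {
        if (minSize < 0 || minSize > maxSize)
            throw new IllegalArgumentException("Expected 0 <= min <= max");

        this.minSize = minSize;
        this.maxSize = maxSize;
    }

    /** Assumed default: half of the maximum, as with the 0.5 threshold. */
    public static long defaultMinSize(long maxSize) {
        return maxSize / 2;
    }

    public void reserve() {
        reserved = true;
    }

    public boolean isReserved() {
        return reserved;
    }

    public long totalSize() {
        return totalSize;
    }

    /**
     * Archives one segment. Once the archive reaches the maximum, the
     * reservation is cancelled and the oldest segments are removed until
     * the archive is no larger than the minimum.
     *
     * @return Number of segments removed by this call.
     */
    public int addSegment(long size) {
        segmentSizes.addLast(size);
        totalSize += size;

        if (totalSize < maxSize)
            return 0;

        // Max reached: give up the reservation so cleanup can proceed.
        reserved = false;

        int removed = 0;

        while (totalSize > minSize && !segmentSizes.isEmpty()) {
            totalSize -= segmentSizes.removeFirst();
            removed++;
        }

        return removed;
    }

    public static void main(String[] args) {
        long max = 2L * 1024 * 1024 * 1024; // 2 GB
        long segment = 256L * 1024 * 1024;  // 256 MB

        WalArchivePolicySketch archive =
            new WalArchivePolicySketch(defaultMinSize(max), max);

        archive.reserve();

        for (int i = 1; i <= 8; i++) {
            int removed = archive.addSegment(segment);

            System.out.printf("segment %d: removed=%d, total=%d, reserved=%b%n",
                i, removed, archive.totalSize(), archive.isReserved());
        }
    }
}
```

With a 2 GB maximum and 256 MB segments, the eighth segment brings the archive to the maximum, so the sketch drops the reservation and removes the four oldest segments, leaving 1 GB. In a real node, a rebalance that lost its reserved segments this way would then have to fall back to full rebalance, as the proposal describes.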