Created the second task from this discussion: IGNITE-14952.

17.06.2021, 14:26, "ткаленко кирилл" <tkalkir...@yandex.ru>:
> Created the first task from this discussion: IGNITE-14923.
>
> 13.05.2021, 18:37, "Stanislav Lukyanov" <stanlukya...@gmail.com>:
>> What I mean by degradation when archive size < min is that, for example,
>> historical rebalance is available for a smaller timespan than expected by
>> the system design.
>> It may not be an issue of course, especially for a new cluster. If
>> "degradation" is the wrong word we can call it "non-steady state" :)
>> In any case, I think we're on the same page.
>>
>>> On 11 May 2021, at 13:18, Andrey Gura <ag...@apache.org> wrote:
>>>
>>> Stan
>>>
>>>> If archive size is less than min or more than max then the system
>>>> functionality can degrade (e.g. historical rebalance may not work as
>>>> expected).
>>>
>>> Why does the condition "archive size is less than min" lead to system
>>> degradation? Actually, the described case is a normal situation for
>>> brand new clusters.
>>>
>>> I'm okay with the proposed minWalArchiveSize property. It looks like a
>>> relatively understandable property.
>>>
>>> On Sun, May 9, 2021 at 7:12 PM Stanislav Lukyanov
>>> <stanlukya...@gmail.com> wrote:
>>>> Discussed this with Kirill verbally.
>>>>
>>>> Kirill showed me that having the min threshold doesn't quite work.
>>>> It doesn't work because we no longer know how much WAL we should remove
>>>> if we reach getMaxWalArchiveSize.
>>>>
>>>> For example, say we have minWalArchiveTimespan=2 hours and
>>>> maxWalArchiveSize=2GB.
>>>> Say, under normal load on stable topology 2 hours of WAL use 1 GB of
>>>> space.
>>>> Now, say we're doing historical rebalance and reserve the WAL archive.
>>>> The WAL archive starts growing and soon it occupies 2 GB.
>>>> Now what?
>>>> We're supposed to give up WAL reservations and start aggressively
>>>> removing WAL archive.
>>>> But it is not clear when we can stop removing WAL archive - since the
>>>> last 2 hours of WAL are larger than our maxWalArchiveSize
>>>> there is no meaningful point the system can use as a "minimum" WAL size.
>>>>
>>>> I understand the description above is a bit messy but I believe that
>>>> whoever is interested in this will understand it
>>>> after drawing this on paper.
>>>>
>>>> I'm giving up on my latest suggestion about time-based minimum. Let's
>>>> keep it simple.
>>>>
>>>> I suggest the minWalArchiveSize and maxWalArchiveSize properties as the
>>>> solution,
>>>> with the behavior as initially described by Kirill.
>>>>
>>>> Stan
>>>>
>>>>> On 7 May 2021, at 15:09, ткаленко кирилл <tkalkir...@yandex.ru> wrote:
>>>>>
>>>>> Hello Stas!
>>>>>
>>>>> I didn't quite get your last idea.
>>>>> What will we do if we reach getMaxWalArchiveSize? Shall we not delete
>>>>> the segment until minWalArchiveTimespan?
>>>>>
>>>>> 06.05.2021, 20:00, "Stanislav Lukyanov" <stanlukya...@gmail.com>:
>>>>>> An interesting suggestion I heard today.
>>>>>>
>>>>>> The minWalArchiveSize property might actually be minWalArchiveTimespan
>>>>>> - i.e. be a number of seconds instead of a number of bytes!
>>>>>>
>>>>>> I think this makes perfect sense from the user point of view.
>>>>>> "I want to have WAL archive for at least N hours but I have a limit of
>>>>>> M gigabytes to store it".
>>>>>>
>>>>>> Do we have checkpoint timestamp stored anywhere? (cp start markers?)
>>>>>> Perhaps we can actually implement this?
>>>>>>
>>>>>> Thanks,
>>>>>> Stan
>>>>>>
>>>>>>> On 6 May 2021, at 14:13, Stanislav Lukyanov <stanlukya...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> +1 to cancel WAL reservation on reaching getMaxWalArchiveSize
>>>>>>> +1 to add a public property to replace
>>>>>>> IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE
>>>>>>>
>>>>>>> I don't like the name getWalArchiveSize - I think it's a bit
>>>>>>> confusing (is it the current size? the minimal size? the target size?)
>>>>>>> I suggest naming the property getMinWalArchiveSize. I think that
>>>>>>> this is exactly what it is - the minimal size of the archive that we
>>>>>>> want to have.
>>>>>>> The archive size at all times should be between min and max.
>>>>>>> If archive size is less than min or more than max then the system
>>>>>>> functionality can degrade (e.g. historical rebalance may not work as
>>>>>>> expected).
>>>>>>> I think these rules are intuitively understood from the "min" and
>>>>>>> "max" names.
>>>>>>>
>>>>>>> Ilya's suggestion about throttling is great although I'd do this in a
>>>>>>> different ticket.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Stan
>>>>>>>
>>>>>>>> On 5 May 2021, at 19:25, Maxim Muzafarov <mmu...@apache.org> wrote:
>>>>>>>>
>>>>>>>> Hello, Kirill
>>>>>>>>
>>>>>>>> +1 for this change, however, there are too many configuration settings
>>>>>>>> that exist for the user to configure Ignite cluster. It is better to
>>>>>>>> keep the options that we already have and fix the behaviour of the
>>>>>>>> rebalance process as you suggested.
>>>>>>>>
>>>>>>>> On Tue, 4 May 2021 at 19:01, ткаленко кирилл <tkalkir...@yandex.ru>
>>>>>>>> wrote:
>>>>>>>>> Hi Ilya!
>>>>>>>>>
>>>>>>>>> Then we could greatly reduce the user load on the cluster until the
>>>>>>>>> rebalance is over, which can be critical for the user.
>>>>>>>>>
>>>>>>>>> 04.05.2021, 18:43, "Ilya Kasnacheev" <ilya.kasnach...@gmail.com>:
>>>>>>>>>> Hello!
>>>>>>>>>>
>>>>>>>>>> Maybe we can have a mechanism here similar (or equal) to checkpoint
>>>>>>>>>> based write throttling?
>>>>>>>>>>
>>>>>>>>>> So we will be throttling for both checkpoint page buffer and WAL
>>>>>>>>>> limit.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> --
>>>>>>>>>> Ilya Kasnacheev
>>>>>>>>>>
>>>>>>>>>> Tue, 4 May 2021 at 11:29, ткаленко кирилл <tkalkir...@yandex.ru>:
>>>>>>>>>>
>>>>>>>>>>> Hello everybody!
>>>>>>>>>>>
>>>>>>>>>>> At the moment, if there are partitions for the rebalance for
>>>>>>>>>>> which the historical rebalance will be used, then we reserve
>>>>>>>>>>> segments in the WAL archive (we do not allow cleaning the WAL
>>>>>>>>>>> archive) until the rebalance for all cache groups is over.
>>>>>>>>>>>
>>>>>>>>>>> If a cluster is under load during the rebalance, WAL archive size
>>>>>>>>>>> may significantly exceed limits set in
>>>>>>>>>>> DataStorageConfiguration#getMaxWalArchiveSize until the process is
>>>>>>>>>>> complete. This may lead to user issues and nodes may crash with
>>>>>>>>>>> the "No space left on device" error.
>>>>>>>>>>>
>>>>>>>>>>> We have a system property
>>>>>>>>>>> IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE (0.5 by default),
>>>>>>>>>>> which sets the threshold (multiplied by getMaxWalArchiveSize)
>>>>>>>>>>> down to which the WAL archive will be cleared, i.e. sets the
>>>>>>>>>>> size of the WAL archive that will always be on the node. I
>>>>>>>>>>> propose to replace this system property with
>>>>>>>>>>> DataStorageConfiguration#getWalArchiveSize in bytes; the default
>>>>>>>>>>> is (getMaxWalArchiveSize * 0.5), as it is now.
>>>>>>>>>>>
>>>>>>>>>>> Main proposal:
>>>>>>>>>>> When DataStorageConfiguration#getMaxWalArchiveSize is reached,
>>>>>>>>>>> cancel the existing reservations of WAL segments and do not grant
>>>>>>>>>>> new ones until the archive is back down to
>>>>>>>>>>> DataStorageConfiguration#getWalArchiveSize. In this case, if
>>>>>>>>>>> there is no segment for historical rebalance, we will
>>>>>>>>>>> automatically switch to full rebalance.
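
For readers following the thread, the main proposal can be sketched roughly as below. This is only an illustration of the min/max behavior discussed above, not Ignite code: the class WalArchivePolicySketch and its methods (withDefaultMin, onArchiveSize, canReserve, chooseRebalance) are hypothetical names invented for this sketch. Only the two thresholds, the 0.5 default and the fallback to full rebalance come from the messages above.

```java
/** Hypothetical sketch of the min/max WAL archive policy from this thread; not Ignite code. */
final class WalArchivePolicySketch {
    /** The two rebalance kinds mentioned in the thread. */
    enum RebalanceMode { HISTORICAL, FULL }

    private final long minArchiveSize; // proposed min size of the archive, in bytes
    private final long maxArchiveSize; // existing max size of the archive, in bytes
    private boolean reservationsAllowed = true;

    WalArchivePolicySketch(long minArchiveSize, long maxArchiveSize) {
        if (minArchiveSize < 0 || minArchiveSize > maxArchiveSize)
            throw new IllegalArgumentException("Expected 0 <= min <= max");
        this.minArchiveSize = minArchiveSize;
        this.maxArchiveSize = maxArchiveSize;
    }

    /** Default from the thread: min is half of max. */
    static WalArchivePolicySketch withDefaultMin(long maxArchiveSize) {
        return new WalArchivePolicySketch(maxArchiveSize / 2, maxArchiveSize);
    }

    /**
     * Reports the current archive size and returns how many bytes should be removed.
     * Reaching max cancels reservations and starts cleanup; cleanup continues until
     * the archive is back at min, after which reservations are allowed again.
     */
    long onArchiveSize(long currentSize) {
        if (currentSize >= maxArchiveSize)
            reservationsAllowed = false;
        else if (currentSize <= minArchiveSize)
            reservationsAllowed = true;

        return reservationsAllowed ? 0 : Math.max(0, currentSize - minArchiveSize);
    }

    /** Whether a historical rebalance may currently reserve WAL segments. */
    boolean canReserve() {
        return reservationsAllowed;
    }

    /** If the needed segment is gone from the archive, fall back to full rebalance. */
    static RebalanceMode chooseRebalance(boolean segmentStillArchived) {
        return segmentStillArchived ? RebalanceMode.HISTORICAL : RebalanceMode.FULL;
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        WalArchivePolicySketch policy = withDefaultMin(2 * gb);

        // Archive grows to max under reservation, then is cleaned back down to min.
        for (long size : new long[] {gb + gb / 2, 2 * gb, gb + gb / 2, gb}) {
            long toRemove = policy.onArchiveSize(size);
            System.out.println("size=" + size + " remove=" + toRemove
                + " canReserve=" + policy.canReserve());
        }
    }
}
```

With a byte-based minimum the cleanup always has a well-defined stopping point (the min size), which is the property the time-based variant lacked in the 2 hours / 2 GB example.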