Re: Exceeding the DataStorageConfiguration#getMaxWalArchiveSize due to historical rebalance

Stanislav Lukyanov Thu, 13 May 2021 08:37:55 -0700

What I mean by degradation when archive size < min is that, for example, 
historical rebalance is available for a smaller timespan than expected by the 
system design.
It may not be an issue of course, especially for a new cluster. If 
"degradation" is the wrong word we can call it "non-steady state" :) 
In any case, I think we're on the same page.



> On 11 May 2021, at 13:18, Andrey Gura <ag...@apache.org> wrote:
> 
> Stan
> 
>> If archive size is less than min or more than max then the system 
>> functionality can degrade (e.g. historical rebalance may not work as 
>> expected).
> 
> Why does the condition "archive size is less than min" lead to system
> degradation? Actually, the described case is a normal situation for
> brand new clusters.
> 
> I'm okay with the proposed minWalArchiveSize property. It looks like
> a relatively understandable property.
> 
> On Sun, May 9, 2021 at 7:12 PM Stanislav Lukyanov
> <stanlukya...@gmail.com> wrote:
>> 
>> Discussed this with Kirill verbally.
>> 
>> Kirill showed me that having the min threshold doesn't quite work.
>> It doesn't work because we no longer know how much WAL we should remove if 
>> we reach getMaxWalArchiveSize.
>> 
>> For example, say we have minWalArchiveTimespan=2 hours and 
>> maxWalArchiveSize=2GB.
>> Say, under normal load on stable topology 2 hours of WAL use 1 GB of space.
>> Now, say we're doing historical rebalance and reserve the WAL archive.
>> The WAL archive starts growing and soon it occupies 2 GB.
>> Now what?
>> We're supposed to give up WAL reservations and start aggressively removing 
>> WAL archive.
>> But it is not clear when we can stop removing WAL archive - since the last 2 
>> hours of WAL are larger than our maxWalArchiveSize,
>> there is no meaningful point the system can use as a "minimum" WAL size.
>> 
>> I understand the description above is a bit messy but I believe that whoever 
>> is interested in this will understand it
>> after drawing this on paper.
>> 
>> 
>> I'm giving up on my latest suggestion about time-based minimum. Let's keep 
>> it simple.
>> 
>> I suggest the minWalArchiveSize and maxWalArchiveSize properties as the 
>> solution,
>> with the behavior as initially described by Kirill.
>> 
>> Stan
>> 
>> 
>>> On 7 May 2021, at 15:09, ткаленко кирилл <tkalkir...@yandex.ru> wrote:
>>> 
>>> Stas hello!
>>> 
>>> I didn't quite get your last idea.
>>> What will we do if we reach getMaxWalArchiveSize? Shall we not delete the 
>>> segment until minWalArchiveTimespan?
>>> 
>>> 06.05.2021, 20:00, "Stanislav Lukyanov" <stanlukya...@gmail.com>:
>>>> An interesting suggestion I heard today.
>>>> 
>>>> The minWalArchiveSize property might actually be minWalArchiveTimespan - 
>>>> i.e. be a number of seconds instead of a number of bytes!
>>>> 
>>>> I think this makes perfect sense from the user point of view.
>>>> "I want to have WAL archive for at least N hours but I have a limit of M 
>>>> gigabytes to store it".
>>>> 
>>>> Do we have checkpoint timestamp stored anywhere? (cp start markers?)
>>>> Perhaps we can actually implement this?
>>>> 
>>>> Thanks,
>>>> Stan
>>>> 
>>>>> On 6 May 2021, at 14:13, Stanislav Lukyanov <stanlukya...@gmail.com> 
>>>>> wrote:
>>>>> 
>>>>> +1 to cancel WAL reservation on reaching getMaxWalArchiveSize
>>>>> +1 to add a public property to replace 
>>>>> IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE
>>>>> 
>>>>> I don't like the name getWalArchiveSize - I think it's a bit confusing 
>>>>> (is it the current size? the minimal size? the target size?)
>>>>> I suggest naming the property getMinWalArchiveSize. I think that this is 
>>>>> exactly what it is - the minimal size of the archive that we want to have.
>>>>> The archive size at all times should be between min and max.
>>>>> If archive size is less than min or more than max then the system 
>>>>> functionality can degrade (e.g. historical rebalance may not work as 
>>>>> expected).
>>>>> I think these rules are intuitively understood from the "min" and "max" 
>>>>> names.
>>>>> 
>>>>> Ilya's suggestion about throttling is great although I'd do this in a 
>>>>> different ticket.
>>>>> 
>>>>> Thanks,
>>>>> Stan
>>>>> 
>>>>>> On 5 May 2021, at 19:25, Maxim Muzafarov <mmu...@apache.org> wrote:
>>>>>> 
>>>>>> Hello, Kirill
>>>>>> 
>>>>>> +1 for this change, however, there are too many configuration settings
>>>>>> that exist for the user to configure Ignite cluster. It is better to
>>>>>> keep the options that we already have and fix the behaviour of the
>>>>>> rebalance process as you suggested.
>>>>>> 
>>>>>> On Tue, 4 May 2021 at 19:01, ткаленко кирилл <tkalkir...@yandex.ru> 
>>>>>> wrote:
>>>>>>> Hi Ilya!
>>>>>>> 
>>>>>>> Then we may greatly reduce the user load on the cluster until the 
>>>>>>> rebalance is over, which can be critical for the user.
>>>>>>> 
>>>>>>> 04.05.2021, 18:43, "Ilya Kasnacheev" <ilya.kasnach...@gmail.com>:
>>>>>>>> Hello!
>>>>>>>> 
>>>>>>>> Maybe we can have a mechanic here similar (or equal) to checkpoint 
>>>>>>>> based
>>>>>>>> write throttling?
>>>>>>>> 
>>>>>>>> So we will be throttling for both checkpoint page buffer and WAL limit.
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> --
>>>>>>>> Ilya Kasnacheev
>>>>>>>> 
>>>>>>>> Tue, 4 May 2021 at 11:29, ткаленко кирилл <tkalkir...@yandex.ru>:
>>>>>>>> 
>>>>>>>>> Hello everybody!
>>>>>>>>> 
>>>>>>>>> At the moment, if there are partitions for the rebalance for which the
>>>>>>>>> historical rebalance will be used, then we reserve segments in the WAL
>>>>>>>>> archive (we do not allow cleaning the WAL archive) until the 
>>>>>>>>> rebalance for
>>>>>>>>> all cache groups is over.
>>>>>>>>> 
>>>>>>>>> If a cluster is under load during the rebalance, WAL archive size may
>>>>>>>>> significantly exceed limits set in
>>>>>>>>> DataStorageConfiguration#getMaxWalArchiveSize until the process is
>>>>>>>>> complete. This may lead to user issues and nodes may crash with the 
>>>>>>>>> "No
>>>>>>>>> space left on device" error.
>>>>>>>>> 
>>>>>>>>> We have a system property, 
>>>>>>>>> IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE (0.5 by default), which
>>>>>>>>> sets the threshold (as a fraction of getMaxWalArchiveSize) down to
>>>>>>>>> which the WAL archive will be cleared, i.e. it sets the size of the
>>>>>>>>> WAL archive that will always be kept on the node. I propose to
>>>>>>>>> replace this system property with
>>>>>>>>> DataStorageConfiguration#getWalArchiveSize in bytes, with a default of
>>>>>>>>> (getMaxWalArchiveSize * 0.5), as it is now.
>>>>>>>>> 
>>>>>>>>> Main proposal:
>>>>>>>>> When the DataStorageConfiguration#getMaxWalArchiveSize is reached, 
>>>>>>>>> cancel the reservation of the WAL segments and do not grant new
>>>>>>>>> reservations until the archive shrinks to
>>>>>>>>> DataStorageConfiguration#getWalArchiveSize. In this case, if there is 
>>>>>>>>> no
>>>>>>>>> segment for historical rebalance, we will automatically switch to full
>>>>>>>>> rebalance.
>>
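
For readers following the thread, the min/max behaviour agreed on above can be sketched as a small toy model. This is only an illustration under stated assumptions, not Ignite code: the class WalArchiveSketch and all of its methods are hypothetical, sizes are plain numbers in arbitrary units, and cleanup is modelled as synchronous, so the "do not grant new reservations until min is reached" window collapses to a single call.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of the proposal in this thread. Hypothetical names, not Ignite code. */
final class WalArchiveSketch {
    private final long minSize; // stands in for the proposed minWalArchiveSize
    private final long maxSize; // stands in for getMaxWalArchiveSize
    private final Deque<Long> segments = new ArrayDeque<>(); // segment sizes, oldest first
    private long total;         // current archive size
    private boolean reserved;   // true while historical rebalance holds the archive

    WalArchiveSketch(long minSize, long maxSize) {
        if (minSize < 0 || minSize > maxSize)
            throw new IllegalArgumentException("expected 0 <= min <= max");
        this.minSize = minSize;
        this.maxSize = maxSize;
    }

    /** Default mentioned in the thread: min is half of max. */
    static WalArchiveSketch withDefaultMin(long maxSize) {
        return new WalArchiveSketch(maxSize / 2, maxSize);
    }

    void reserve() { reserved = true; }

    boolean isReserved() { return reserved; }

    long totalSize() { return total; }

    /** Adds one archived segment and returns how many old segments were deleted. */
    int archive(long segmentSize) {
        segments.addLast(segmentSize);
        total += segmentSize;

        // Below max nothing is cleaned, whether or not the archive is reserved.
        if (total < maxSize)
            return 0;

        // Max reached: cancel the reservation. A historical rebalance that
        // relied on it would have to fall back to a full rebalance.
        reserved = false;

        // Delete the oldest segments until the archive is back at min.
        int deleted = 0;
        while (total > minSize && !segments.isEmpty()) {
            total -= segments.removeFirst();
            deleted++;
        }
        return deleted;
    }
}
```

With a max of 2048 and the default min of 1024, archiving eight segments of 256 under a reservation leaves the archive untouched for the first seven; the eighth reaches the max, drops the reservation, and removes the four oldest segments.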
