Re: Exceeding the DataStorageConfiguration#getMaxWalArchiveSize due to historical rebalance

ткаленко кирилл Mon, 21 Jun 2021 02:14:05 -0700

Created the second task from this discussion: IGNITE-14952.


17.06.2021, 14:26, "ткаленко кирилл" <[email protected]>:
> Created the first task from this discussion: IGNITE-14923.
>
> 13.05.2021, 18:37, "Stanislav Lukyanov" <[email protected]>:
>>  What I mean by degradation when archive size < min is that, for example, 
>> historical rebalance is available for a smaller timespan than expected by 
>> the system design.
>>  It may not be an issue of course, especially for a new cluster. If 
>> "degradation" is the wrong word we can call it "non-steady state" :)
>>  In any case, I think we're on the same page.
>>
>>>   On 11 May 2021, at 13:18, Andrey Gura <[email protected]> wrote:
>>>
>>>   Stan
>>>
>>>>   If archive size is less than min or more than max then the system 
>>>> functionality can degrade (e.g. historical rebalance may not work as 
>>>> expected).
>>>
>>>   Why does the condition "archive size is less than min" lead to system
>>>   degradation? Actually, the described case is a normal situation for
>>>   brand new clusters.
>>>
>>>   I'm okay with the proposed minWalArchiveSize property. Looks like
>>>   relatively understandable property.
>>>
>>>   On Sun, May 9, 2021 at 7:12 PM Stanislav Lukyanov
>>>   <[email protected]> wrote:
>>>>   Discussed this with Kirill verbally.
>>>>
>>>>   Kirill showed me that having the min threshold doesn't quite work.
>>>>   It doesn't work because we no longer know how much WAL we should remove 
>>>> if we reach getMaxWalArchiveSize.
>>>>
>>>>   For example, say we have minWalArchiveTimespan=2 hours and 
>>>> maxWalArchiveSize=2GB.
>>>>   Say, under normal load on stable topology 2 hours of WAL use 1 GB of 
>>>> space.
>>>>   Now, say we're doing historical rebalance and reserve the WAL archive.
>>>>   The WAL archive starts growing and soon it occupies 2 GB.
>>>>   Now what?
>>>>   We're supposed to give up WAL reservations and start aggressively 
>>>> removing WAL archive.
>>>>   But it is not clear when we can stop removing WAL archive - since the last 2 
>>>> hours of WAL are larger than our maxWalArchiveSize,
>>>>   there is no meaningful point the system can use as a "minimum" WAL size.
>>>>
>>>>   I understand the description above is a bit messy but I believe that 
>>>> whoever is interested in this will understand it
>>>>   after drawing this on paper.
>>>>
>>>>   I'm giving up on my latest suggestion about time-based minimum. Let's 
>>>> keep it simple.
>>>>
>>>>   I suggest the minWalArchiveSize and maxWalArchiveSize properties as the 
>>>> solution,
>>>>   with the behavior as initially described by Kirill.
>>>>
>>>>   Stan
>>>>
>>>>>   On 7 May 2021, at 15:09, ткаленко кирилл <[email protected]> wrote:
>>>>>
>>>>>   Stas hello!
>>>>>
>>>>>   I didn't quite get your last idea.
>>>>>   What will we do if we reach getMaxWalArchiveSize? Shall we not delete 
>>>>> the segment until minWalArchiveTimespan?
>>>>>
>>>>>   06.05.2021, 20:00, "Stanislav Lukyanov" <[email protected]>:
>>>>>>   An interesting suggestion I heard today.
>>>>>>
>>>>>>   The minWalArchiveSize property might actually be minWalArchiveTimespan 
>>>>>> - i.e. be a number of seconds instead of a number of bytes!
>>>>>>
>>>>>>   I think this makes perfect sense from the user point of view.
>>>>>>   "I want to have WAL archive for at least N hours but I have a limit of 
>>>>>> M gigabytes to store it".
>>>>>>
>>>>>>   Do we have checkpoint timestamp stored anywhere? (cp start markers?)
>>>>>>   Perhaps we can actually implement this?
>>>>>>
>>>>>>   Thanks,
>>>>>>   Stan
>>>>>>
>>>>>>>   On 6 May 2021, at 14:13, Stanislav Lukyanov <[email protected]> 
>>>>>>> wrote:
>>>>>>>
>>>>>>>   +1 to cancel WAL reservation on reaching getMaxWalArchiveSize
>>>>>>>   +1 to add a public property to replace 
>>>>>>> IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE
>>>>>>>
>>>>>>>   I don't like the name getWalArchiveSize - I think it's a bit 
>>>>>>> confusing (is it the current size? the minimal size? the target size?)
>>>>>>>   I suggest naming the property getMinWalArchiveSize. I think that 
>>>>>>> this is exactly what it is - the minimal size of the archive that we 
>>>>>>> want to have.
>>>>>>>   The archive size at all times should be between min and max.
>>>>>>>   If archive size is less than min or more than max then the system 
>>>>>>> functionality can degrade (e.g. historical rebalance may not work as 
>>>>>>> expected).
>>>>>>>   I think these rules are intuitively understood from the "min" and 
>>>>>>> "max" names.
>>>>>>>
>>>>>>>   Ilya's suggestion about throttling is great although I'd do this in a 
>>>>>>> different ticket.
>>>>>>>
>>>>>>>   Thanks,
>>>>>>>   Stan
>>>>>>>
>>>>>>>>   On 5 May 2021, at 19:25, Maxim Muzafarov <[email protected]> wrote:
>>>>>>>>
>>>>>>>>   Hello, Kirill
>>>>>>>>
>>>>>>>>   +1 for this change, however, there are too many configuration 
>>>>>>>> settings
>>>>>>>>   that exist for the user to configure Ignite cluster. It is better to
>>>>>>>>   keep the options that we already have and fix the behaviour of the
>>>>>>>>   rebalance process as you suggested.
>>>>>>>>
>>>>>>>>   On Tue, 4 May 2021 at 19:01, ткаленко кирилл <[email protected]> 
>>>>>>>> wrote:
>>>>>>>>>   Hi Ilya!
>>>>>>>>>
>>>>>>>>>   Then we may greatly reduce the user load on the cluster until the 
>>>>>>>>> rebalance is over, which can be critical for the user.
>>>>>>>>>
>>>>>>>>>   04.05.2021, 18:43, "Ilya Kasnacheev" <[email protected]>:
>>>>>>>>>>   Hello!
>>>>>>>>>>
>>>>>>>>>>   Maybe we can have a mechanic here similar (or equal) to checkpoint 
>>>>>>>>>> based
>>>>>>>>>>   write throttling?
>>>>>>>>>>
>>>>>>>>>>   So we will be throttling for both checkpoint page buffer and WAL 
>>>>>>>>>> limit.
>>>>>>>>>>
>>>>>>>>>>   Regards,
>>>>>>>>>>   --
>>>>>>>>>>   Ilya Kasnacheev
>>>>>>>>>>
>>>>>>>>>>   Tue, 4 May 2021 at 11:29, ткаленко кирилл <[email protected]>:
>>>>>>>>>>
>>>>>>>>>>>   Hello everybody!
>>>>>>>>>>>
>>>>>>>>>>>   At the moment, if there are partitions for the rebalance for 
>>>>>>>>>>> which the
>>>>>>>>>>>   historical rebalance will be used, then we reserve segments in 
>>>>>>>>>>> the WAL
>>>>>>>>>>>   archive (we do not allow cleaning the WAL archive) until the 
>>>>>>>>>>> rebalance for
>>>>>>>>>>>   all cache groups is over.
>>>>>>>>>>>
>>>>>>>>>>>   If a cluster is under load during the rebalance, WAL archive size 
>>>>>>>>>>> may
>>>>>>>>>>>   significantly exceed limits set in
>>>>>>>>>>>   DataStorageConfiguration#getMaxWalArchiveSize until the process is
>>>>>>>>>>>   complete. This may lead to user issues and nodes may crash with 
>>>>>>>>>>> the "No
>>>>>>>>>>>   space left on device" error.
>>>>>>>>>>>
>>>>>>>>>>>   We have a system property 
>>>>>>>>>>> IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE by
>>>>>>>>>>>   default 0.5, which sets the threshold (multiplied by 
>>>>>>>>>>> getMaxWalArchiveSize)
>>>>>>>>>>>   from which and up to which the WAL archive will be cleared, i.e. 
>>>>>>>>>>> sets the
>>>>>>>>>>>   size of the WAL archive that will always be on the node. I 
>>>>>>>>>>> propose to
>>>>>>>>>>>   replace this system property with the
>>>>>>>>>>>   DataStorageConfiguration#getWalArchiveSize in bytes, the default 
>>>>>>>>>>> is
>>>>>>>>>>>   (getMaxWalArchiveSize * 0.5) as it is now.
>>>>>>>>>>>
>>>>>>>>>>>   Main proposal:
>>>>>>>>>>>   When DataStorageConfiguration#getMaxWalArchiveSize is reached, 
>>>>>>>>>>> cancel
>>>>>>>>>>>   the reservation of the WAL segments and do not grant new ones until we reach
>>>>>>>>>>>   DataStorageConfiguration#getWalArchiveSize. In this case, if 
>>>>>>>>>>> there is no
>>>>>>>>>>>   segment for historical rebalance, we will automatically switch to 
>>>>>>>>>>> full
>>>>>>>>>>>   rebalance.
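
For readers following the thread, the min/max behavior that the discussion settled on can be sketched as a small toy model. This is only an illustration under assumptions, not Ignite code: the class WalArchiveSketch and all of its methods are hypothetical names, sizes are plain numbers (read them as megabytes), and the choice to trim while the archive is strictly larger than the minimum is a simplification that the thread does not specify.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of the proposal in this thread. Hypothetical names, not Ignite's implementation. */
final class WalArchiveSketch {
    private final long minSize; // stands in for the proposed minWalArchiveSize
    private final long maxSize; // stands in for getMaxWalArchiveSize
    private final Deque<Long> segments = new ArrayDeque<>(); // segment sizes, oldest first
    private long totalSize;
    private boolean reserved;  // true while a historical rebalance holds a reservation
    private boolean overLimit; // true from reaching max until cleanup brings the size down to min

    WalArchiveSketch(long minSize, long maxSize) {
        if (minSize < 0 || minSize > maxSize)
            throw new IllegalArgumentException("expected 0 <= min <= max");
        this.minSize = minSize;
        this.maxSize = maxSize;
    }

    /** Returns false while reservations are refused; the caller would then use full rebalance. */
    boolean tryReserve() {
        if (overLimit)
            return false;
        reserved = true;
        return true;
    }

    /** Archives one segment; on reaching max, cancels the reservation and refuses new ones. */
    void append(long segmentSize) {
        segments.addLast(segmentSize);
        totalSize += segmentSize;
        if (totalSize >= maxSize) {
            reserved = false;
            overLimit = true;
        }
    }

    /** Once max has been reached, drops the oldest segments until the size is no larger than min. */
    void cleanup() {
        if (!overLimit)
            return; // below max nothing is removed in this simplified model
        while (totalSize > minSize && !segments.isEmpty())
            totalSize -= segments.removeFirst();
        overLimit = false;
    }

    long totalSize() { return totalSize; }

    boolean isReserved() { return reserved; }

    public static void main(String[] args) {
        WalArchiveSketch archive = new WalArchiveSketch(1024, 2048);
        System.out.println("reserved at start: " + archive.tryReserve());
        for (int i = 0; i < 8; i++)
            archive.append(256); // load during rebalance grows the archive up to max
        System.out.println("still reserved: " + archive.isReserved());
        System.out.println("new reservation granted: " + archive.tryReserve());
        archive.cleanup();
        System.out.println("size after cleanup: " + archive.totalSize());
    }
}
```

With min = 1024 and max = 2048, eight segments of 256 reach the maximum, the reservation is cancelled, and cleanup trims the archive back to 1024, after which reservations are granted again. A real implementation would also have to handle concurrency and per-segment reservations, which this sketch leaves out.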
