Re: Exceeding the DataStorageConfiguration#getMaxWalArchiveSize due to historical rebalance

ткаленко кирилл Thu, 17 Jun 2021 04:26:22 -0700

Created the first task based on this discussion: IGNITE-14923.


13.05.2021, 18:37, "Stanislav Lukyanov" <stanlukya...@gmail.com>:
> What I mean by degradation when archive size < min is that, for example, 
> historical rebalance is available for a smaller timespan than expected by the 
> system design.
> It may not be an issue of course, especially for a new cluster. If 
> "degradation" is the wrong word we can call it "non-steady state" :)
> In any case, I think we're on the same page.
>
>>  On 11 May 2021, at 13:18, Andrey Gura <ag...@apache.org> wrote:
>>
>>  Stan
>>
>>>  If archive size is less than min or more than max then the system 
>>> functionality can degrade (e.g. historical rebalance may not work as 
>>> expected).
>>
>>  Why does the condition "archive size is less than min" lead to system
>>  degradation? Actually, the described case is a normal situation for
>>  brand new clusters.
>>
>>  I'm okay with the proposed minWalArchiveSize property. It looks like a
>>  relatively understandable property.
>>
>>  On Sun, May 9, 2021 at 7:12 PM Stanislav Lukyanov
>>  <stanlukya...@gmail.com> wrote:
>>>  Discussed this with Kirill verbally.
>>>
>>>  Kirill showed me that having the time-based min threshold doesn't quite work.
>>>  It doesn't work because we no longer know how much WAL we should remove if we
>>>  reach getMaxWalArchiveSize.
>>>
>>>  For example, say we have minWalArchiveTimespan=2 hours and 
>>> maxWalArchiveSize=2GB.
>>>  Say that under normal load on a stable topology, 2 hours of WAL use 1 GB of space.
>>>  Now, say we're doing historical rebalance and reserve the WAL archive.
>>>  The WAL archive starts growing and soon it occupies 2 GB.
>>>  Now what?
>>>  We're supposed to give up WAL reservations and start aggressively removing the
>>>  WAL archive. But it is not clear when we can stop removing it: since the last
>>>  2 hours of WAL are larger than our maxWalArchiveSize, there is no meaningful
>>>  point the system can use as a "minimum" WAL size.
>>>
>>>  I understand the description above is a bit messy but I believe that 
>>> whoever is interested in this will understand it
>>>  after drawing this on paper.
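The "drawing on paper" above can be reproduced with a small Java model. It is a sketch under stated assumptions: `WalSegment`, `TimespanVsMaxSize` and `findCutPoint` are hypothetical names (not Ignite APIs), and archived segments are assumed to be ordered oldest first with a known creation time and size. Keeping exactly the segments inside the minWalArchiveTimespan window is the smallest archive that still honours the time-based minimum; if even that exceeds maxWalArchiveSize, no cut point satisfies both limits.

```java
import java.util.List;

/** Hypothetical archived WAL segment: creation time in ms and size in bytes. */
final class WalSegment {
    final long createdMillis;
    final long sizeBytes;

    WalSegment(long createdMillis, long sizeBytes) {
        this.createdMillis = createdMillis;
        this.sizeBytes = sizeBytes;
    }
}

final class TimespanVsMaxSize {
    /**
     * Returns the index of the oldest segment to keep (everything before it may be deleted),
     * or -1 if keeping the whole minimum timespan would already exceed maxBytes.
     * Segments must be ordered oldest first.
     */
    static int findCutPoint(List<WalSegment> segs, long nowMillis, long minTimespanMillis, long maxBytes) {
        long windowStart = nowMillis - minTimespanMillis;
        int firstInWindow = segs.size(); // no segment found inside the window yet
        long windowBytes = 0;

        // Walk from the newest segment backwards while segments are still inside the window.
        for (int i = segs.size() - 1; i >= 0 && segs.get(i).createdMillis >= windowStart; i--) {
            firstInWindow = i;
            windowBytes += segs.get(i).sizeBytes;
        }

        // The window is the smallest archive that honours the time-based minimum.
        return windowBytes <= maxBytes ? firstInWindow : -1;
    }
}
```

With the numbers from the example (a 2-hour window and a 2 GB max), 1 GB of WAL per 2 hours yields a valid cut point, while a rebalance-time load of 4 GB per 2 hours yields -1: the cleaner has no meaningful place to stop.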
>>>
>>>  I'm giving up on my latest suggestion about time-based minimum. Let's keep 
>>> it simple.
>>>
>>>  I suggest the minWalArchiveSize and maxWalArchiveSize properties as the solution,
>>>  with the behavior as initially described by Kirill.
>>>
>>>  Stan
>>>
>>>>  On 7 May 2021, at 15:09, ткаленко кирилл <tkalkir...@yandex.ru> wrote:
>>>>
>>>>  Hello, Stas!
>>>>
>>>>  I didn't quite get your last idea.
>>>>  What will we do if we reach getMaxWalArchiveSize? Shall we not delete
>>>>  segments until minWalArchiveTimespan has passed?
>>>>
>>>>  06.05.2021, 20:00, "Stanislav Lukyanov" <stanlukya...@gmail.com>:
>>>>>  An interesting suggestion I heard today.
>>>>>
>>>>>  The minWalArchiveSize property might actually be minWalArchiveTimespan - 
>>>>> i.e. be a number of seconds instead of a number of bytes!
>>>>>
>>>>>  I think this makes perfect sense from the user's point of view.
>>>>>  "I want to have WAL archive for at least N hours but I have a limit of M 
>>>>> gigabytes to store it".
>>>>>
>>>>>  Do we store checkpoint timestamps anywhere? (cp start markers?)
>>>>>  Perhaps we can actually implement this?
>>>>>
>>>>>  Thanks,
>>>>>  Stan
>>>>>
>>>>>>  On 6 May 2021, at 14:13, Stanislav Lukyanov <stanlukya...@gmail.com> 
>>>>>> wrote:
>>>>>>
>>>>>>  +1 to cancel WAL reservation on reaching getMaxWalArchiveSize
>>>>>>  +1 to add a public property to replace 
>>>>>> IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE
>>>>>>
>>>>>>  I don't like the name getWalArchiveSize - I think it's a bit confusing 
>>>>>> (is it the current size? the minimal size? the target size?)
>>>>>>  I suggest naming the property getMinWalArchiveSize. I think that is exactly
>>>>>>  what it is: the minimal size of the archive that we want to have.
>>>>>>  The archive size at all times should be between min and max.
>>>>>>  If archive size is less than min or more than max then the system 
>>>>>> functionality can degrade (e.g. historical rebalance may not work as 
>>>>>> expected).
>>>>>>  I think these rules are intuitively understood from the "min" and "max" 
>>>>>> names.
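The min/max rule above can be written down as a tiny classifier. This is an illustrative sketch only: `WalArchiveState`, `WalArchiveStates` and `classify` are hypothetical names, and the comments on each state paraphrase this thread rather than describe Ignite's implementation.

```java
/** Hypothetical states of the WAL archive relative to the proposed min/max bounds. */
enum WalArchiveState {
    BELOW_MIN, // e.g. a brand-new cluster: less history than intended ("non-steady state")
    STEADY,    // min <= size <= max: the intended operating range
    ABOVE_MAX  // the archive must be cleaned; under the proposal, reservations are cancelled
}

final class WalArchiveStates {
    static WalArchiveState classify(long archiveBytes, long minBytes, long maxBytes) {
        // The rule only makes sense when the bounds themselves are consistent.
        if (minBytes > maxBytes)
            throw new IllegalArgumentException("min WAL archive size must not exceed max");
        if (archiveBytes < minBytes)
            return WalArchiveState.BELOW_MIN;
        if (archiveBytes > maxBytes)
            return WalArchiveState.ABOVE_MAX;
        return WalArchiveState.STEADY;
    }
}
```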
>>>>>>
>>>>>>  Ilya's suggestion about throttling is great, although I'd do it in a
>>>>>>  separate ticket.
>>>>>>
>>>>>>  Thanks,
>>>>>>  Stan
>>>>>>
>>>>>>>  On 5 May 2021, at 19:25, Maxim Muzafarov <mmu...@apache.org> wrote:
>>>>>>>
>>>>>>>  Hello, Kirill
>>>>>>>
>>>>>>>  +1 for this change. However, users already have too many configuration
>>>>>>>  settings for an Ignite cluster. It is better to keep the options that we
>>>>>>>  already have and fix the behaviour of the rebalance process as you
>>>>>>>  suggested.
>>>>>>>
>>>>>>>  On Tue, 4 May 2021 at 19:01, ткаленко кирилл <tkalkir...@yandex.ru> 
>>>>>>> wrote:
>>>>>>>>  Hi Ilya!
>>>>>>>>
>>>>>>>>  Then we may greatly slow down the user load on the cluster until the
>>>>>>>>  rebalance is over, which can be critical for the user.
>>>>>>>>
>>>>>>>>  04.05.2021, 18:43, "Ilya Kasnacheev" <ilya.kasnach...@gmail.com>:
>>>>>>>>>  Hello!
>>>>>>>>>
>>>>>>>>>  Maybe we can have a mechanism here similar (or identical) to checkpoint-based
>>>>>>>>>  write throttling?
>>>>>>>>>
>>>>>>>>>  So we would throttle writes for both the checkpoint page buffer and the WAL
>>>>>>>>>  archive limit.
>>>>>>>>>
>>>>>>>>>  Regards,
>>>>>>>>>  --
>>>>>>>>>  Ilya Kasnacheev
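Ilya's throttling idea could look roughly like the following. This is a hypothetical sketch, not Ignite's checkpoint write-throttling algorithm: `WalArchiveThrottle`, `delayNanos`, `startFraction` and `maxDelayNanos` are illustrative names, and the linear ramp and the values in `throttle` are arbitrary assumptions. It also shows the cost Kirill points out in his reply: once the archive is close to the limit, every write is delayed until the rebalance releases the archive.

```java
import java.util.concurrent.locks.LockSupport;

final class WalArchiveThrottle {
    /**
     * Hypothetical delay before a WAL write: zero while the archive is at or below
     * startFraction of its max size, then growing linearly up to maxDelayNanos
     * as the archive approaches (or exceeds) the max.
     */
    static long delayNanos(long archiveBytes, long maxArchiveBytes, double startFraction, long maxDelayNanos) {
        double fill = (double) archiveBytes / maxArchiveBytes;
        if (fill <= startFraction)
            return 0;
        double ratio = Math.min(1.0, (fill - startFraction) / (1.0 - startFraction));
        return (long) (ratio * maxDelayNanos);
    }

    /** Called by a writer thread before logging a record (assumed: start at 50%, cap at 1 ms). */
    static void throttle(long archiveBytes, long maxArchiveBytes) {
        long delay = delayNanos(archiveBytes, maxArchiveBytes, 0.5, 1_000_000L);
        if (delay > 0)
            LockSupport.parkNanos(delay);
    }
}
```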
>>>>>>>>>
>>>>>>>>>  Tue, 4 May 2021 at 11:29, ткаленко кирилл <tkalkir...@yandex.ru>:
>>>>>>>>>
>>>>>>>>>>  Hello everybody!
>>>>>>>>>>
>>>>>>>>>>  At the moment, if there are partitions to rebalance for which historical
>>>>>>>>>>  rebalance will be used, we reserve segments in the WAL archive (i.e. we do
>>>>>>>>>>  not allow cleaning the WAL archive) until the rebalance for all cache
>>>>>>>>>>  groups is over.
>>>>>>>>>>
>>>>>>>>>>  If a cluster is under load during the rebalance, the WAL archive size may
>>>>>>>>>>  significantly exceed the limit set in
>>>>>>>>>>  DataStorageConfiguration#getMaxWalArchiveSize until the process is
>>>>>>>>>>  complete. This may cause problems for users, and nodes may crash with the
>>>>>>>>>>  "No space left on device" error.
>>>>>>>>>>
>>>>>>>>>>  We have a system property, IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE
>>>>>>>>>>  (0.5 by default), which, multiplied by getMaxWalArchiveSize, sets the
>>>>>>>>>>  threshold down to which the WAL archive is cleaned, i.e. the size of the
>>>>>>>>>>  WAL archive that will always be kept on the node. I propose to replace
>>>>>>>>>>  this system property with DataStorageConfiguration#getWalArchiveSize,
>>>>>>>>>>  specified in bytes, with a default of (getMaxWalArchiveSize * 0.5) as it
>>>>>>>>>>  is now.
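The difference between the current and the proposed way of deriving the cleanup threshold can be sketched as follows. The class and method names are hypothetical, and treating a non-positive configured size as "not set" is an assumption made only for this sketch; the values taken from the thread are the 0.5 default and the `getMaxWalArchiveSize * 0.5` fallback.

```java
final class WalArchiveThreshold {
    /** Current behaviour: max size multiplied by IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE (0.5 by default). */
    static long fromPercentage(long maxWalArchiveSize, double percentage) {
        return (long) (maxWalArchiveSize * percentage);
    }

    /**
     * Proposed behaviour: an explicit size in bytes. A non-positive value is treated
     * here as "not set" (an assumption) and falls back to half of the max, as today.
     */
    static long fromConfiguredSize(long maxWalArchiveSize, long walArchiveSize) {
        long threshold = walArchiveSize > 0 ? walArchiveSize : maxWalArchiveSize / 2;
        if (threshold > maxWalArchiveSize)
            throw new IllegalArgumentException("WAL archive size must not exceed the max WAL archive size");
        return threshold;
    }
}
```

An explicit byte value makes the kept-history size independent of the max, which is the point of replacing the percentage-based system property with a public configuration setting.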
>>>>>>>>>>
>>>>>>>>>>  Main proposal:
>>>>>>>>>>  When DataStorageConfiguration#getMaxWalArchiveSize is reached, cancel
>>>>>>>>>>  existing reservations of WAL segments and do not grant new ones until the
>>>>>>>>>>  archive is cleaned down to DataStorageConfiguration#getWalArchiveSize. In
>>>>>>>>>>  this case, if the segments needed for historical rebalance are no longer
>>>>>>>>>>  available, we will automatically switch to full rebalance.
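The main proposal can be modelled in a few dozen lines of Java. This is a hypothetical, single-threaded sketch: `WalArchiveModel` and its methods are illustrative names, not Ignite's WAL manager API. It shows the three rules from the proposal: reaching the max cancels existing reservations, new reservations are refused until the archive is cleaned down to the configured size, and a refused reservation means the rebalance falls back to full rebalance.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of the proposal; sizes are in bytes. */
final class WalArchiveModel {
    private final long minBytes;                             // proposed getWalArchiveSize
    private final long maxBytes;                             // getMaxWalArchiveSize
    private final Deque<Long> segments = new ArrayDeque<>(); // archived segment sizes, oldest first
    private long totalBytes;
    private int reservations;       // active reservations held by historical rebalance
    private boolean cleanupPending; // max reached: reservations refused until cleaned to min

    WalArchiveModel(long minBytes, long maxBytes) {
        if (minBytes > maxBytes)
            throw new IllegalArgumentException("min > max");
        this.minBytes = minBytes;
        this.maxBytes = maxBytes;
    }

    /** Historical rebalance asks to reserve the archive; false means "fall back to full rebalance". */
    boolean tryReserve() {
        if (cleanupPending)
            return false;
        reservations++;
        return true;
    }

    /** A segment has been moved to the archive. */
    void onSegmentArchived(long sizeBytes) {
        segments.addLast(sizeBytes);
        totalBytes += sizeBytes;
        if (totalBytes >= maxBytes && !cleanupPending) {
            cleanupPending = true; // stop granting reservations
            reservations = 0;      // cancel existing ones: affected rebalances switch to full
        }
    }

    /** Run by a (hypothetical) cleaner: delete the oldest segments down to min. */
    void clean() {
        if (!cleanupPending)
            return;
        while (totalBytes > minBytes && !segments.isEmpty())
            totalBytes -= segments.removeFirst();
        cleanupPending = false; // reservations may be granted again
    }

    long totalBytes() { return totalBytes; }

    int reservations() { return reservations; }
}
```

Splitting `onSegmentArchived` from `clean` assumes that cleanup may run separately from archiving; the `cleanupPending` window is where the "do not grant new reservations" rule applies.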
