Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

SATYANARAYANA NARLAPURAM Wed, 29 Dec 2021 11:22:50 -0800

On Wed, Dec 29, 2021 at 11:16 AM Stephen Frost <sfr...@snowman.net> wrote:


> Greetings,
>
> On Wed, Dec 29, 2021 at 14:04 SATYANARAYANA NARLAPURAM <
> satyanarlapu...@gmail.com> wrote:
>
>> Stephen, thank you!
>>
>> On Wed, Dec 29, 2021 at 5:46 AM Stephen Frost <sfr...@snowman.net> wrote:
>>
>>> Greetings,
>>>
>>> * SATYANARAYANA NARLAPURAM (satyanarlapu...@gmail.com) wrote:
>>> > On Sat, Dec 25, 2021 at 9:25 PM Dilip Kumar <dilipbal...@gmail.com> wrote:
>>> > > On Sun, Dec 26, 2021 at 10:36 AM SATYANARAYANA NARLAPURAM <
>>> > > satyanarlapu...@gmail.com> wrote:
>>> > >>> Actually all the WAL insertions are done under a critical section
>>> > >>> (with a few exceptions). That means that if you look at all the
>>> > >>> references to XLogInsert(), it is always called under a critical
>>> > >>> section, and that is my main worry about hooking at the XLogInsert
>>> > >>> level.
>>> > >>>
>>> > >>
>>> > >> Got it, understood the concern. But can we document the limitations
>>> > >> of the hook and let the hook take care of it? I don't expect an error
>>> > >> to be thrown here since we are not planning to allocate memory or
>>> > >> make file system calls but instead look at the shared memory state
>>> > >> and add delays when required.
>>> > >>
>>> > >>
>>> > > Yet another problem is that if we are in XLogInsert(), we are holding
>>> > > the buffer locks on all the pages we have modified, so if we add a
>>> > > hook at that level which can make it wait, we would also block any
>>> > > read operations that need those buffers.  I haven't thought about what
>>> > > a better way to do this could be, but this is certainly not good.
>>> > >
>>> >
>>> > Yes, this is a problem. The other approach is adding a hook at
>>> > XLogWrite/XLogFlush? All the other backends will be waiting behind the
>>> > WALWriteLock. The process that is performing the write enters into a
>>> > busy loop with small delays until the criteria are met. Inability to
>>> > process the interrupts inside the critical section is a challenge in
>>> > both approaches. Any other thoughts?
>>>
>>> Why not have this work the exact same way sync replicas do, except that
>>> it's based off of some byte/time lag for some set of async replicas?
>>> That is, in RecordTransactionCommit(), perhaps right after the
>>> SyncRepWaitForLSN() call, or maybe even add this to that function?  Sure
>>> seems like there's a lot of similarity.
>>>
>>
>> I was thinking of achieving log governance (throttling WAL MB/sec) and
>> also providing RPO guarantees. In this model, it is hard to throttle WAL
>> generation of a long running transaction (for example copy/select into).
>>
>
> Long running transactions have a lot of downsides and are best
> discouraged. I don’t know that we should be designing this for that case
> specifically, particularly given the complications it would introduce as
> discussed on this thread already.
>
>> However, this meets my RPO needs. Are you in support of adding a hook or
>> the actual change? IMHO, the hook allows more creative options. I can go
>> ahead and make a patch accordingly.
>>
>
> I would think this would make more sense as part of core rather than a
> hook, as that then requires an extension and additional setup to get going,
> which raises the bar quite a bit when it comes to actually being used.
>

Sounds good, I will work on making the changes accordingly.
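
For illustration only, here is a minimal standalone sketch of the kind of
byte-lag check that could sit after the SyncRepWaitForLSN() call in
RecordTransactionCommit(), as suggested above. It is not a patch and does
not use any PostgreSQL internals: lag_exceeds_limit(),
wait_for_replica_lag(), the max_lag_bytes parameter, and the fake_replica
helper are all hypothetical names made up for this example, and the
XLogRecPtr typedef only mirrors the 64-bit LSN type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's 64-bit LSN type (assumption for this sketch). */
typedef uint64_t XLogRecPtr;

/* Hypothetical callback that reports the replica's current flush LSN. */
typedef XLogRecPtr (*flush_lsn_fn) (void *arg);

/*
 * Return true when the replica is more than max_lag_bytes behind the
 * commit LSN.  A limit of 0 is treated as "throttling disabled".
 */
static bool
lag_exceeds_limit(XLogRecPtr commit_lsn, XLogRecPtr replica_flush_lsn,
                  uint64_t max_lag_bytes)
{
    if (max_lag_bytes == 0)
        return false;           /* feature turned off */
    if (replica_flush_lsn >= commit_lsn)
        return false;           /* replica already caught up */
    return (commit_lsn - replica_flush_lsn) > max_lag_bytes;
}

/*
 * Poll until the lag drops to the limit or max_polls re-checks have been
 * made.  Returns the number of re-checks that were needed, or -1 if the
 * replica never got close enough.  A real backend would sleep between
 * polls (for example on its latch) and handle interrupts, which is
 * possible here because the commit path is outside the critical section.
 */
static int
wait_for_replica_lag(XLogRecPtr commit_lsn, uint64_t max_lag_bytes,
                     int max_polls, flush_lsn_fn get_flush_lsn, void *arg)
{
    assert(max_polls >= 0 && get_flush_lsn != NULL);

    for (int polls = 0; polls <= max_polls; polls++)
    {
        if (!lag_exceeds_limit(commit_lsn, get_flush_lsn(arg), max_lag_bytes))
            return polls;
        /* sleep and interrupt handling would go here */
    }
    return -1;
}

/* Toy replica for demonstration: advances by 'step' bytes on every poll. */
struct fake_replica
{
    XLogRecPtr  lsn;
    uint64_t    step;
};

static XLogRecPtr
fake_replica_flush_lsn(void *arg)
{
    struct fake_replica *r = arg;
    XLogRecPtr  current = r->lsn;

    r->lsn += r->step;          /* pretend the replica replays more WAL */
    return current;
}
```

With a commit at LSN 3000, a 1024-byte limit, and a toy replica that
starts at 1000 and advances 500 bytes per poll, the wait returns after two
re-checks. Because the wait happens at commit time, this approach does not
throttle WAL generated in the middle of a long-running transaction, which
is the trade-off discussed above.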

>
> Thanks,
>
> Stephen
>
>>
