Re: Timeout when changes are filtered out by the core during logical replication

Ashutosh Bapat Mon, 09 Jan 2023 06:51:11 -0800

On Fri, Dec 23, 2022 at 2:45 PM Amit Kapila <[email protected]> wrote:


> On Thu, Dec 22, 2022 at 6:58 PM Ashutosh Bapat
> <[email protected]> wrote:
> >
> > Hi All,
> > A customer ran a script dropping a few dozens of users in a transaction.
> Before dropping a user they change the ownership of the tables owned by
> that user to another user and revoking all the accesses from that user in
> the same transaction. There were a few thousand tables whose privileges and
> ownership was changed by this transaction. Since all of these changes were
> in catalog table, those changes were filtered out in
> ReorderBufferProcessTXN()
> > by the following code
> >                    if (!RelationIsLogicallyLogged(relation))
> >                         goto change_done;
> >
> > I tried to reproduce a similar situation through the attached TAP test.
> For 500 users and 1000 tables, we see that the transaction takes
> significant time but logical decoding does not take much time. So with the
> default 1 min WAL sender and receiver timeout I could not reproduce the
> timeout. Beyond that our TAp test itself times out.
> >
> > But I think there's a possibility that the logical receiver will time
> out this way when decoding a sufficiently large transaction which takes
> more than the timeout amount of time to decode. So I think we need to call
> OutputPluginUpdateProgress() after a regular interval (in terms of time or
> number of changes) to consume any feedback from the subscriber or send a
> keep-alive message.
> >
>
> I don't think it will be a good idea to directly call
> OutputPluginUpdateProgress() from reorderbuffer.c. There is already a
> patch to discuss this problem [1].
>

Yeah. I don't mean to use OutputPluginUpdateProgress() directly. The patch
just showed that it helps calling it there in some way. Thanks for pointing
the other thread. I have reviewed the patch on that thread and continue the
discussion there.


>
> > Following commit
> > ```
> > commit 87c1dd246af8ace926645900f02886905b889718
> > Author: Amit Kapila <[email protected]>
> > Date:   Wed May 11 10:12:23 2022 +0530
> >
> >     Fix the logical replication timeout during large transactions.
> >
> >  ```
> > fixed a similar problem when the changes were filtered by an output
> plugin, but in this case the changes are not being handed over to the
> output plugin as well. If we fix it in the core we may not need to handle
> it in the output plugin as that commit does. The commit does not have a
> test case which I could run to reproduce the timeout.
> >
>
> It is not evident how to write a stable test for this because
> estimating how many changes are enough for the configured
> wal_receiver_timeout to
> pass on all the buildfarm machines is tricky. If you have good ideas
> then feel free to propose a test patch.
>

Will continue this on the other thread.

-- 
Best Wishes,
Ashutosh

Re: Timeout when changes are filtered out by the core during logical replication

Reply via email to