On Thu, Dec 15, 2022 at 7:16 AM Kyotaro Horiguchi <horikyota....@gmail.com> wrote:
>
> At Wed, 14 Dec 2022 10:46:17 +0000, "Hayato Kuroda (Fujitsu)" <kuroda.hay...@fujitsu.com> wrote in
> > I have implemented and tested that workers wake up per wal_receiver_timeout/2
> > and send a keepalive. Basically it works well, but I found two problems.
> > Do you have any good suggestions about them?
> >
> > 1)
> >
> > With this PoC at present, workers calculate the sending interval based on
> > their wal_receiver_timeout, and sending is suppressed when the parameter
> > is set to zero.
> >
> > This means that there is a possibility that the walsender times out when
> > wal_sender_timeout on the publisher and wal_receiver_timeout on the
> > subscriber differ. Suppose that wal_sender_timeout is 2min, wal_receiver_timeout is 5min,
>
> It seems to me wal_receiver_status_interval is better for this use.
> It's enough for us to document that "wal_receiver_status_interval should
> be shorter than wal_sender_timeout/2, especially when a logical
> replication connection is using min_apply_delay. Otherwise you will
> suffer repeated termination of the walsender".
>
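To spell out the timing in the scenario Kuroda-san describes above, here is a
back-of-the-envelope sketch (the values are only the ones from the quoted
example, and the /2 factor is the keepalive interval proposed in the PoC):

```python
# Hypothetical values from the quoted example, in seconds.
wal_sender_timeout = 2 * 60       # publisher
wal_receiver_timeout = 5 * 60     # subscriber

# PoC behavior: the apply worker wakes and sends a keepalive every
# wal_receiver_timeout/2, i.e. every 150 seconds here.
keepalive_interval = wal_receiver_timeout / 2

# The publisher's walsender gives up after wal_sender_timeout seconds of
# silence, which elapses before the first keepalive is even sent:
print(keepalive_interval > wal_sender_timeout)  # True -> walsender times out
```

Deriving the interval from wal_receiver_status_interval instead (10s by
default) would decouple the keepalive cadence from wal_receiver_timeout, which
is the point of the suggestion.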
This sounds reasonable to me.

> > and min_apply_delay is 10min. The worker on the subscriber will wake up
> > every 2.5min and send keepalives, but the walsender exits before the
> > message arrives at the publisher.
> >
> > One idea to avoid that is to send the subscriber's min_apply_delay option
> > to the publisher and compare them, but that may not be sufficient,
> > because the XXX_timeout GUC parameters could be modified later.
> > # Anyway, I don't think such an asymmetric setup is preferable.
> >
> > 2)
> >
> > The issue reported by Vignesh-san[1] still remains. I have already
> > analyzed it [2]: the root cause is that the flushed WAL position is not
> > updated and sent to the publisher. Even if workers send keepalive
> > messages to the publisher during the delay, the flushed position cannot
> > be advanced.
>
> I didn't look closer, but the cause I guess is that the walsender doesn't
> die until all WAL has been sent, while the logical delay chokes the
> replication stream.

Right, I also think so.

> Allowing walsender to finish ignoring replication status
> wouldn't be great.

Yes, that would be ideal. But do you know why that is a must?

> One idea is to let logical workers send delaying
> status.

How can that help?

--
With Regards,
Amit Kapila.