On Thu, Dec 15, 2022 at 7:16 AM Kyotaro Horiguchi <horikyota....@gmail.com> wrote:
>
> At Wed, 14 Dec 2022 10:46:17 +0000, "Hayato Kuroda (Fujitsu)" <kuroda.hay...@fujitsu.com> wrote in
> > I have implemented and tested that workers wake up per wal_receiver_timeout/2
> > and send a keepalive. Basically it works well, but I found two problems.
> > Do you have any good suggestions about them?
> >
> > 1)
> >
> > With this PoC at present, workers calculate the sending interval based on
> > their wal_receiver_timeout, and sending is suppressed when the parameter
> > is set to zero.
> >
> > This means that there is a possibility that the walsender times out when
> > wal_sender_timeout on the publisher and wal_receiver_timeout on the
> > subscriber differ. Suppose that wal_sender_timeout is 2min, wal_receiver_timeout is 5min,
>
> It seems to me wal_receiver_status_interval is better for this use.
> It's enough for us to document that "wal_receiver_status_interval should
> be shorter than wal_sender_timeout/2, especially when a logical
> replication connection is using min_apply_delay. Otherwise you will
> suffer repeated termination of the walsender".
>
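To spell out the timing in the scenario Kuroda-san describes above, here is a
back-of-the-envelope sketch (the values are only the ones from the quoted
example, and the /2 factor is the keepalive interval proposed in the PoC):

```python
# Hypothetical values from the quoted example, in seconds.
wal_sender_timeout = 2 * 60       # publisher
wal_receiver_timeout = 5 * 60     # subscriber

# PoC behavior: the apply worker wakes and sends a keepalive every
# wal_receiver_timeout/2, i.e. every 150 seconds here.
keepalive_interval = wal_receiver_timeout / 2

# The publisher's walsender gives up after wal_sender_timeout seconds of
# silence, which elapses before the first keepalive is even sent:
print(keepalive_interval > wal_sender_timeout)  # True -> walsender times out
```

Deriving the interval from wal_receiver_status_interval instead (10s by
default) would decouple the keepalive cadence from wal_receiver_timeout, which
is the point of the suggestion.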
This sounds reasonable to me.

> > and min_apply_delay is 10min. The worker on the subscriber will wake up
> > every 2.5min and send keepalives, but the walsender exits before the
> > message arrives at the publisher.
> >
> > One idea to avoid that is to send the subscriber's min_apply_delay option
> > to the publisher and compare them, but that may not be sufficient,
> > because the XXX_timeout GUC parameters could be modified later.
> > # Anyway, I don't think such an asymmetric setup is preferable.
> >
> > 2)
> >
> > The issue reported by Vignesh-san[1] still remains. I have already
> > analyzed it [2]: the root cause is that the flushed WAL position is not
> > updated and sent to the publisher. Even if workers send keepalive
> > messages to the publisher during the delay, the flushed position cannot
> > be advanced.
>
> I didn't look closer, but the cause I guess is that the walsender doesn't
> die until all WAL has been sent, while the logical delay chokes the
> replication stream.

Right, I also think so.

> Allowing walsender to finish ignoring replication status
> wouldn't be great.

Yes, that would be ideal. But do you know why that is a must?

> One idea is to let logical workers send delaying
> status.

How can that help?

--
With Regards,
Amit Kapila.