walsender timeout on logical replication set

Kyotaro Horiguchi Sun, 12 Sep 2021 18:31:22 -0700

Hello.

As reported in [1] it seems that walsender can suffer timeout in
certain cases.  It is not clearly confirmed, but I suspect that
there's the case where LogicalRepApplyLoop keeps running the innermost
loop without receiving keepalive packet for longer than
wal_sender_timeout (not wal_receiver_timeout).  Of course that can be
resolved by giving sufficient processing power to the subscriber if
not. But if that happens between the servers with the equal processing
power, it is reasonable to "fix" this.  Theoretically I think this can
happen with equally-powered servers if the connecting network is
sufficiently fast.  Because sending reordered changes is relatively
simple and fast than apllying the changes on subscriber.


I think we don't want to call GetCurrentTimestamp every iteration of
the innermost loop.  Even if we call it every N iterations, I don't
come up with a proper N that fits any workload. So one possible
solution would be using slgalrm.  Is it worth doing?  Or is there any
other way?

Even if we won't fix this, we might need to add a description about
this restriciton in the documentation?

Any thougths?

[1] 
https://www.postgresql.org/message-id/CAEDsCzhBtkNDLM46_fo_HirFYE2Mb3ucbZrYqG59ocWqWy7-xA%40mail.gmail.com

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

walsender timeout on logical replication set

Reply via email to