Hello.

As reported in [1], it seems that walsender can hit a timeout in certain cases. It is not clearly confirmed, but I suspect there is a case where LogicalRepApplyLoop keeps running its innermost loop without receiving a keepalive packet for longer than wal_sender_timeout (not wal_receiver_timeout). Of course, if the subscriber is simply underpowered, that can be resolved by giving it sufficient processing power. But if it happens between servers of equal processing power, it is reasonable to "fix" this.

Theoretically, I think this can happen with equally powered servers if the network connecting them is sufficiently fast, because sending reordered changes is relatively simple and faster than applying them on the subscriber.

I think we don't want to call GetCurrentTimestamp on every iteration of the innermost loop. Even if we called it only every N iterations, I can't come up with an N that suits every workload. So one possible solution would be to use SIGALRM. Is it worth doing? Or is there any other way?

Even if we don't fix this, we might need to add a description of this restriction to the documentation.

Any thoughts?

[1] https://www.postgresql.org/message-id/CAEDsCzhBtkNDLM46_fo_HirFYE2Mb3ucbZrYqG59ocWqWy7-xA%40mail.gmail.com

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
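
P.S. To make the SIGALRM idea a bit more concrete, here is a rough standalone sketch, not a patch. It uses plain POSIX sigaction/setitimer; a real change would presumably go through the backend's existing timeout facilities instead. All names in it (feedback_due, arm_feedback_timer, apply_one_change, apply_changes) are hypothetical stand-ins and do not exist in the PostgreSQL source. The point is only that the inner loop tests a flag set by the signal handler instead of reading the clock on every iteration.

```c
/* Ask for the XSI interfaces (sigaction, setitimer) under strict -std modes. */
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif

#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Set by the SIGALRM handler; the loop polls this instead of the clock. */
static volatile sig_atomic_t feedback_due = 0;

/* Dummy counter so the stand-in "apply" step does some visible work. */
static volatile unsigned long applied_total = 0;

static void
handle_sigalrm(int signo)
{
    (void) signo;
    feedback_due = 1;           /* only async-signal-safe work here */
}

/*
 * Install the handler and arm a repeating real-time timer.
 * interval_ms must be > 0 (a zero value would disarm the timer).
 * Returns 0 on success, -1 on failure.
 */
static int
arm_feedback_timer(long interval_ms)
{
    struct sigaction sa;
    struct itimerval tv;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handle_sigalrm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* restart interrupted system calls */
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    tv.it_interval.tv_sec = interval_ms / 1000;
    tv.it_interval.tv_usec = (interval_ms % 1000) * 1000;
    tv.it_value = tv.it_interval;
    return setitimer(ITIMER_REAL, &tv, NULL);
}

/* Stop the timer. Returns 0 on success, -1 on failure. */
static int
disarm_feedback_timer(void)
{
    struct itimerval tv;

    memset(&tv, 0, sizeof(tv));
    return setitimer(ITIMER_REAL, &tv, NULL);
}

/* Hypothetical stand-in for applying one replicated change. */
static void
apply_one_change(void)
{
    applied_total++;
}

/*
 * Hypothetical stand-in for the innermost apply loop: apply nchanges
 * changes, and whenever the timer has fired, clear the flag and count
 * one feedback message (real code would send a status update here).
 * Returns the number of feedback messages that would have been sent.
 */
static int
apply_changes(long nchanges)
{
    int         sent = 0;
    long        i;

    for (i = 0; i < nchanges; i++)
    {
        apply_one_change();
        if (feedback_due)
        {
            feedback_due = 0;
            sent++;
        }
    }
    return sent;
}
```

A caller would arm the timer once, for example with arm_feedback_timer(1000), and then run apply_changes() as usual. The per-iteration cost is one read of a sig_atomic_t, and no N has to be chosen. The interval would still have to be derived from wal_sender_timeout somehow, which this sketch leaves open.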