Hi,

We have had an issue with walsender timeouts during logical decoding
when a transaction takes a long time to decode (because it contains
many changes).

I was looking at the walsender code today and realized the cause: if
the network and the downstream are fast enough, we always take the fast
path in WalSndWriteData, which does no reply or keepalive processing.
That processing is then only reached by other code once the transaction
has finished. So, paradoxically, we die of a timeout because everything
was fast enough that we never fell back to the slow code path.

I propose that we only use the fast path if the last processed reply is
not older than half of the walsender timeout; if it is older, we force
the slow code path so that replies get processed again. This is similar
to the logic we use to decide whether to send a keepalive message. I
also added a CHECK_FOR_INTERRUPTS call to the fast code path, because
otherwise walsender might ignore interrupts for too long on large
transactions.
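
To illustrate the check, here is a minimal standalone sketch. It is not
the code from the attached patch: the type ts_ms, the function name
can_use_fast_path, and the plain millisecond arithmetic are all
hypothetical stand-ins chosen so the snippet compiles on its own. It
also assumes that a timeout of zero or less means the timeout is
disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a timestamp, in milliseconds. */
typedef int64_t ts_ms;

/*
 * Return true if the fast path may be taken, i.e. the timeout is
 * disabled (assumed here to be timeout_ms <= 0) or the last processed
 * reply is not older than half of the timeout.  Otherwise the caller
 * should fall back to the slow path, which processes replies and
 * keepalives.
 */
bool
can_use_fast_path(ts_ms now, ts_ms last_reply, int timeout_ms)
{
    assert(now >= last_reply);  /* sketch assumes a monotonic clock */

    if (timeout_ms <= 0)
        return true;            /* timeout disabled, nothing to keep alive */

    return (now - last_reply) <= timeout_ms / 2;
}
```

With a 60 second timeout, a reply processed 30 seconds ago still allows
the fast path, while anything older forces the slow path well before the
timeout itself would fire.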

Thoughts?

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment: 0001-Fix-walsender-timeouts-when-decoding-large-transacti.patch
Description: binary/octet-stream
