Greg Stark wrote:
> Keepalives introduce spurious disconnections in working TCP
> connections that have transient outages

It's been a while since I read up on this, so perhaps my memory has distorted the facts over time, but I thought that under TCP, if one side sends a packet which isn't ack'd after a (configurable) number of tries with certain (configurable) timings, the connection would be considered broken and an error returned regardless of keepalive settings. I thought keepalive only generated a trickle of small packets during idle time so that broken connections could be detected on the side of a connection which was waiting to receive data before doing something.

That doesn't sound consistent with your characterization, though, since if my recollection is right, one could just as easily say that any write to a TCP socket by the application can also cause "spurious disconnections in working TCP connections that have transient outages."

I know that with a two-minute keepalive timeout, I can unplug a machine from one switch port and plug it in somewhere else and the networking hardware sorts things out fast enough that the transient network outage doesn't break the TCP connection, whether the application is sending data or it is quiescent and the OS is sending keepalive packets.

From what I've read about the present walreceiver retry logic, if the connection breaks, WR will use some intelligence to try the archive and retry connecting through TCP, in turn, until it finds data. If the connection goes silent without breaking, WR sits there forever without looking at the archive or trying to obtain a new TCP connection to the master.

I know which behavior I'd prefer. Apparently the testers who encountered the behavior felt the same.

-Kevin
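For reference, here is a minimal sketch of the per-socket keepalive knobs under discussion. It assumes Linux, where TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are per-socket options; the helper names enable_keepalive and keepalive_detect_secs are hypothetical and are not PostgreSQL code. On an idle connection, the worst-case time to notice a peer that has silently vanished is roughly idle + interval * count. Retransmission of unacknowledged data on a busy connection is governed separately (on Linux, system-wide by the net.ipv4.tcp_retries2 sysctl), which is why ordinary writes can break a connection during an outage just as keepalive probes can.

```c
/* Sketch only: assumes Linux socket options; helper names are hypothetical. */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Approximate worst-case seconds before a silently dead peer is detected
 * on an idle connection: the first probe is sent after idle_s seconds of
 * silence, then up to cnt unanswered probes are sent intvl_s apart. */
int keepalive_detect_secs(int idle_s, int intvl_s, int cnt)
{
    return idle_s + intvl_s * cnt;
}

/* Turn on SO_KEEPALIVE for fd and set the Linux-specific timings.
 * Returns 0 on success, or -1 (errno set by setsockopt) on the first failure. */
int enable_keepalive(int fd, int idle_s, int intvl_s, int cnt)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    /* Seconds of idle time before the first probe is sent. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) < 0)
        return -1;
    /* Seconds between unanswered probes. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof(intvl_s)) < 0)
        return -1;
    /* Number of unanswered probes before the connection is declared dead. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
        return -1;
    return 0;
}
```

With an idle time of 120 seconds, a 10-second interval and 3 probes, keepalive_detect_secs(120, 10, 3) gives 150, so a dead peer on a quiet connection would be noticed after about two and a half minutes, while a brief switch-port move that restores connectivity within that window would not break the connection.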
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers