I believe I've figured out why synchronous replication performs so badly with fsync=off: there is a nasty race condition. If the standby responds very quickly, it can ack the commit record and awaken waiters before the committing backend has actually begun to wait. There's no cross-check for this: the committing backend waits unconditionally, with no regard to whether the necessary ACK has already arrived.

At that point we may be in for a very long wait: another ACK is required to release the waiters, and that may not be immediately forthcoming. I had thought that the next ACK (after at most wal_receiver_status_interval) would do the trick, but it appears to be even worse than that. By making the standby win the race, I was easily able to get the master to hang for over a minute, and it was released only when I committed another transaction. Had I been sufficiently patient, the next checkpoint probably would have done the trick.
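To make the race concrete, here is a toy model in Python. It is not PostgreSQL code: the class SyncRepQueue, its methods, and the LSN values are hypothetical names invented for illustration, and a short timeout stands in for "however long it takes until the next ACK shows up". It only sketches the general lost-wakeup pattern described above and the usual remedy of checking the condition under the lock before sleeping.

```python
import threading


class SyncRepQueue:
    """Toy model of a committing backend waiting for a standby ACK.

    All names here are hypothetical; this is not PostgreSQL's implementation.
    """

    def __init__(self):
        self.cond = threading.Condition(threading.Lock())
        self.acked_lsn = 0  # highest WAL position the standby has acknowledged

    def standby_ack(self, lsn):
        # Standby side: record the ACK and wake whoever is waiting *right now*.
        with self.cond:
            self.acked_lsn = max(self.acked_lsn, lsn)
            self.cond.notify_all()

    def wait_unconditionally(self, lsn, timeout):
        # Racy version: go to sleep without looking at acked_lsn first.
        # If the ACK already arrived, its wakeup was lost and we sleep
        # until some later notification (or, here, until the timeout).
        with self.cond:
            return self.cond.wait(timeout)  # False means we timed out

    def wait_checked(self, lsn, timeout):
        # Cross-checked version: test the condition under the lock before
        # sleeping, and again after every wakeup.
        with self.cond:
            return self.cond.wait_for(lambda: self.acked_lsn >= lsn, timeout)


if __name__ == "__main__":
    q = SyncRepQueue()
    q.standby_ack(100)  # the standby wins the race: ACK before the wait
    print("unconditional wait released:", q.wait_unconditionally(100, 0.2))
    print("checked wait released:", q.wait_checked(100, 0.2))
```

In this model the unconditional wait times out even though the ACK it needs has already been recorded, while the checked wait returns immediately. In the real system there is no timeout, so the backend stays asleep until something else produces another wakeup.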
Of course, with fsync=off on the standby, it's much easier for the standby to win the race.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company