>>it's one of the reasons why a fresh base backup is required when starting old master as new standby? >>If yes, I agree with you. I've often heard the complaints about a backup when restarting new standby. >>That's really big problem.
I think Fujii Masao is on the same page. >In case of syncrep the master just waits for confirmation from standby before returning to client on >commit. >Not just commit, you must stop any *writing* of the wal records effectively killing any parallelism. > Min issue is that it will make *all* backends dependant on each sync commit, essentially serialising all >backends commits, with the serialisation *including* the latency of roundtrip to client. With current >sync streaming the other backends can continue to write wal, with proposed approach you cannot >write any records after the one waiting an ACK from standby. Let me rephrase the proposal in a more accurate manner: Consider following scenario: (1) A client sends the "COMMIT" command to the master server. (2) The master writes WAL record to disk (3) The master writes the data page related to this transaction. i.e. via checkpoint or bgwriter. (4) The master sends WAL records continuously to the standby, up to the commit WAL record. (5) The standby receives WAL records, writes them to the disk, and then replies the ACK. (6) The master returns a success indication to a client after it receives ACK. If failover happens between (3) and (4), WAL and DB data in old master are ahead of them in new master. After failover, new master continues running new transactions independently from old master. Then WAL record and DB data would become inconsistent between those two servers. To resolve these inconsistencies, the backup of new master needs to be taken onto new standby. But taking backup is not feasible in case of larger database size with several TB over a slow WAN. So to avoid this type of inconsistency without taking fresh backup we are thinking to do following thing: >> I think that you can introduce GUC specifying whether this extra check is required to avoid a backup >>when failback. Approach: Introduce new GUC option specifying whether to prevent PostgreSQL from writing DB data before corresponding WAL records have been replicated to the standby. That is, if this GUC option is enabled, PostgreSQL waits for corresponding WAL records to be not only written to the disk but also replicated to the standby before writing DB data. So the process becomes as follows: (1) A client sends the "COMMIT" command to the master server. (2) The master writes the commit WAL record to the disk. (3) The master sends WAL records continuously to standby up to the commit WAL record. (4) The standby receives WAL records, writes them to disk, and then replies the ACK. (5) *The master then forces a write of the data page related to this transaction. * (6) The master returns a success indication to a client after it receives ACK. While master is waiting to force a write (point 5) for this data page, streaming replication continuous. Also other data page writes are not dependent on this particular page write. So the commit of data pages are not serialized. Regards, Samrat