When a server fails, we need to promote a standby as quickly as possible. Currently, when we promote a standby to a primary, we need to run a shutdown checkpoint before users can begin write transactions, and in many cases that checkpoint can take minutes.
The reason we run a shutdown checkpoint is to avoid having to re-enter recovery if we crash after promotion. When we only had file-based replication, every WAL file was restored from the archive each time it was needed, so the WAL back to the restartpoint prior to the end of recovery was not guaranteed to be available in pg_xlog. Once we had exited archive recovery, it would be difficult to re-access the archive.

Now, with streaming replication, we keep the WAL files directly in pg_xlog, so the WAL from the last restartpoint is always available if we should crash. So if streaming replication is active at the point we promote, we can skip the shutdown checkpoint. It's that simple.

To make it even simpler, I suggest we also change file de-archiving so that it writes normal WAL files rather than RECOVERYXLOG; that way we can avoid the checkpoint in all cases.

There are comments saying we can only increment a timeline via a shutdown checkpoint, but if we were smart we'd have noticed the timeline change via the WAL file numbering anyway. The best way seems to be to write an XLOG_TIMELINE_CHANGE record instead of the shutdown checkpoint.

When I say skip the shutdown checkpoint, I mean remove it from the critical path of required actions at the end of recovery. We can still have a normal checkpoint kicked off at that time, but it no longer needs to be on the critical path.

Any problems foreseen? If not, this looks like a quick patch.

-- 
Simon Riggs           http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
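To illustrate the point that a timeline change is visible from WAL file numbering alone: a normal WAL segment file name is 24 hex digits, formatted as `%08X%08X%08X` from the timeline ID, the log ID and the segment number, so the timeline sits in the first eight digits. The C sketch below assumes that layout; `WalSegName`, `parse_wal_name` and `timeline_changed` are hypothetical helpers for illustration only, not PostgreSQL functions. It also shows why de-archiving into RECOVERYXLOG loses information: that name carries no timeline at all.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper type (not part of PostgreSQL).
 * Assumes the WAL file name layout "%08X%08X%08X" = timeline, log, segment.
 */
typedef struct
{
    unsigned int tli;   /* timeline ID: first 8 hex digits */
    unsigned int log;   /* log ID: middle 8 hex digits */
    unsigned int seg;   /* segment number: last 8 hex digits */
} WalSegName;

/* Parse a normal WAL segment file name; returns 1 on success, 0 otherwise. */
static int
parse_wal_name(const char *name, WalSegName *out)
{
    /* Require exactly 24 upper-case hex digits, as in normal WAL file names. */
    if (strlen(name) != 24 || strspn(name, "0123456789ABCDEF") != 24)
        return 0;
    return sscanf(name, "%8X%8X%8X", &out->tli, &out->log, &out->seg) == 3;
}

/*
 * Hypothetical check: returns 1 if the two names are on different
 * timelines, 0 if they share a timeline, and -1 if either name is not a
 * normal WAL file name (e.g. "RECOVERYXLOG", which has no timeline).
 */
static int
timeline_changed(const char *prev, const char *next)
{
    WalSegName a, b;

    if (!parse_wal_name(prev, &a) || !parse_wal_name(next, &b))
        return -1;
    return a.tli != b.tli;
}
```

For example, `000000010000000000000003` followed by `000000020000000000000003` reports a timeline change, two consecutive segments on timeline 1 do not, and `RECOVERYXLOG` cannot be classified at all.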