Re: [HACKERS] Skip checkpoint on promoting from streaming replication

Heikki Linnakangas Thu, 24 Jan 2013 08:52:25 -0800

On 24.01.2013 18:24, Simon Riggs wrote:

On 6 January 2013 21:58, Simon Riggs <[email protected]> wrote:

I've been torn between the need to remove the checkpoint for speed and
being worried about the implications of doing so.


We promote in multiple use cases. When we end a PITR, or are
performing a switchover, it doesn't really matter how long the
shutdown checkpoint takes, so I'm inclined to leave it there in those
cases. For failover, we need fast promotion.

So my thinking is to make   pg_ctl promote -m fast
be the way to initiate a fast failover that skips the shutdown checkpoint.

That way all existing applications work the same as before, while new
users that explicitly choose to do so will gain from the new option.


Here's a patch to skip checkpoint when we do

   pg_ctl promote -m fast

We keep the end of recovery checkpoint in all other cases.


Hmm, there seems to be no way to do a "fast" promotion with a trigger file.

I'm a bit confused why there needs to be a special mode for this. Can't we just always do the "fast" promotion? I agree that there's no urgency when you're doing PITR, but it shouldn't do any harm either. Or perhaps always do "fast" promotion when starting up from standby mode, and "slow" otherwise.

Are we comfortable enough with this to skip the checkpoint after crash recovery?
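
To make the "fast from standby mode, slow otherwise" alternative concrete, here is a minimal sketch of the decision. All type and function names are hypothetical and chosen for illustration only; they are not identifiers from the patch. It assumes the conservative answer to the question above, i.e. crash recovery keeps its checkpoint.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not identifiers from the patch. */
typedef enum {
    STARTUP_CRASH_RECOVERY,    /* plain crash recovery */
    STARTUP_ARCHIVE_RECOVERY,  /* PITR / archive recovery */
    STARTUP_STANDBY            /* started in standby mode, then promoted */
} StartupKind;

typedef enum {
    END_WITH_CHECKPOINT,       /* "slow": checkpoint at end of recovery */
    END_WITHOUT_CHECKPOINT     /* "fast": skip the checkpoint */
} EndOfRecoveryAction;

/*
 * Pick the end-of-recovery behaviour from how the server was started,
 * so that no separate promotion mode is needed.
 */
EndOfRecoveryAction
choose_end_of_recovery_action(StartupKind kind)
{
    if (kind == STARTUP_STANDBY)
        return END_WITHOUT_CHECKPOINT;

    /* Assumption: crash recovery and PITR stay on the existing path. */
    return END_WITH_CHECKPOINT;
}
```

With this rule a trigger file and pg_ctl promote would behave the same way, since the choice depends only on the startup mode.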

I may be missing something, but it looks like after a "fast" promotion, you don't request a new checkpoint. So it can take quite a while for the next checkpoint to be triggered by checkpoint_timeout/segments. That shouldn't be a problem, but I feel that it'd be prudent to request a new checkpoint immediately (not necessarily an "immediate" checkpoint, though).
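
The distinction between requesting a checkpoint right away and requesting an "immediate" one can be sketched as follows. The flag names here are hypothetical stand-ins; I believe the analogous flags in the backend are CHECKPOINT_FORCE and CHECKPOINT_IMMEDIATE passed to RequestCheckpoint(), but treat that mapping as an assumption rather than a description of the patch.

```c
#include <stdbool.h>

/* Hypothetical request flags, modelled loosely on checkpoint request bits. */
enum {
    REQ_FORCE     = 1 << 0,  /* start a checkpoint even if little WAL was written */
    REQ_IMMEDIATE = 1 << 1   /* run at full speed, without throttling */
};

/*
 * Flags to use for the checkpoint requested right after promotion.
 * After a "fast" promotion we want a checkpoint to start soon, but it
 * can be spread out as usual, so REQ_IMMEDIATE is deliberately left out.
 * After a "slow" promotion the end-of-recovery checkpoint has already
 * been done, so nothing extra is requested (0).
 */
int
checkpoint_flags_after_promotion(bool fast_promotion)
{
    if (!fast_promotion)
        return 0;

    return REQ_FORCE;
}
```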

The only thing left from Kyotaro's patch is a single line of code -
the call to ReadCheckpointRecord() that checks to see if the WAL
records for the last two restartpoints are on disk, which was an
important line of code.

Why's that important, just for paranoia? If the last two restartpoints have disappeared, something's seriously wrong, and you will be in trouble e.g. if you crash at that point. Do we need to be extra paranoid when doing a "fast" promotion?

Patch implements a new record type XLOG_END_OF_RECOVERY that behaves
on replay like a shutdown checkpoint record. I put this back in from
my patch because I believe it's important that we have a clear place
where the WAL history changes timelineId. WAL format change bump
required.


Agreed, such a WAL record is essential.

At replay, an end-of-recovery record should be a signal to the hot standby mechanism that there are no transactions running in the master at that point, same as a shutdown checkpoint.
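
A minimal sketch of that replay behaviour, assuming a simplified replay state. The record kinds, struct, and function below are hypothetical names for illustration, not the actual redo code; the point is only that the end-of-recovery record shares the shutdown-checkpoint branch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record kinds; only the ones relevant here. */
typedef enum {
    REC_CHECKPOINT_ONLINE,
    REC_CHECKPOINT_SHUTDOWN,
    REC_END_OF_RECOVERY
} RecordKind;

/* Hypothetical, heavily simplified state kept by the replaying standby. */
typedef struct {
    uint32_t timeline_id;      /* timeline currently being replayed */
    bool     no_running_xacts; /* hot standby may assume nothing is running */
} ReplayState;

/*
 * Replay one record. An end-of-recovery record is handled like a
 * shutdown checkpoint: it marks the point where the timeline changes
 * and tells hot standby that no transactions were running in the master.
 */
void
replay_record(ReplayState *state, RecordKind kind, uint32_t record_timeline)
{
    switch (kind)
    {
        case REC_CHECKPOINT_SHUTDOWN:
        case REC_END_OF_RECOVERY:
            state->timeline_id = record_timeline;
            state->no_running_xacts = true;
            break;

        case REC_CHECKPOINT_ONLINE:
            /* Transactions may be in progress; leave the state alone. */
            break;
    }
}
```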


- Heikki


