On 12/16/2014 10:24 AM, Borodin Vladimir wrote:
On 12 Dec 2014, at 16:46, Heikki Linnakangas
<hlinnakan...@vmware.com> wrote:

There have been a few threads on the behavior of WAL archiving,
after a standby server is promoted [1] [2]. In short, it doesn't
work as you might expect. The standby will start archiving after
it's promoted, but it will not archive files that were replicated
from the old master via streaming replication. If those files were
not already archived in the master before the promotion, they are
not archived at all. That's not good if you wanted to restore from
a base backup + the WAL archive later.

The basic setup is a master server, a standby, a WAL archive that's
shared by both, and streaming replication between the master and
standby. This should be a very common setup in the field, so how
are people doing it in practice? Just live with the risk that you
might miss some files in the archive if you promote? Don't even
realize there's a problem? Something else?

Yes, I do live like that (with streaming replication and shared
archive between master and replicas) and don’t even realize there’s a
problem :( And I think I’m not the only one. Maybe at least a note
should be added to the documentation?

Let's try to figure out a way to fix this in master, but yeah, a note in the documentation is in order.

And how would we like it to work?

Here's a plan:

Have a mechanism in the standby to track how far the master has archived its WAL, and don't throw away WAL in the standby that hasn't been archived in the master yet. This is similar to physical replication slots, which prevent the master from recycling WAL that a standby hasn't received yet, but in reverse. I think we can use the .done and .ready files for this. Whenever a file is streamed (completely) from the master, create a .ready file for it. When we get an acknowledgement from the master that it has archived it, create a .done file for it. To get the information from the master, include the "last archived WAL segment" in, e.g., the streaming replication keep-alive message, or invent a new message type for it.
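
To make the bookkeeping above concrete, here is a minimal sketch in Python, for illustration only (the real thing would live in the server's C code). The function names are hypothetical, and the "last archived WAL segment" value is assumed to arrive from the master by whatever message ends up carrying it.

```python
import os
import tempfile

def mark_streamed(status_dir, segment):
    """A segment was fully received from the master: flag it as not yet archived."""
    open(os.path.join(status_dir, segment + ".ready"), "w").close()

def mark_archived_up_to(status_dir, last_archived):
    """The master reported `last_archived`: flip .ready to .done up to that segment."""
    for name in sorted(os.listdir(status_dir)):
        base, ext = os.path.splitext(name)
        # Segment names are fixed-width hex, so within one timeline (first
        # 8 characters) string order matches WAL order.
        same_timeline = base[:8] == last_archived[:8]
        if ext == ".ready" and same_timeline and base <= last_archived:
            os.rename(os.path.join(status_dir, name),
                      os.path.join(status_dir, base + ".done"))

def pending_at_promotion(status_dir):
    """Segments the standby would still have to archive itself at promotion."""
    return sorted(n[:-len(".ready")] for n in os.listdir(status_dir)
                  if n.endswith(".ready"))

if __name__ == "__main__":
    status_dir = tempfile.mkdtemp()  # stand-in for the archive_status directory
    for seg in ("000000010000000000000001",
                "000000010000000000000002",
                "000000010000000000000003"):
        mark_streamed(status_dir, seg)
    mark_archived_up_to(status_dir, "000000010000000000000002")
    print(pending_at_promotion(status_dir))
```

Anything still marked .ready at promotion is what the new master has to push to the archive itself.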

At promotion, archive all the WAL from the old timeline that the master hadn't already archived. While doing this, the archive_command can be called for files that have in fact already been archived in the master, so the command needs to return success if it's asked to archive a file and an identical file already exists in the archive. That's a bit difficult to write into a one-liner, but hopefully we can still provide an example of this. Or have another command, e.g. "promotion_archive_command", which can just assume that everything is OK if the file already exists.
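
As a rough idea of what such an archive_command could look like when it is not squeezed into a one-liner, here is a sketch of a small Python script. The script name, the archive path in the usage comment and the function name are made up for the example; it also skips fsync and does not guard against two servers archiving the same file at the same moment, both of which a production command would want to handle.

```python
import filecmp
import os
import shutil
import sys

def archive_wal(src_path, archive_dir):
    """Return 0 if the file was copied or an identical copy already exists, else 1."""
    dest = os.path.join(archive_dir, os.path.basename(src_path))
    if os.path.exists(dest):
        # shallow=False forces a byte-by-byte comparison instead of trusting stat().
        return 0 if filecmp.cmp(src_path, dest, shallow=False) else 1
    tmp = dest + ".tmp"
    shutil.copyfile(src_path, tmp)
    os.replace(tmp, dest)  # publish under the final name only once fully written
    return 0

if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage:
    #   archive_command = 'python3 archive_wal.py %p /mnt/archive'
    sys.exit(archive_wal(sys.argv[1], sys.argv[2]))
```

A file that exists in the archive with different contents still yields a non-zero exit status, so a genuine conflict is not silently papered over.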

To enable this new mode, let's add a third option to archive_mode, besides on/off. Or just make this the default; I'm not sure if anyone would want the old behavior.

There was some discussion in August on enabling WAL archiving in
the standby, always [3]. That's a related idea, but it assumes that
you have a separate archive in the master and the standby. The
problem at promotion happens when you have a shared archive between
the master and standby.

AFAIK most people use the scheme with shared archive.

Yeah. Anyway, we can support both scenarios.

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
