On 17.01.2013 16:56, Robert Haas wrote:
On Wed, Jan 16, 2013 at 11:08 AM, Heikki Linnakangas
<hlinnakan...@vmware.com>  wrote:
I'd prefer to leave the .partial suffix in place, as the segment really
isn't complete. It doesn't make a difference when you recover to the latest
timeline, but if you have a more complicated scenario with multiple
timelines that are still "alive", i.e. there's a server still actively
generating WAL on that timeline, you'll easily get confused.

As an example, imagine that you have a master server and one standby. You
maintain a WAL archive for backup purposes with pg_receivexlog, connected to
the standby. Now, for some reason, you get a split-brain situation and the
standby server is promoted to new timeline 2, while the real master is
still running. The DBA notices the problem, and kills the standby and
pg_receivexlog. He deletes the XLOG files belonging to timeline 2 in
pg_receivexlog's target directory, and re-points pg_receivexlog to the master
while he rebuilds the standby server from backup. At that point,
pg_receivexlog will start streaming from the end of the zero-padded segment,
not knowing that it was partial, and you have a hole in the archived WAL
stream. Oops.

The DBA could avoid that by also removing the last WAL segment on timeline
1, the one that was partial. But it's really not obvious that there's
anything wrong with that segment. Keeping the .partial suffix makes it
clear.
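To make the failure mode concrete, here is a minimal Python sketch under a simplified, hypothetical resume rule (an assumption for illustration, not pg_receivexlog's actual code): the tool resumes at the segment after the highest file that looks complete, and at the start of the highest file that still carries the .partial suffix. The function `resume_segment` and the `(segment_number, has_partial_suffix)` representation are made up for this example.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical model: each archived file is (segment_number, has_partial_suffix).
# Only the file name is inspected, so a zero-padded segment that lost its
# .partial suffix is indistinguishable from a genuinely complete one.
ArchivedFile = Tuple[int, bool]


def resume_segment(files: Iterable[ArchivedFile]) -> Optional[int]:
    """Return the segment number to resume streaming from (simplified rule)."""
    best = None  # highest (segment_number, has_partial_suffix) seen so far
    for segno, is_partial in files:
        if best is None or segno > best[0]:
            best = (segno, is_partial)
    if best is None:
        return None  # empty directory: the caller must pick a start point
    segno, is_partial = best
    # A .partial file is known to be incomplete: re-stream it from its start.
    # Anything else is trusted as complete: continue with the next segment.
    return segno if is_partial else segno + 1


if __name__ == "__main__":
    # Segment 6 on timeline 1 was partial when the standby was promoted.
    renamed = [(n, False) for n in range(1, 7)]             # suffix dropped
    kept = [(n, False) for n in range(1, 6)] + [(6, True)]  # suffix kept
    print(resume_segment(renamed))  # 7: the rest of segment 6 is never fetched
    print(resume_segment(kept))     # 6: segment 6 is streamed again from its start
```

In the "renamed" case the archive silently skips the tail of segment 6, which is exactly the hole described above; keeping the suffix makes the incompleteness visible both to the tool and to the DBA.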

I shudder at the idea that the DBA is manually involved in any of this.

The scenario I described is that you screwed up your failover environment and ended up with a split-brain situation by accident. The DBA certainly needs to be involved to recover from that.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)