Re: [HACKERS] Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave

Heikki Linnakangas Thu, 17 Jan 2013 13:49:49 -0800

On 17.01.2013 21:57, Heikki Linnakangas wrote:

On 17.01.2013 20:08, Andres Freund wrote:

On 2013-01-18 03:05:47 +0900, Fujii Masao wrote:

I encountered a problem where the timeline switch is not performed as expected. I set up one master, one standby and one cascade standby. All the servers share the archive directory. restore_command is specified in recovery.conf on both standbys.


I shut down the master, and then promoted the standby. In this case, the cascade standby should switch to the new timeline and replication should be restarted successfully. But the timeline was never changed, and the following log messages kept being output:

sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
sby2 LOG: replication terminated by primary server
sby2 DETAIL: End of WAL reached on timeline 1
sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
sby2 LOG: replication terminated by primary server
sby2 DETAIL: End of WAL reached on timeline 1
sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
sby2 LOG: replication terminated by primary server
sby2 DETAIL: End of WAL reached on timeline 1
....


Is that after the commit or before? Because in passing I think I noticed/fixed a bug that could cause exactly that problem...


I think I broke that with the "teach pg_receivexlog to cross timelines"
patch. Will take a look...

OK, there were a couple of issues. First, as I suspected above, I added a new result set after COPY has ended in the START_STREAMING command, but forgot to teach walreceiver about it. Fixed that.

Secondly, there's an interesting interaction between the new xlogreader code, streaming replication, and timeline switches:

The first call to the page_read callback in xlogreader is always made with size SizeOfXLogRecord, counting from the beginning of the page. That's bogus: there can be no WAL record at the very beginning of a page, because of the page header, so I think that was supposed to be SizeOfXLogShortPHD. But that won't fix the issue.

The problem is that the XLogPageRead callback uses the page address and the requested length to decide which timeline to stream from. For example, imagine that there's a timeline switch at offset 1000 in a page, and we have already read all the WAL up to that point. When xlogreader first asks to fetch the first 32 bytes of the page (the bogus SizeOfXLogRecord), XLogPageRead deduces that that byte range is still on the old timeline, and starts streaming from the old timeline. Next, when xlogreader needs the rest of the page, up to 1000 + SizeOfPageHeader, to read the first record it's actually interested in, XLogPageRead returns an error because that range is not on the timeline that's currently being streamed. So we loop back to retry, and run into the same problem again.
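
To make the retry loop concrete, here is a toy model in C. It is not PostgreSQL source: `timeline_for_range` is a hypothetical stand-in for XLogPageRead's decision (picking the timeline from the end of the requested byte range), `read_record` repeats the two page reads described above, and the page size, switch offset and header length are assumptions chosen to match the example.

```c
#include <assert.h>

/* Toy model of the livelock; all names and constants are hypothetical. */
#define PAGE_SIZE  8192   /* stands in for XLOG_BLCKSZ */
#define SWITCH_OFF 1000   /* offset of the timeline switch within the page */
#define OLD_TLI    1
#define NEW_TLI    2

/* Stand-in for XLogPageRead's choice: bytes up to SWITCH_OFF exist on the
 * old timeline, so a range ending there is streamed from OLD_TLI; a range
 * reaching past it can only be served by NEW_TLI. */
static int timeline_for_range(int req_end)
{
    assert(req_end > 0 && req_end <= PAGE_SIZE);
    return req_end <= SWITCH_OFF ? OLD_TLI : NEW_TLI;
}

/* Try to read the record header at rec_off. The first page_read request
 * covers first_req bytes from the page start and decides which timeline
 * to stream; the second must cover the record header. Returns the attempt
 * that succeeded, or -1 if none did within max_attempts. */
static int read_record(int first_req, int rec_off, int hdr_len,
                       int max_attempts)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++)
    {
        int streaming_tli = timeline_for_range(first_req);
        int needed_tli = timeline_for_range(rec_off + hdr_len);

        if (needed_tli == streaming_tli)
            return attempt;
        /* Error: the range is not on the streamed timeline. Loop back and
         * retry, which makes exactly the same decisions again. */
    }
    return -1;
}
```

With a 32-byte first request, `read_record(32, 1000, 32, 10)` returns -1: every attempt starts streaming the old timeline and then fails on the second read. If the first request instead reaches past the record start, for example `read_record(1032, 1000, 32, 10)`, both reads agree on the timeline and the first attempt succeeds.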

This interaction is a bit too subtle for my taste, but the straightforward fix is to just modify xlogreader so that the first read_page call requests all the bytes up to the record we're actually interested in. That seems like a smart thing to do anyway.
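
A minimal sketch of how the size of that first request could be computed, assuming the request should cover everything from the page start through the end of the record header. The names `first_request_len`, `PAGE_SIZE` and `REC_HDR_LEN` are hypothetical stand-ins (for the request size, XLOG_BLCKSZ and SizeOfXLogRecord), and this is not necessarily what the committed patch does.

```c
#include <assert.h>

/* Hypothetical constants for illustration; real values come from the
 * PostgreSQL headers. */
#define PAGE_SIZE   8192   /* stands in for XLOG_BLCKSZ */
#define REC_HDR_LEN 32     /* stands in for SizeOfXLogRecord */

/* Size of the first page_read request for a record starting at rec_off
 * within its page: everything from the page start up to the end of the
 * record header, clamped to the page because the header may continue on
 * the next page. */
static int first_request_len(int rec_off)
{
    assert(rec_off > 0 && rec_off < PAGE_SIZE);
    int len = rec_off + REC_HDR_LEN;
    return len < PAGE_SIZE ? len : PAGE_SIZE;
}
```

Because the requested range now ends past the start of the record, a page_read callback that picks the timeline from the end of the range makes the same choice on the first call as on the later calls for that record, so the retry loop described above cannot arise from the first request.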

I committed a patch for that second issue too, but please take a look in case you come up with a better idea.


- Heikki


--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
