On 17.01.2013 21:57, Heikki Linnakangas wrote:
On 17.01.2013 20:08, Andres Freund wrote:
On 2013-01-18 03:05:47 +0900, Fujii Masao wrote:
I encountered a problem where the timeline switch is not performed as
expected.
I set up one master, one standby and one cascade standby. All the
servers share the archive directory. restore_command is specified in
the recovery.conf of both standbys.

I shut down the master, and then promoted the standby. In this case, the
cascade standby should switch to the new timeline and replication should
restart successfully. But the timeline never changed, and the following
log messages kept being output.

sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
sby2 LOG: replication terminated by primary server
sby2 DETAIL: End of WAL reached on timeline 1
sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
sby2 LOG: replication terminated by primary server
sby2 DETAIL: End of WAL reached on timeline 1
sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
sby2 LOG: replication terminated by primary server
sby2 DETAIL: End of WAL reached on timeline 1
....

That's after the commit or before? Because in passing I think I
noticed/fixed a bug that could cause exactly that problem...

I think I broke that with the "teach pg_receivexlog to cross timelines"
patch. Will take a look...

Ok, there were a couple of issues. First, as I suspected above, I added a new result set that is sent after the copy has ended in the START_REPLICATION command, but forgot to teach walreceiver about it. Fixed that.

Secondly, there's an interesting interaction between the new xlogreader code and streaming replication and timeline switches:

The first call to the page_read callback in xlogreader is always made with size SizeOfXLogRecord, counting from the beginning of the page. That's bogus; there can be no WAL record at the very beginning of a page, because of the page header, so I think that was supposed to be SizeOfXLogShortPHD. But that won't fix the issue.

The problem is that the XLogPageRead callback uses the page address and requested length to decide which timeline to stream from. For example, imagine that there's a timeline switch at offset 1000 in a page, and we have already read all the WAL up to that point. When xlogreader first asks to fetch the first 32 bytes from the page (the bogus SizeOfXLogRecord), XLogPageRead deduces that that byte range is still on the old timeline, and starts streaming from the old timeline. Next, xlogreader needs the rest of the page, up to 1000 + SizeOfPageHeader, to read the first record it's actually interested in, but XLogPageRead returns an error because that range is not on the timeline that's currently being streamed. We then loop back to retry, and run into the same problem again.

This interaction is a bit too subtle for my taste, but the straightforward fix is to just modify xlogreader so that the first read_page call requests all the bytes up to the record we're actually interested in. That seems like a smart thing to do anyway.
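
To make the retry loop easier to see, here is a toy model of the interaction in C. It is only a sketch and not the actual PostgreSQL code: tli_for_request, toy_page_read and attempts_until_read are hypothetical names, and the numbers (a switch at offset 1000, a 32-byte first request, a record header ending at offset 1024) are assumptions chosen to mirror the example above. The model assumes the callback picks the timeline from the end of the requested byte range and refuses ranges that are not on the timeline it is already streaming.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every name and number here is hypothetical. */
#define SWITCH_OFFSET 1000   /* page offset where timeline 1 ends, 2 begins */
#define OLD_TLI       1
#define NEW_TLI       2
#define NOT_STREAMING 0

/* Timeline that a request for the first req_len bytes of the page maps to. */
static int
tli_for_request(int req_len)
{
    return (req_len <= SWITCH_OFFSET) ? OLD_TLI : NEW_TLI;
}

/*
 * Stand-in for the page-read callback. The first request decides which
 * timeline gets streamed; later requests fail if their byte range is on
 * a different timeline than the one being streamed.
 */
static bool
toy_page_read(int *streaming_tli, int req_len)
{
    int needed = tli_for_request(req_len);

    if (*streaming_tli == NOT_STREAMING)
        *streaming_tli = needed;        /* start streaming this timeline */
    return *streaming_tli == needed;
}

/*
 * Try to read a record whose header ends at page offset record_end, using
 * first_req_len as the size of the initial request. Every failed attempt
 * restarts from scratch, like the retry loop described above.
 * Returns the number of the attempt that succeeded, or -1 if none did.
 */
static int
attempts_until_read(int first_req_len, int record_end, int max_attempts)
{
    assert(max_attempts > 0);

    for (int attempt = 1; attempt <= max_attempts; attempt++)
    {
        int streaming_tli = NOT_STREAMING;

        if (toy_page_read(&streaming_tli, first_req_len) &&
            toy_page_read(&streaming_tli, record_end))
            return attempt;
    }
    return -1;
}
```

With these made-up numbers, attempts_until_read(32, 1024, 5) never succeeds, because every attempt starts on the old timeline and then fails on the second request. attempts_until_read(1024, 1024, 5) succeeds on the first attempt, which is the effect of requesting all the bytes up to the record in the first call.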

I committed a patch for that second issue too, but please take a look in case you come up with a better idea.

- Heikki

