On Tue, Jan 22, 2013 at 9:06 AM, Michael Paquier <michael.paqu...@gmail.com> wrote:
> On Fri, Jan 18, 2013 at 6:20 PM, Heikki Linnakangas
> <hlinnakan...@vmware.com> wrote:
>
>> Hmm, so it's the same issue I thought I fixed yesterday. My patch only
>> fixed it for the case that the timeline switch is in the first page of the
>> segment. When it's not, you still get two calls for a WAL record, first one
>> for the first page in the segment, to verify that, and then the page that
>> actually contains the record. The first call leads XLogPageRead to think it
>> needs to read from the old timeline.
>>
>> We didn't have this problem before the xlogreader refactoring because
>> XLogPageRead() was always called with the RecPtr of the record, even when
>> we actually read the segment header from the file first. We'll have to
>> somehow get that same information, the RecPtr of the record we're actually
>> interested in, to XLogPageRead(). We could add a new argument to the
>> callback for that, or we could keep xlogreader.c as it is and pass it
>> through from ReadRecord to XLogPageRead() in the private struct.
>>
>> An explicit argument to the callback is probably best. That's
>> straightforward, and it might be useful for the callback to know the actual
>> WAL position that xlogreader.c is interested in anyway. See attached.
>>
> Just to let you know that I am still getting the error even after commit
> 2ff6555.
> With the same scenario:
> 1) Start a master with 2 slaves
> 2) Kill/Stop slave
> 3) Promote slave 1, it switches to timeline 2
> Log on slave 1
>
> LOG: selected new timeline ID: 2
> 4) Reconnect slave 2 to slave 1, slave 2 remains stuck in timeline 1 even
> if recovery_target_timeline is set to latest
> Log on slave 1 at this moment:
> DEBUG: received replication command: IDENTIFY_SYSTEM
> DEBUG: received replication command: TIMELINE_HISTORY 2
> DEBUG: received replication command: START_REPLICATION 0/5000000 TIMELINE 1
> Slave 1 receives command to start replication with timeline 1, while it is
> sync with timeline 2.
> Log on slave 2 at this moment:
> LOG: restarted WAL streaming at 0/5000000 on timeline 1
>
> LOG: replication terminated by primary server
> DETAIL: End of WAL reached on timeline 1 at 0/5014200
> DEBUG: walreceiver ended streaming and awaits new instructions
>
> The timeline history file is the same for both nodes:
> $ cat 00000002.history
> 1 0/5014200 no recovery target specified
>
> I might be wrong, but shouldn't there be also an entry for timeline 2 in
> this file?
>
> Am I missing something?
>
Sorry, there are no problems... I simply forgot to set up
recovery_target_timeline to 'latest' in recovery.conf...
-- 
Michael Paquier
http://michael.otacoo.com