On Thu, Jan 17, 2013 at 1:08 AM, Heikki Linnakangas
<hlinnakan...@vmware.com> wrote:
> On 15.01.2013 20:22, Fujii Masao wrote:
>>
>> On Tue, Jan 15, 2013 at 11:05 PM, Heikki Linnakangas
>> <hlinnakan...@vmware.com> wrote:
>>>
>>> Now that a standby server can follow timeline switches through streaming
>>> replication, we should teach pg_receivexlog to do the same. Patch
>>> attached.
>>>
>>> I made one change to the way the START_STREAMING command works, to better
>>> support this. When a standby server reaches the end of the timeline it's
>>> streaming from the master, it stops streaming, fetches any missing
>>> timeline history files, and parses the history file of the latest
>>> timeline to figure out where to continue. However, I don't want to parse
>>> timeline history files in pg_receivexlog. Better to keep it simple. So
>>> instead, I modified the server-side code for START_STREAMING to return
>>> the next timeline's ID at the end, and used that in pg_receivexlog. I
>>> also modified BASE_BACKUP to return not only the start XLogRecPtr, but
>>> also the corresponding timeline ID. Otherwise we might try to start
>>> streaming from the wrong timeline if you issue a BASE_BACKUP at the same
>>> moment the server switches to a new timeline.
>>>
>>> When pg_receivexlog switches timeline, what to do with the partial file
>>> on the old timeline? When the timeline changes in the middle of a WAL
>>> segment, the segment on the old timeline is only half-filled. For
>>> example, when the timeline changes from 1 to 2, you'll have this in
>>> pg_xlog:
>>>
>>>   000000010000000000000006
>>>   000000010000000000000007
>>>   000000010000000000000008
>>>   000000020000000000000008
>>>   00000002.history
>>>
>>> The segment 000000010000000000000008 is only half-filled, as the timeline
>>> changed in the middle of that segment.
>>> The beginning portion of that file is duplicated in
>>> 000000020000000000000008, with the timeline-changing checkpoint record
>>> right after the duplicated portion.
>>>
>>> When we stream that with pg_receivexlog and hit the timeline switch,
>>> we'll have this situation in the client:
>>>
>>>   000000010000000000000006
>>>   000000010000000000000007
>>>   000000010000000000000008.partial
>>>
>>> What to do with the partial file? One option is to rename it to
>>> 000000010000000000000008. However, if you then kill pg_receivexlog before
>>> it has finished streaming a full segment from the new timeline, on
>>> restart it will try to begin streaming WAL segment
>>> 000000010000000000000009, because it sees that segment
>>> 000000010000000000000008 is already completed. That'd be wrong.
>>
>>
>> Can't we rename the .partial file safely after we receive a full segment
>> of the WAL file with the new timeline and the same logid/segmentid?
>
>
> I'd prefer to leave the .partial suffix in place, as the segment really
> isn't complete. It doesn't make a difference when you recover to the
> latest timeline, but if you have a more complicated scenario with multiple
> timelines that are still "alive", i.e. there's a server still actively
> generating WAL on that timeline, you'll easily get confused.
>
> As an example, imagine that you have a master server and one standby. You
> maintain a WAL archive for backup purposes with pg_receivexlog, connected
> to the standby. Now, for some reason, you get a split-brain situation and
> the standby server is promoted with new timeline 2, while the real master
> is still running. The DBA notices the problem and kills the standby and
> pg_receivexlog. He deletes the XLOG files belonging to timeline 2 in
> pg_receivexlog's target directory, and re-points pg_receivexlog to the
> master while he rebuilds the standby server from backup.
> At that point, pg_receivexlog will start streaming from the end of the
> zero-padded segment, not knowing that it was partial, and you have a hole
> in the archived WAL stream. Oops.
>
> The DBA could avoid that by also removing the last WAL segment on
> timeline 1, the one that was partial. But it's really not obvious that
> there's anything wrong with that segment. Keeping the .partial suffix
> makes it clear.
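
A rough Python sketch of the restart hazard discussed above. It assumes a simplified, hypothetical resume rule (resume after the highest complete segment, or at the start of the highest segment if that one still carries the .partial suffix); it is not pg_receivexlog's actual code, and `parse_segment` and `resume_segment` are made-up names. The 24-hex-digit file naming (8 digits each for timeline ID, log ID and segment) is PostgreSQL's; the 16 MB segment size is an assumption.

```python
import re

# WAL segment names: 8 hex digits of timeline ID, 8 of log ID, 8 of segment,
# optionally followed by the ".partial" suffix pg_receivexlog uses while streaming.
SEG_RE = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})(\.partial)?$")

# Assumption: default 16 MB segments, so one 4 GB log ID holds 256 segments.
SEGS_PER_LOG = 0x100000000 // (16 * 1024 * 1024)


def parse_segment(name):
    """Return (tli, segno, is_partial) for a WAL file name, or None."""
    m = SEG_RE.match(name)
    if m is None:
        return None  # e.g. "00000002.history"
    tli = int(m.group(1), 16)
    segno = int(m.group(2), 16) * SEGS_PER_LOG + int(m.group(3), 16)
    return tli, segno, m.group(4) is not None


def resume_segment(names):
    """Hypothetical rule: resume after the highest complete segment, or at
    the start of the highest segment if it is only partial. Timeline IDs
    and file sizes are deliberately ignored."""
    best = None
    for name in names:
        parsed = parse_segment(name)
        if parsed is None:
            continue
        _tli, segno, partial = parsed
        key = (segno, not partial)  # a complete copy outranks a partial one
        if best is None or key > best:
            best = key
    if best is None:
        return None
    segno, complete = best
    return segno + 1 if complete else segno
```

With segments 6 and 7 present, keeping 000000010000000000000008.partial makes this rule resume at segment 8, while renaming it to 000000010000000000000008 makes it resume at segment 9 and skip the rest of segment 8, which is the hole described in the split-brain example. A real client would probably also need to take timelines and file sizes into account.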

Thanks for explaining why the .partial suffix should be kept. I agree that
keeping the .partial suffix would be safer.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers