On Thu, Aug 25, 2016 at 10:33 PM, Tsunakawa, Takayuki <tsunakawa.ta...@jp.fujitsu.com> wrote:
> The processing went as follows.
>
> 1. node1's timeline is 140. It wrote a WAL record at the end of WAL segment
> 117. The WAL record didn't fit in the last page, so it was split across
> segments 117 and 118.
>
> 2. WAL segment 117 was archived.
>
> 3. node1 went down, and node2 was promoted.
>
> 4. As part of the recovery process, node2 retrieves WAL segment 117 from
> archive. It found a WAL record fragment at the end of the segment but could
> not find the remaining fragment in segment 118, so node2 stops recovery there.
>
>   LOG: restored log file "0000008C0000028C00000075" from archive
>   LOG: received promote request
>   LOG: redo done at 28C/75FFF738
>
> 5. node2 becomes the primary, and its timeline becomes 141. node3 is
> disconnected by node2 (but later reconnects to node2).
>
>   LOG: terminating all walsender processes to force cascaded standby(s) to
>   update timeline and reconnect
>
> 6. node3 retrieves and applies WAL segment 117 from archive.
>
>   LOG: restored log file "0000008C0000028C00000075" from archive
>
> 7. node3 found the .history file for timeline 141 and switches its timeline
> to 141.
>
> 8. Because node3 found a WAL record fragment at the end of segment 117, it
> expects to find the remaining fragment at the beginning of WAL segment 118
> streamed from node2. But there was a fragment of a different WAL record,
> because node2 had written a different WAL record over the end of segment 117,
> continuing into segment 118.
>
>   LOG: invalid contrecord length 5892 in log file 652, segment 118, offset 0
>
> 9. node3 then retrieves segment 117 from archive again to get the WAL record
> at the end of segment 117. However, as node3's timeline is already 141, it
> complains about the older timeline when it sees timeline 140 at the
> beginning of segment 117.
>
>   LOG: out-of-sequence timeline ID 140 (after 141) in log file 652, segment
>   117, offset 0
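To relate the hexadecimal names in the quoted logs to the decimal timeline and segment numbers used in the steps, here is a minimal sketch. It assumes the 24-character WAL file name layout of three 8-digit hex fields (timeline, log file, segment) and the default 16 MB segment size. The function names are illustrative only and are not part of PostgreSQL.

```python
# Illustrative helpers (hypothetical names, not PostgreSQL APIs).
# Assumption: WAL file names are three 8-digit hex fields,
# and WAL segments use the default size of 16 MB.

DEFAULT_SEG_SIZE = 16 * 1024 * 1024  # assumed default segment size


def parse_wal_filename(name):
    """Split a 24-character WAL file name into (timeline, log, segment)."""
    if len(name) != 24:
        raise ValueError("expected a 24-character WAL file name")
    timeline = int(name[0:8], 16)
    log = int(name[8:16], 16)
    segment = int(name[16:24], 16)
    return timeline, log, segment


def locate_lsn(lsn, seg_size=DEFAULT_SEG_SIZE):
    """Map an LSN such as '28C/75FFF738' to (log, segment, offset in segment)."""
    high, low = lsn.split("/")
    log = int(high, 16)
    byte_pos = int(low, 16)
    return log, byte_pos // seg_size, byte_pos % seg_size


if __name__ == "__main__":
    # The archived file from steps 4 and 6.
    print(parse_wal_filename("0000008C0000028C00000075"))
    # The "redo done" position from step 4.
    print(locate_lsn("28C/75FFF738"))
```

Under these assumptions, the archived file decodes to timeline 140, log file 652, segment 117, and the "redo done" position falls near the end of that same segment. This matches the description of a record split across segments 117 and 118.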
OK. I agree that's a problem. However, your patch adds zero new comment text
while removing some existing comments, so I can't easily tell how it solves
that problem or whether it does so correctly. Even if I were smart enough to
figure it out, I wouldn't want to rely on the next person also being that
smart. This is obviously a subtle problem in tricky code, so a clear
explanation of the fix seems like a very good idea.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company