Thank you Alvaro for the comment (on my comment).

At Fri, 13 Dec 2019 18:33:44 -0300, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote in
> On 2019-Dec-13, Kyotaro Horiguchi wrote:
>
> > At Thu, 12 Dec 2019 22:50:20 +0000, "Bossart, Nathan" <bossa...@amazon.com> wrote in
> > > The crux of the issue seems to be that XLogWrite() does not wait for
> > > the entire record to be written to disk before creating the ".ready"
> > > file. Instead, it just waits for the last page of the segment to be
> > > written before notifying the archiver. If PostgreSQL crashes before
> > > it is able to write the rest of the record, it will end up reusing the
> > > ".ready" segment at the end of crash recovery. In the meantime, the
> > > archiver process may have already processed the old version of the
> > > segment.
> >
> > Year, that can happen if the server restarted after the crash.
>
> ... which is the normal way to run things, no?

Yes. In older versions (< 10), the default value of wal_level was
minimal. In 10, only the default for wal_level was changed, to replica.
Still, I'm not sure whether restart_after_crash can be recommended for
streaming replication...

> Why is it bad?  It's the default value.

I reconsidered this more deeply and concluded that it does not harm
replication as I had thought.

A WAL-buffer overflow may write out part of a continuation record, and
that part can be flushed immediately. That led me to the mistaken belief
that a standby could receive only the first half of a continuation
record. Actually, that write doesn't advance LogwrtResult.Flush, so the
standby doesn't receive a record split at a page boundary. (The cases
where a crashed master is used as a new standby as-is might have
contaminated my thinking.)

Sorry for the bogus comment. My conclusion here is that
restart_after_crash doesn't seem to harm the standby immediately.

> > The standby can be incosistent at the time of master crash, so it
> > should be fixed using pg_rewind or should be recreated from a base
> > backup.
>
> Surely the master will just come up and replay its WAL, and there should
> be no inconsistency.
>
> You seem to be thinking that a standby is promoted immediately on crash
> of the master, but this is not a given.

Basically no, but I may have mixed the two cases up a bit.

Anyway, returning to the proposal: I think XLogWrite can be called when
the WAL buffers are full, and such a write can reach the last page of a
segment. The proposed patch doesn't work in that case, since that
XLogWrite call didn't write the whole continuation record. But I'm not
sure that corner case is worth amending.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
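
As a footnote to the Write/Flush point above, here is a toy sketch in Python. It is not PostgreSQL code: WalPositions, write_upto, flush_upto and sendable_upto are made-up names, the page size and record offsets are arbitrary, and it simply assumes what is argued above, namely that a buffer-full write advances only the write position and that a standby is sent WAL only up to the flush position.

```python
from dataclasses import dataclass


@dataclass
class WalPositions:
    """Toy stand-in for LogwrtResult: byte offsets into the WAL stream."""
    write: int = 0  # handed to the OS, not necessarily durable
    flush: int = 0  # known to be durable

    def write_upto(self, pos: int) -> None:
        # Assumption: a buffer-full write moves only the write position.
        self.write = max(self.write, pos)

    def flush_upto(self, pos: int) -> None:
        # A flush implies everything up to pos was written first.
        self.write = max(self.write, pos)
        self.flush = max(self.flush, pos)


def sendable_upto(positions: WalPositions) -> int:
    # Assumption: the standby is sent WAL only up to the flush position.
    return positions.flush


PAGE = 8192                # arbitrary page size for the example
record_start = PAGE - 100  # the record begins near the end of page 1
record_end = PAGE + 200    # and continues onto page 2

positions = WalPositions()
positions.flush_upto(record_start)  # everything before the record is durable
positions.write_upto(PAGE)          # buffers full: only the first half is written

# The standby cannot yet see any part of the split record.
limit_after_partial_write = sendable_upto(positions)

positions.flush_upto(record_end)    # the whole record is flushed later
limit_after_full_flush = sendable_upto(positions)
```

In this model the sendable limit stays at record_start after the partial write and moves to record_end only once the whole record is flushed, which is the behavior described above.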