On Wed, Sep 27, 2023 at 11:06:37AM +1300, Thomas Munro wrote:
> On Tue, Sep 26, 2023 at 8:38 PM Michael Paquier <mich...@paquier.xyz> wrote:
> > Thoughts and/or comments are welcome.
>
> I don't have an opinion yet on your other thread about making this
> stuff configurable for replicas, but for the simple crash recovery
> case shown here, hard failure makes sense to me.
>
> Recycled pages can't fool us into making a huge allocation any more.
> If xl_tot_len implies more than one page but the next page's
> xlp_pageaddr is too low, then either the xl_tot_len you read was
> recycled garbage bits, or it was legitimate but the overwrite of the
> following page didn't make it to disk; either way, we don't have a
> record, so we have an end-of-WAL condition. The xlp_rem_len check
> defends against the second page making it to disk while the first one
> still contains recycled garbage where the xl_tot_len should be*.
>
> What Michael wants to do now is remove the 2004-era assumption that
> malloc failure implies bogus data. It must be pretty unlikely in a
> 64-bit world with overcommitted virtual memory, but a legitimate
> xl_tot_len can falsely end recovery and lose data, as reported from a
> production case analysed by his colleagues. In other words, we can
> actually distinguish between lack of resources and recycled bogus
> data, so why treat them the same?

Indeed. Hard failure is fine, and ENOMEM=end-of-WAL definitely isn't.

> *A more detailed analysis would talk about sectors (page header is
> atomic)

I think the page header is atomic on POSIX-compliant filesystems but not
atomic on ext4. That doesn't change the conclusion on $SUBJECT.