On 2019-03-22 18:01:32 +0100, Tomas Vondra wrote:
> On 3/22/19 5:41 PM, Andres Freund wrote:
> > Hi,
> >
> > On 2019-03-22 17:32:10 +0100, Tomas Vondra wrote:
> >> On 3/22/19 5:10 PM, Andres Freund wrote:
> >>> IDK, being able to verify in some form that backups aren't corrupted
> >>> on an IO level is mighty nice. That often allows one to detect the
> >>> issue while one still has older backups around.
> >>
> >> Yeah, I agree that's a valuable capability. I think the question is
> >> how effective it actually is, considering how much storage has changed
> >> over the past few years (which necessarily affects the type of
> >> failures people have to deal with).
> >
> > I'm not sure I understand? How do the changes around storage
> > meaningfully affect the need to have some trust in backups and to
> > benefit from earlier detection?
>
> Having trust in backups is still desirable - nothing changes that,
> obviously. The question I was posing was rather "Are checksums still
> effective on current storage systems?"
>
> I'm wondering if the storage systems people use nowadays may be failing
> in ways that are not reliably detectable by checksums. I don't have any
> data to either support or reject that hypothesis, though.
I don't think it's useful to paint unsubstantiated doom-and-gloom
pictures.

> >> It's not clear to me what checksums can do about zeroed pages (and/or
> >> truncated files), though.
> >
> > Well, there's nothing fundamental about needing added pages to be
> > zeroes. We could expand them to be initialized with actual valid
> > checksums instead of
> >
> >     /* new buffers are zero-filled */
> >     MemSet((char *) bufBlock, 0, BLCKSZ);
> >     /* don't set checksum for all-zero page */
> >     smgrextend(smgr, forkNum, blockNum, (char *) bufBlock, false);
> >
> > The problem is that it's hard to do so safely without adding a lot of
> > additional WAL logging. A lot of filesystems will journal metadata
> > changes (like the size of the file), but not contents. So after a crash
> > the tail end might appear zeroed out, even if we never wrote zeroes.
> > That's obviously solvable by WAL logging, but that's not cheap.
>
> Hmmm. I'd say a filesystem that does not guarantee having all the data
> after an fsync is outright broken, but maybe that's what checksums are
> meant to protect against.

There's no fsync here. Consider smgrextend(with-valid-checksum); crash.
The OS will probably have journalled the file size change, but not the
contents. After a crash it's thus likely that the data page will appear
zeroed. That prevents us from erroring out when encountering a zeroed
page, even though doing so would be very good for error detection
capabilities, because storage systems will show corrupted data as zeroes
in a number of cases.

Greetings,

Andres Freund
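For illustration, here is a minimal sketch of what extending a relation
with a checksummed page could look like. extend_with_checksummed_page()
is a hypothetical helper, not actual PostgreSQL code, and it deliberately
omits the extra WAL logging that would be needed to make it crash-safe:

    #include "postgres.h"

    #include "storage/bufpage.h"
    #include "storage/smgr.h"

    /*
     * Hypothetical sketch: extend a relation with a page carrying a
     * valid checksum instead of all zeroes.
     */
    static void
    extend_with_checksummed_page(SMgrRelation smgr, ForkNumber forkNum,
                                 BlockNumber blockNum, char *bufBlock)
    {
        /* Set up a valid, empty page header rather than zero-filling. */
        PageInit((Page) bufBlock, BLCKSZ, 0);

        /* Compute and store the checksum for this block number. */
        PageSetChecksumInplace((Page) bufBlock, blockNum);

        /*
         * Hazard: many filesystems journal the metadata of this extension
         * (the new file size) but not the page contents.  After a crash
         * the block may read back as all zeroes even though we never
         * wrote zeroes, so without WAL logging the extension a zeroed
         * page still cannot be treated as corruption.
         */
        smgrextend(smgr, forkNum, blockNum, (char *) bufBlock, false);
    }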
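And the flip side, a rough sketch of why a verifier has to give zeroed
pages a pass (pg_verify_checksums works along these lines, though
page_checksum_ok() below is made up for illustration): an all-zero page
must be accepted as valid, because a crash during relation extension can
legitimately leave one behind - which is exactly how zeroing corruption
slips past checksum verification:

    #include "postgres.h"

    #include "storage/bufpage.h"
    #include "storage/checksum.h"

    /* Hypothetical page-level checksum check for an 8kB block. */
    static bool
    page_checksum_ok(char *page, BlockNumber blkno)
    {
        PageHeader  phdr = (PageHeader) page;

        /*
         * A new (zero-initialized) page carries no checksum, so there is
         * nothing to verify; it has to be accepted.
         */
        if (PageIsNew((Page) page))
            return true;

        return phdr->pd_checksum == pg_checksum_page(page, blkno);
    }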