[...]
I'm not sure data checksums are particularly great evidence. For example,
with the recent fsync issues, we might have ended up with partial writes
(and thus invalid checksums). The OS might even have told us about the
failure, but we gracefully ignored it. So I'm afraid data checksums
are not particularly great proof that it's not our fault.
They are great evidence that your data is corrupt. You *want* to know
that your data is corrupt. Even if our best recommendation is "go restore
your backups", you still want to know. Otherwise you are sitting on
corrupt data without knowing it.
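To make the partial-write point concrete, here is a minimal sketch of a torn write being caught by a page checksum. It assumes a hypothetical page layout (checksum stored in the first 4 bytes of an 8 KB page) and a hypothetical 4 KB atomic write unit, and it uses zlib's CRC-32 purely as a stand-in. This is not PostgreSQL's page format or checksum algorithm.

```python
import zlib

PAGE_SIZE = 8192  # assumed page size (PostgreSQL's default block size)
SECTOR = 4096     # hypothetical: unit the OS/disk writes atomically
HEADER = 4        # hypothetical layout: checksum in the first 4 bytes


def make_page(fill: int) -> bytes:
    """Build a page whose header holds the CRC-32 of its body (stand-in checksum)."""
    body = bytes([fill]) * (PAGE_SIZE - HEADER)
    return zlib.crc32(body).to_bytes(4, "little") + body


def verify(page: bytes) -> bool:
    """Recompute the body checksum and compare it with the stored one."""
    stored = int.from_bytes(page[:HEADER], "little")
    return stored == zlib.crc32(page[HEADER:])


old_page = make_page(0xAA)
new_page = make_page(0xBB)

# Torn write: only the first sector of the new version reaches disk,
# the rest of the page still holds the old contents.
torn_page = new_page[:SECTOR] + old_page[SECTOR:]

print(verify(old_page), verify(new_page), verify(torn_page))
```

Both complete versions verify, while the torn page does not. The checksum cannot say *why* the page is inconsistent (a storage fault or an ignored fsync error), only *that* it is, which is the distinction being argued over here.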
There are certainly many things we can do to improve the experience. But
not telling people their data is corrupt when it is isn't one of them.
Yep, anyone should prefer knowing that their database is corrupt over
ignoring the fact.
One reason not to enable it could be if the implementation is not trusted,
i.e. if false positives (a corrupt page detected while the data are okay and
there was only an issue with computing or storing the checksum) can occur.
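A small sketch of why such a false positive is indistinguishable from real corruption by the checksum alone: flipping one bit in the data and flipping one bit in the stored checksum both produce a mismatch. As before, the page layout (checksum in the first 4 bytes) and the use of CRC-32 are hypothetical stand-ins, not PostgreSQL's implementation.

```python
import zlib

PAGE_SIZE = 8192
HEADER = 4  # hypothetical layout: checksum in the first 4 bytes


def make_page(fill: int) -> bytes:
    body = bytes([fill]) * (PAGE_SIZE - HEADER)
    return zlib.crc32(body).to_bytes(4, "little") + body


def verify(page: bytes) -> bool:
    return int.from_bytes(page[:HEADER], "little") == zlib.crc32(page[HEADER:])


good = make_page(0x42)

# Case 1: a bit flips in the data area (real corruption).
data_flip = bytearray(good)
data_flip[100] ^= 0x01

# Case 2: a bit flips in the stored checksum only; the data is intact.
sum_flip = bytearray(good)
sum_flip[0] ^= 0x01

print(verify(bytes(data_flip)), verify(bytes(sum_flip)))  # False False
print(bytes(sum_flip)[HEADER:] == good[HEADER:])          # True: data unchanged
```

Both pages fail verification, yet in the second case the user data is fine. That is the "false positive" case: rare if the checksum code and its storage are sound, but not something the mismatch itself can rule out.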
There is also the performance impact. I did some quick-and-dirty pgbench
simple-update, single-thread performance tests to compare with vs without
checksums. Enabling checksums on these tests seems to induce a 1.4%
performance penalty, although I'm only moderately confident in that figure
given the standard deviation. At least it is an indication, and it seems to
me to be consistent with other figures previously reported on the list.
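One way to judge whether a small penalty like this stands out from run-to-run noise is to compare the gap between the mean tps figures with their standard error (Welch's t statistic). The sketch below uses made-up tps samples for illustration only; they are NOT the measurements from the tests above.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical tps samples from repeated runs; NOT the figures from this thread.
tps_without = [1000.0, 1010.0, 990.0, 1005.0, 995.0]
tps_with = [985.0, 995.0, 975.0, 990.0, 980.0]

m0, m1 = mean(tps_without), mean(tps_with)
penalty = (m0 - m1) / m0 * 100  # relative slowdown in percent

# Standard error of the difference of the two means (Welch's formulation).
se = sqrt(variance(tps_without) / len(tps_without)
          + variance(tps_with) / len(tps_with))
t = (m0 - m1) / se

print(f"penalty: {penalty:.1f}%, t = {t:.1f}")
```

With these invented numbers the penalty is 1.5% and t is about 3, i.e. the gap is a few standard errors wide. With fewer runs or a larger spread, the same percentage could easily fall within the noise, which is why a single figure like 1.4% deserves only moderate confidence.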
--
Fabien.