On Wed, Jan 4, 2017 at 5:36 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
> Still getting checksum failures. Over the last 30 days, I see the
> following. Since enabling checksums FWICT none of the damage is
> permanent and rolls back with the transaction. So creepy!
The checksums still differ only in the least significant digits, which pretty much means that there is a block number mismatch: the block number is XORed into the page checksum just before it is folded to 16 bits, so an intact page verified against a nearby block number usually comes out only slightly off. So if you rule out the filesystem failing to do its job and transposing blocks, something else could be causing blocks to be read from a location that differs by a small multiple of the page size. Maybe somebody is racily mucking with the table's fds between the seek and the read. That would explain why the issue disappears after a retry.

Maybe you can arrange for the RelFileNode and block number to be logged on checksum failures, and then check the actual checksums in the data files for the pages surrounding the failed one. If the requested block contains something else entirely, but the page that follows contains the expected checksum value, that would support this theory.

Regards,
Ants Aasma

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
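To make the block-number argument concrete, here is a minimal Python sketch of only the last step of PostgreSQL's `pg_checksum_page()`: the 32-bit hash of the page contents is XORed with the block number and then reduced to a nonzero 16-bit value. The page hash itself is not reproduced; `BLOCK_HASH` is an arbitrary stand-in value and `finalize_checksum` is a hypothetical helper name.

```python
# Sketch of the final mixing step of PostgreSQL's page checksum.
# The real 32-bit hash over the page contents is NOT implemented here;
# BLOCK_HASH below is an arbitrary stand-in for it.

def finalize_checksum(block_hash: int, blkno: int) -> int:
    """XOR the block number into a 32-bit hash, then fold to 1..65535."""
    mixed = (block_hash ^ blkno) & 0xFFFFFFFF
    return (mixed % 65535) + 1


# Hypothetical hash of an intact page (value chosen arbitrarily).
BLOCK_HASH = 0x9E3779B9

if __name__ == "__main__":
    expected = finalize_checksum(BLOCK_HASH, 1000)
    # Verify the same page contents against neighbouring block numbers.
    for blkno in range(998, 1003):
        got = finalize_checksum(BLOCK_HASH, blkno)
        print(blkno, got, got - expected)
```

With this stand-in hash, blocks 1000 and 1001 give 6282 and 6281: checking an intact page against a neighbouring block number shifts the result by only a few units, unless the modulo happens to wrap.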
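The check of neighbouring pages suggested in the reply could be scripted roughly as follows. This is a sketch under stated assumptions: the default 8 kB block size, 1 GB segment files named `<relfilenode>`, `<relfilenode>.1`, and so on, and a little-endian server, with `pd_checksum` read from byte offset 8 of the page header (right after the 8-byte `pd_lsn`). The function names are hypothetical, and it is safest to run it against a copy of the files, since a page being written concurrently may be read torn.

```python
import struct
from typing import Dict, Optional, Tuple

BLCKSZ = 8192            # assumes the default 8 kB block size
RELSEG_SIZE = 131072     # assumes default 1 GB segments (131072 blocks of 8 kB)
PD_CHECKSUM_OFFSET = 8   # pd_checksum follows the 8-byte pd_lsn in the page header


def block_location(relpath: str, blkno: int) -> Tuple[str, int]:
    """Map an absolute block number to (segment file path, byte offset)."""
    segno, blk_in_seg = divmod(blkno, RELSEG_SIZE)
    path = relpath if segno == 0 else f"{relpath}.{segno}"
    return path, blk_in_seg * BLCKSZ


def checksum_from_page(page: bytes) -> int:
    """Extract pd_checksum; '<H' assumes a little-endian server such as x86."""
    (value,) = struct.unpack_from("<H", page, PD_CHECKSUM_OFFSET)
    return value


def stored_checksum(relpath: str, blkno: int) -> Optional[int]:
    """Read one page and return its stored checksum, or None if it is absent."""
    path, offset = block_location(relpath, blkno)
    try:
        with open(path, "rb") as f:
            f.seek(offset)
            page = f.read(BLCKSZ)
    except FileNotFoundError:
        return None
    # A short read means the block lies past the end of the segment file.
    return checksum_from_page(page) if len(page) == BLCKSZ else None


def neighborhood(relpath: str, blkno: int, radius: int = 2) -> Dict[int, Optional[int]]:
    """Stored checksums for blocks blkno - radius .. blkno + radius."""
    first = max(0, blkno - radius)
    return {b: stored_checksum(relpath, b) for b in range(first, blkno + radius + 1)}
```

For example, `neighborhood("base/16384/16385", 123456)` (hypothetical database and relfilenode OIDs, relative to the data directory) returns the stored checksums for blocks 123454 through 123458, which can then be compared with the calculated and expected checksums reported by the verification-failure warning.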