On Sun, Dec 11, 2022 at 09:18:42PM +0100, Magnus Hagander wrote:
> It would be less of a concern yes, but I think it still would be a concern.
> If you have a large amount of corruption you could quickly get to millions
> of rows to keep track of which would definitely be a problem in shared
> memory as well, wouldn't it?
Yes.  I have discussed this item with Bertrand off-list and I share the
same concern.  This would lead to a lot of extra workload on a large
seqscan of a corrupted relation when the stats are written (shutdown
delay), while bloating shared memory with potentially millions of items,
even if variable lists are handled through a dshash and DSM.

> But perhaps we could keep a list of "the last 100 checksum failures" or
> something like that?

Applying a threshold is one solution.  Now, a second thing I have seen
in the past is that some disk partitions were busted but not others,
and the current database-level counters are not enough to tell these
apart when looking for patterns in this area.  A list of the last N
failures may be able to show some pattern, but that would be like
analyzing things with a lot of noise and no clear conclusion.  Anyway,
the workload caused by the threshold number had better be measured
before being settled on (a large set of relation files with a full
range of blocks corrupted, much better if these are in the OS cache
when scanned), so a benchmark is still needed.

What about just adding a counter tracking the number of checksum
failures for relfilenodes in a new structure related to them (note that
I did not write PgStat_StatTabEntry)?  If we do that, it is then
possible to cross-check the failures with tablespaces, which would
point at disk areas that are more sensitive to corruption.
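For illustration, here is a rough sketch of what such an entry could
look like.  The structure and field names below are invented for this
message and are not existing code; the point is only to show failures
keyed by (database, tablespace, relfilenode):

/*
 * Hypothetical sketch only: none of these type or field names exist in
 * the tree.  The idea is to key checksum failure counters by database,
 * tablespace and relfilenode, so that failures can be cross-checked
 * against tablespaces.
 */
#include "postgres.h"

#include "datatype/timestamp.h"		/* TimestampTz */
#include "pgstat.h"					/* PgStat_Counter */

typedef struct PgStat_ChecksumFailureKey
{
	Oid			dboid;			/* OID of the database */
	Oid			spcoid;			/* OID of the tablespace */
	Oid			relfilenode;	/* on-disk file identifier of the relation */
} PgStat_ChecksumFailureKey;

typedef struct PgStat_ChecksumFailureEntry
{
	PgStat_ChecksumFailureKey key;
	PgStat_Counter failures;	/* number of checksum failures detected */
	TimestampTz	last_failure;	/* time of the most recent failure */
} PgStat_ChecksumFailureEntry;

Keying on the tablespace OID is what would allow aggregating the
failures per disk area afterwards.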
--
Michael