It'd not be such an issue if we updated stats during recovery, but I
don't think we're doing that. Perhaps we should, which might also help
on replicas - no idea if it's feasible, though.

Stats on replicas are considered an independent thing AFAIU (for
example, scans are counted in read-only queries).  If we were to do
that, we may want to split stats handling between nodes in standby
state and crash recovery.  Not sure if that's worth the complication.
First of all, the stats exist at node level.

Hmm, I'm a bit disappointed this doesn't address replication. It makes sense that scans are counted separately on a standby, but it would be nice if stats like last_vacuum were propagated from primary to standbys. I guess that can be handled separately later.
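
(To be concrete: per-table stats such as last_vacuum and last_analyze just stay empty on a standby even after the primary has vacuumed the table, unless an old stats file happened to be carried over in a base backup. The snippet below only illustrates what I'd like to see propagated; the host names are made up.)

psql -h primary -d postgres -c "SELECT last_vacuum, last_analyze FROM pg_stat_all_tables WHERE relname = 'foo'"
psql -h standby -d postgres -c "SELECT last_vacuum, last_analyze FROM pg_stat_all_tables WHERE relname = 'foo'"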


Reviewing v7-0001-Flush-pgstats-file-during-checkpoints.patch:

There are various race conditions where a stats entry can be leaked in the pgstats file, i.e. a relation is dropped, but its stats entry is retained in the stats file after a crash. In the worst case, such leaked entries can accumulate until the stats file is manually removed, which resets all stats again. Perhaps that's acceptable - it's still possible to leak the actual relation file for a new relation on crash, after all, which is much worse (I'm glad Horiguchi-san is working on that [1]).

For example:
1. BEGIN; CREATE TABLE foo (); ANALYZE foo;
2. CHECKPOINT;
3. pg_ctl restart -m immediate

This is the same scenario where we leak the relfile, but now you can have it with e.g. function statistics too.
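
Here is roughly how one could script that reproduction (an untested sketch; the throwaway 'data' cluster and the pg_sleep trick to keep the transaction open across the checkpoint are my own additions):

initdb -D data
pg_ctl start -D data -l data/log

# hold the transaction open so it is still uncommitted at crash time
psql -d postgres -c "BEGIN; CREATE TABLE foo (); ANALYZE foo; SELECT pg_sleep(60);" &
sleep 2

# with the patch, this writes the stats file including foo's entry
psql -d postgres -c "CHECKPOINT;"

# crash: the transaction aborts, so foo never existed, but its stats
# entry was already written out
pg_ctl restart -D data -m immediate

If I understand the patch correctly, the orphaned entry then lingers in pg_stat/pgstat.stat until the file is removed or stats are reset.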

Until 5891c7a8ed, there was a mechanism to garbage collect such orphaned entries (pgstat_vacuum_stat()). How bad would it be to re-introduce that? Or can we make it more watertight so that there are no leaks?


If you do this:

pg_ctl start -D data
pg_ctl stop -D data -m immediate
pg_ctl start -D data
pg_ctl stop -D data -m immediate

You get this in the log:

2024-09-02 16:28:37.874 EEST [1397281] WARNING: found incorrect redo LSN 0/160A3C8 (expected 0/160A440)

I think it's failing to flush the stats file at the end-of-recovery checkpoint.
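
(In case it helps: I'd expect the "expected" LSN in that warning to be the latest checkpoint's redo location from the control file, which you can check with the command below; that's an assumption on my part, I haven't traced it further. The 'data' directory is from the example above.)

pg_controldata -D data | grep "REDO location"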


[1] https://www.postgresql.org/message-id/20240901.010925.656452225144636594.horikyota.ntt%40gmail.com

--
Heikki Linnakangas
Neon (https://neon.tech)


