It'd not be such an issue if we updated stats during recovery, but I
don't think we're doing that. Perhaps we should, which might also help
on replicas - no idea if it's feasible, though.

Stats on replicas are considered an independent thing AFAIU (for
example, scans are counted in read-only queries).  If we were to do
that, we may want to split stats handling between nodes in standby
state and crash recovery.  Not sure if that's worth the complication.
First of all, the stats exist at node level.

Hmm, I'm a bit disappointed this doesn't address replication. It makes sense that scans are counted separately on a standby, but it would be nice if stats like last_vacuum were propagated from primary to standbys. I guess that can be handled separately later.
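
(To be concrete: per-table stats such as last_vacuum and last_analyze just stay empty on a standby even after the primary has vacuumed the table, unless an old stats file happened to be carried over in a base backup. The snippet below only illustrates what I'd like to see propagated; the host names are made up.)

psql -h primary -d postgres -c "SELECT last_vacuum, last_analyze FROM pg_stat_all_tables WHERE relname = 'foo'"
psql -h standby -d postgres -c "SELECT last_vacuum, last_analyze FROM pg_stat_all_tables WHERE relname = 'foo'"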


Reviewing v7-0001-Flush-pgstats-file-during-checkpoints.patch:

There are various race conditions where a stats entry can be leaked in the pgstats file, i.e. a relation is dropped, but its stats entry is retained in the stats file after a crash. In the worst case, such leaked entries can accumulate until the stats file is manually removed, which resets all stats again. Perhaps that's acceptable - it's still possible to leak the actual relation file for a new relation on crash, after all, which is much worse (I'm glad Horiguchi-san is working on that [1]).

For example:
1. BEGIN; CREATE TABLE foo (); ANALYZE foo;
2. CHECKPOINT;
3. pg_ctl restart -m immediate

This is the same scenario where we leak the relfile, but now you can have it with e.g. function statistics too.
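
Here is roughly how one could script that reproduction (an untested sketch; the throwaway 'data' cluster and the pg_sleep trick to keep the transaction open across the checkpoint are my own additions):

initdb -D data
pg_ctl start -D data -l data/log

# hold the transaction open so it is still uncommitted at crash time
psql -d postgres -c "BEGIN; CREATE TABLE foo (); ANALYZE foo; SELECT pg_sleep(60);" &
sleep 2

# with the patch, this writes the stats file including foo's entry
psql -d postgres -c "CHECKPOINT;"

# crash: the transaction aborts, so foo never existed, but its stats
# entry was already written out
pg_ctl restart -D data -m immediate

If I understand the patch correctly, the orphaned entry then lingers in pg_stat/pgstat.stat until the file is removed or stats are reset.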

Until 5891c7a8ed, there was a mechanism to garbage collect such orphaned entries (pgstat_vacuum_stat()). How bad would it be to re-introduce that? Or can we make it more watertight so that there are no leaks?


If you do this:

pg_ctl start -D data
pg_ctl stop -D data -m immediate
pg_ctl start -D data
pg_ctl stop -D data -m immediate

You get this in the log:

2024-09-02 16:28:37.874 EEST [1397281] WARNING: found incorrect redo LSN 0/160A3C8 (expected 0/160A440)

I think it's failing to flush the stats file at the end-of-recovery checkpoint.
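
(In case it helps: I'd expect the "expected" LSN in that warning to be the latest checkpoint's redo location from the control file, which you can check with the command below; that's an assumption on my part, I haven't traced it further. The 'data' directory is from the example above.)

pg_controldata -D data | grep "REDO location"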


[1] https://www.postgresql.org/message-id/20240901.010925.656452225144636594.horikyota.ntt%40gmail.com

--
Heikki Linnakangas
Neon (https://neon.tech)


