Robert Haas <robertmh...@gmail.com> writes: > Well ... I agree that brittle is bad, but it's not clear to me which > way is actually less brittle. As for performant, I think you might be > misjudging the situation. My original design for removing SnapshotNow > didn't even have the catalog snapshot - it just took a new snapshot > every time. That was mostly fine, but Andres found a somewhat extreme > test case where it exhibited a significant regression, so I added the > catalog snapshot stuff to work around that. So I'm not AT ALL > convinced that giving catalog snapshots longer lifetimes is a relevant > thing to do.
Perhaps not. But right now we have to first think about correctness and how much trouble it will be to get to correctness. ISTM you are arguing for a design in which it's required that there is always a registered or active snapshot that's older than the catalog snapshot (if any). I tried revising snapmgr.c to enforce that, starting with adding @@ -421,6 +421,13 @@ GetNonHistoricCatalogSnapshot(Oid relid) if (CatalogSnapshot == NULL) { + /* + * The catalog snapshot must always be newer than some active or + * registered snapshot. (XXX explain why) + */ + Assert(ActiveSnapshot != NULL || + !pairingheap_is_empty(&RegisteredSnapshots)); + /* Get new snapshot. */ CatalogSnapshot = GetSnapshotData(&CatalogSnapshotData); and this blew up with truly impressive thoroughness. The autovac launcher, logical replication launcher, and incoming backends all fail this assertion instantly, making it impossible to find out what else might be broken --- but I'm sure there is a lot. (If you want to try this for yourself, remember that the postmaster will relaunch the AV and LR launchers on failure, meaning that your disk will fill with core files very very quickly. Just sayin'.) So maybe we can go that direction, but it's going to require a lot of code additions to push extra snapshots in places that haven't bothered to date; and I'm not convinced that that'd be buying us anything except more GetSnapshotData() calls. Plan B is to grant catalog snapshots some more-durable status than what Plan A envisions. I'm not very sure about the details, but if you don't want to go that route then you'd better set about making the above assertion work. In the meantime, since it's clear that HaveRegisteredOrActiveSnapshot is failing to meet its contract, I'm going to go fix it. I think (based on the above argument) that what it intends to enforce is not really the system design we need, but it certainly isn't helping anyone that it enforces that design incorrectly. regards, tom lane