On Wed, Mar 27, 2024 at 10:33:28AM -0400, Robert Haas wrote:
> FWIW, I thought the time-based one sounded more useful. I think it
> would be poor planning to say "well, if the slot reaches an XID age of
> a billion, kill it so we don't wrap around," because while that likely
> will prevent me from getting into wraparound trouble, my database is
> likely to become horribly bloated long before the cutoff is reached. I
> thought it would be easier to reason in terms of time: I don't expect
> a slave to ever be down for more than X period of time, say an hour or
> whatever, so if it is, forget about it. Or alternatively, I know that
> if a slave does go down for more than X period of time, I start to get
> bloat, so cut it off at that point and I'll rebuild it later. I feel
> like these are things where people's intuition is going to be much
> stronger when reckoning in units of wall-clock time, which everyone
> deals with every day in one way or another, rather than in XID-based
> units that are, at least in my view, just a lot less intuitive.

I don't disagree with this point in the context of a user who is managing a single server or just a handful of servers. They are going to understand their workload best and can reason about the right value for the timeout. I think they'd still benefit from having an XID-based setting as a backstop in case the timeout is still not sufficient to prevent wraparound, but it's not nearly as important in that case.

IMHO the use-case where this doesn't work so well is when you have many, many servers to administer (e.g., a cloud provider). In those cases, picking a default timeout to try to prevent wraparound is going to be much less accurate, as any reasonable value you pick is still going to be insufficient in some cases. I think the XID-based parameter would be better here; if the server is at imminent risk of an outage due to wraparound, invalidating the slots is probably a reasonable course of action.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com