On Thu, Apr 2, 2026 at 10:08 PM Robert Haas <[email protected]> wrote:
> I'm not sure, either, and I agree that we have a problem. I'll give it
> some more thought tomorrow.
OK, here are my thoughts. I don't believe it's viable to change pg_plan_advice in such a way that it won't run into trouble in cases like this. Somebody could argue that the choice of INDEX_SCAN(table_name index_name) was bad design, and that I should have done something like INDEX_SCAN(table_name indexed_columns) instead, and that might be true. There might also be an argument that we should have both things with different spellings, and that might also very well be true. But we don't really know that changing that design decision would fully stabilize test_plan_advice. The regression tests can do anything they like, as long as they reliably pass. It now seems optimistic to me to suppose that an index with a different name is the only current or future issue we'll ever have. I mean, if the table were small enough that it didn't matter whether an index scan or a sequential scan is used, you could concurrently drop the one and only index altogether, and what's test_plan_advice supposed to do about that?

So, I argue that there are three possible categories of solutions here: (1) don't let the problem cases happen in the first place, (2) detect that a problem has happened, or (3) give up on test_plan_advice.

In category (1), the simplest idea would be (1a) to run the tests serially. That would probably involve running them much less often, like in one of the CI builds but not all of them, or something like that. Another idea that I had is to (1b) try to take stronger locks on the relations involved to prevent concurrent DDL on them, like a ShareUpdateExclusiveLock, or (1c) some kind of bespoke interlock specific to test_plan_advice. I think that might cause random breakage of other types, though. Another idea in this category is to try to make the main regression tests "pg_plan_advice clean".
I know Tom already expressed opposition to that idea, but hear me out: we could (1d) have a separate test suite that still does stuff like this, so we don't lose test coverage, and move some stuff there. Or, instead of completely separating it, we could (1e) have two schedule files, one of which includes all the tests and a second of which includes only the tests that are test_plan_advice-clean. Although my theory that the main regression tests couldn't have multiple different sessions simultaneously doing DDL on the same objects has been proven wrong, I'd still be willing to bet that it's a minority position. Of course, as Tom pointed out, there could be a "long tail" of failures here, but maybe we could create some throwaway infrastructure to help figure that out. For example, if we're mostly worried about tables, we could have each backend accumulate a list of table OIDs that it touched and spit that out into the log file when it exits. That wouldn't be committable code, but it would be enough to let us run the regression tests with it once and see what overlaps exist. I bet there's very low risk of newly-added tests adding more such cases: the ones that we have are probably ancient. Of course, maybe I'm wrong about that, too, but it's a theory that we can discuss.

In category (2), what if (2a) whenever we see advice feedback that we'd otherwise print, we try replanning the query a THIRD time without any supplied advice? If we generate different advice than we did the first time we planned it, then we know for sure that something is unstable, and we can decide not to complain about whatever went wrong. This isn't completely guaranteed to work, though: what if concurrent DDL changes something between planning cycle 1 and planning cycle 2 and then changes it back before planning cycle 3? But maybe it would be acceptable to make a rule that the main regression test suite shouldn't do that, and adjust any cases that currently do to work some other way.
If we're not willing to make any rules at all to prevent the main regression test suite from sabotaging test_plan_advice, then it's probably doomed. And I think there's a reasonable argument that insisting that the main regression test suite absolutely has to change the definition of an object in a way that test_plan_advice will care about, and then change it back to exactly the initial state, while some other backend in a concurrent session is running queries against that object, is tantamount to legislating deliberate sabotage. But, that said, this proposal has some other imperfections as well. In particular, a bug that caused the third planning cycle to always produce different results than the first would hide all future problems that test_plan_advice might have caught, which is pretty sad.

Another variant of the same basic idea is to (2b) just detect when we've seen any shared invalidations between the start of the first planning cycle and the end of the second, and go "never mind, don't complain even if we saw a problem". The problem with this idea is that, as with the previous proposal, it might make the tests too insensitive to real issues. But I wonder if this might be fixable. Maybe we could (2c) make test_plan_advice take planner_hook and wrap a retry loop around the planner: it just keeps replanning the query via standard_planner (which would eventually reach test_plan_advice_advisor) until no sinval messages are absorbed between the start and end of planning, which I think we could detect using SharedInvalidMessageCounter, or until some retry limit is exhausted and we error out. I'd need to try this and see how well it works out in practice, and how often the retry is actually hit, but it seems like it might be somewhat viable.

In category (3), the most blunt option is obviously just to (3a) throw test_plan_advice away, which I think would probably doom pg_plan_advice to getting silently broken in the future.
I don't really have any other ideas in this category except for (1a), already mentioned, which is sort of a hybrid solution.

My current thought is to do some research into (1e) and (2c). Specifically, for (1e), I want to try to figure out whether this is the only case of this type or whether there are lots of others, since that seems likely to have a pretty large bearing on what is realistic here. And for (2c), I think I just want to try it out and see if it seems at all feasible. Probably obviously, this is not going to happen before next week, but I hope that the frequency of buildfarm failures is now low enough that this isn't a critical issue. If that's wrong, let me know, but from my point of view, even if we eventually chose (3a), having as good a sense as possible of what the potential failure modes are here would help to design the next solution, and AFAIK this is the first failure we've seen since the DO_NOT_SCAN stuff went in.

(In fact, I had a little bit of trouble finding this in the BF results even knowing it was there: filtering by test_plan_advice failures doesn't find anything recent. sifaka's failure shows up as TestModulesCheck-en_US.UTF-8, but frustratingly, the names for the stage logs don't seem to quite match the name of what failed. There is testmodules-install-check-C and testmodules-install-check-en_US.UTF-8, but those have "install" in the name and are punctuated differently, so it's not instantly clear that it's the same thing. Anyway, I do see it in there now, but what I'm saying is that if there have been other failures related to this, it's possible I have missed them due to stuff like this, so it's helpful that you (Tom) pointed this one out.)

Tom, I would welcome your thoughts, if you have any, and anyone else's thoughts as well. If none, I'll proceed as described above and update when I know more.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com
