Re: pg_plan_advice

Robert Haas Thu, 26 Feb 2026 05:56:29 -0800

Thanks, Alex, for the review.

On Mon, Feb 23, 2026 at 8:10 PM Alexandra Wang
<[email protected]> wrote:
> While reading the patch, I noticed several places where we infer
> planner intent from currently observed plan shapes (e.g., Gather
> projection, Agg used for semijoin uniqueness, Append elision, etc.),
> which I quote below:
[....]
> All of the interpretations above look reasonable, but they made me
> wonder how modules like this detect planner-behavior drift over time
> if we add new plan node types, add or change plan shapes, or even a
> new debug GUC.


In my personal opinion, this is one of the really key issues for this
patch and deserves more discussion, but it feels like it's been very
difficult to get any senior hackers to pay attention to this patch.
I'm not sure whether that's because it's a huge patch, or they just
don't want to touch it with a ten foot pole, or they're just busy with
their own work, but it's a little hard as the patch author to know how
reasonable the decisions you've made are until you hear what other
people think of them.

In terms of keeping things working, I think that test_plan_advice has
a pretty good chance of catching future breakage. If somebody adds a
new planner optimization that breaks the plan tree walk that
pg_plan_advice does, they should also be adding test cases for it. If
test_plan_advice can't generate plan advice any more after that
change, the patch will cause a test failure there, or if it can
generate plan advice but that plan advice doesn't properly apply to
planning, again a test failure will result. But as to whether
test_plan_advice is a good enough test or whether more things need to
be added for people to feel comfortable, well, that's a question.
Also, there's the issue that even if test_plan_advice is good enough,
or can be made so, that still potentially burdens future patch authors
with the task of updating pg_plan_advice for whatever they do. That's
not unique to this patch, of course; any patch that expands the
capabilities of PostgreSQL in a new direction has that issue to some
degree. To take a few examples from my own past work, consider wait
events or parallel query. But there's still a judgement call to made
about whether the patch adds too much burden for what we get out of
it. Personally, I think that some kind of user control over the
planner is one of the really big things that experienced PostgreSQL
users want, and therefore I'm inclined to believe it's worth it. But
most people are in favor of their own patches, so that's not really
surprising.

> I realize this isn't specific to this patch, but I'm curious whether
> something like monitoring a possible set of parent-child plan node
> patterns has ever been considered, to make structural changes easier
> to notice when those invariants stop holding. I don't think this is
> actionable for this patch, but I'm just recording the thought here
> while reading.

One thing I've been thinking about is that we might want to consider
annotating some planner nodes with a little more information so that
pg_plan_advice doesn't have to infer quite as much. For example, if we
annotated Agg nodes with their purpose (regular agg, eager agg,
partitionwise agg, semijoin uniqueness) and their relid set,
pg_plan_advice would have an easier time figuring out what's going on
and could throw errors if unexpected things happen instead of
delivering wrong results, possibly silently. I have avoided doing that
kind of thing so far because I wanted to keep the core changes as
small as possible, but there's an argument that I've taken too much
risk there. One of the other things that bugs me about this patch is
that if it turns out that there are things in the advice generation
machinery that are not fully reliable, it may not be possible to fix
those without an ABI break. That is, we can fix any problem we might
have by having the core planner publish more information, but we've
been trying really hard lately to avoid changing structure definitions
or function signatures in minor releases, and I can imagine that it
wouldn't be that hard for pg_plan_advice to be shown to have some
problem that can't be fixed any other way. So what happens if I commit
this and then such a bug shows up? Do we compromise on ABI stability,
or do we just leave the bug unfixed for the life time of that release
and fix it in the next major release, or what? I suppose it's hard to
know in the abstract, but this is another area where getting some
opinions from other committers would be awfully useful.

> I also noticed that several tests in pg_plan_advice are not listed in
> the Makefile. Specifically, they are semijoin.sql, syntax.sql,
> prepared.sql, and local_collector.sql. I ran them manually, and they
> all passed.

Yeah, thanks, I actually also found this myself while doing some
refactoring, shortly before you sent this email. Will fix.

> In pgpa_walk_recursively():
> > /*
> >  * Check the future_query_features list to see whether this was previously
> >  * identified as a plan node that needs to be treated as a query feature.
> >  * We must do this before handling elided nodes, because if there's an
> >  * elided node associated with a future query feature, the RTIs associated
> >  * with the elided node should be the only ones attributed to the query
> >  * feature.
> >  */
> > foreach_ptr(pgpa_query_feature, qf, walker->future_query_features)
> > {
> > if (qf->plan == plan)
> > {
> > active_query_features = list_copy(active_query_features);
> > active_query_features = lappend(active_query_features, qf);
> > walker->future_query_features =
> > list_delete_ptr(walker->future_query_features, plan);
> > break;
> > }
> > }
>
> Should this be deleting "qf" instead of "plan"? Right now the list is
> not emptied after the plan tree walk as intended.

Yeah, I think you're right. I don't think that mistake breaks
anything, it just means the list grows needlessly long, but I'll fix
it.

> In pgpa_decompose_join():
> > if (found_any_outer_gather &&
> > ...
> > if (found_any_inner_gather &&
>
> "found_any_{outer,inner}_gather" should be dereferenced.

Yikes, good catch.

> In pgpa_trim_shared_advice():
> > /* Don't leave stale pointers around. */
> > memset(&chunk_array[remaining_chunk_count], 0,
> >   sizeof(pgpa_shared_advice_chunk *)
> >   * (total_chunk_count - remaining_chunk_count));
>
> Should the element size be "sizeof(dsa_pointer)", consistent with the
> preceding memmove()?

Yes, thanks.

> In pg_plan_advice_advice_check_hook():
> > if (error != NULL)
> > {
> > GUC_check_errdetail("Could not parse advice: %s", error); //
> > return false;
> > }
> >
> > MemoryContextSwitchTo(oldcontext);
> > MemoryContextDelete(tmpcontext);
> >
> > return true;
>
> If an error occurs, do we leak the memory context? Should we also
> switch and delete memory context before returning false?

I think this should be mostly harmless. The parent context, at least
in most cases, shouldn't be anything that long-lived, and when it's
destroyed, it will take this context with it. I think there are some
cases where the parent context can be longer-lived, but they shouldn't
happen often enough for this to become a significant issue. On the
other hand, there's also no reason that I can see not to tighten this
up. I changed it like this:

    if (error != NULL)
        GUC_check_errdetail("Could not parse advice: %s", error);

    MemoryContextSwitchTo(oldcontext);
    MemoryContextDelete(tmpcontext);

    return (error == NULL);

> In pg_get_collected_local_advice(), we skip rows when:
>
> > if (!member_can_set_role(userid, ca->userid))
> > continue;
>
> Do we need to do the same check for pg_get_collected_shared_advice()?

Hmm, yeah, probably a good idea.

> In pgpa_planner_append_feedback(), "StringInfoData buf;" is unused.
...
> The "ExplainState *explain_state" field and the "MemoryContext
> trove_cxt" field in "struct pgpa_planner_state" are both unused.

Thanks, deleted this things.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: pg_plan_advice

Reply via email to