Thanks! This is very helpful. On Wed, Oct 29, 2025 at 8:01 PM David Rowley <[email protected]> wrote:
> On Thu, 30 Oct 2025 at 07:29, Jacob Jackson <[email protected]> > wrote: > > Hello. I was looking at some query plans recently and noticed something > that didn't make sense. I have a query that joins a table of questions with > results for each question (using a table with a composite primary key of > question id and a user id), filtered by user id. The question IDs and the > combined question-userIds are guaranteed unique due to being primary keys, > and yet Postgres still memoizes the inner loop results. Any ideas why? > > I agree that when the outer side of the join has unique values that > Memoize does not make any sense. The planner currenrly puts quite a > bit of faith in the row estimates for this and if you're getting this > plan, then the estimates came back indicating there'd be fewer unique > values of "QuestionUserStatus".question in the input than there are > input rows to the Memoize node. If you delve into > cost_memoize_rescan(), you'll see the code for this (look for where > "hit_ratio" is calculated). > > There are also a few prechecks in get_memoize_path() to try to avoid > this sort of thing, but unfortunately, the information to avoid > Memoize when the outer side of the join is unique isn't available. We > do have an "inner_unique" in JoinPathExtraData, but what we'd need for > this and don't have is "outer_unique". If we had that, we could just > exit early in get_memoize_path() if that's set to true. Whether or not > going to the trouble of calculating "outer_unique" is worth the > trouble, I'm not sure. There was some work on UniqueKeys a few years > ago, which could have helped in this scenario as we could have more > easily identified uniqueness at different join levels. That's no > longer being worked on, as I understand it. > > On the other hand, it may be better to somehow enhance > estimate_num_groups() so it can be given more details about the > context of the request, i.e the set of Relids that are joined already > for the input_rows. That way the code could do more analysis into the > RelOptInfo base quals for the relevant relations. Extended statistics > for n_distinct could also be applied in some cases too by looking for > baserestrictinfo with equality quals or EquivalenceClasses with > ec_has_const = true and a member for other Vars/Exprs in the extended > statistics. > > Unfortunately, neither of these is a trivial fix. > > David >
