On Fri, Jan 26, 2024 at 2:32 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> I'm fairly sure I thought it wouldn't matter because of the Param > de-duplication done in paramassign.c. However, Richard's example > shows that's not so, because process_subquery_nestloop_params is > picky about the param ID assigned to a particular Var while > replace_nestloop_params is not. So flipping the order makes sense. > I'd change the comment though, maybe to > > /* > * Replace any outer-relation variables with nestloop params. > * > * We must provide nestloop params for both lateral references of > * the subquery and outer vars in the scan_clauses. It's better > * to assign the former first, because that code path requires > * specific param IDs, while replace_nestloop_params can adapt > * to the IDs assigned by process_subquery_nestloop_params. > * This avoids possibly duplicating nestloop params when the > * same Var is needed for both reasons. > */ +1. It's much better. > However ... it seems like we're not out of the woods yet. Why > is Richard's proposed test case still showing > > + -> Memoize (actual rows=5000 loops=N) > + Cache Key: t1.two, t1.two > > Seems like there is missing de-duplication logic, or something. When we collect the cache keys in paraminfo_get_equal_hashops() we search param_info's ppi_clauses as well as innerrel's lateral_vars for outer expressions. We do not perform de-duplication on the collected outer expressions there. In my proposed test case, the same Var 't1.two' appears both in the param_info->ppi_clauses and in the innerrel->lateral_vars, so we see two identical cache keys in the plan. I noticed this before and wondered if we should do de-duplication on the cache keys, but somehow I did not chase this to the ground. Thanks Richard