Thanks for benchmarking this, and for pinning down the anti-join case. You're right. This is exactly the risk I flagged in the original mail -- that anti joins have no JOIN_UNIQUE-style inner-dedup fallback -- where I said I hadn't yet found a case in which the difference was more than marginal. You've found one. It comes down to an asymmetry between the two join types: the simplification is reversible for a semijoin but not for an antijoin.
v2 therefore restricts it to the semijoin (IN/ANY) path and leaves a null-safe NOT IN -> antijoin completely untouched, so by construction your select count(*) from ao where a not in (select distinct k from ai); plans exactly as on master -- the regression simply cannot arise. I confirmed this directly: on the same data, v2 produces a plan identical to master's for your query (the same parallel Unique over a HashAggregate), so there is no performance change versus master. Only the semijoin cases, where the consistent wins were, are changed from v1. The reason for the asymmetry: a semijoin is algebraically equivalent to an inner join whose inner side has been made unique on the join key, A SEMI B == A JOIN unique(B) and that equivalence is precisely what JOIN_UNIQUE_INNER implements. So dropping the sub-select's DISTINCT in the semijoin case loses nothing: the planner can reconstruct the de-duplication itself (by unique-ifying the inner) whenever that is cheaper, so the removal just turns a forced de-dup into a cost-based one. An antijoin has no comparable equivalence -- "no matching row exists" cannot be rephrased as an inner join over any transform of the inner side -- which is exactly why there is no JOIN_UNIQUE path for it (joinrels.c calls create_unique_paths() only for JOIN_SEMI). So once a DISTINCT in a NOT IN sub-select is dropped, the de-duplication is gone for good, and the planner is forced to push the full, heavily-duplicated inner relation through the join -- which is where the parallel de-dup on master was winning in your test. The further point you raise -- that the serial-vs-parallel choice for this antijoin looks suboptimal in its own right -- seems to me a pre-existing parallel-costing matter independent of this patch, so I haven't touched it here; might be worth a separate look. v2 is attached. make check passes, and I extended the NOT IN coverage in subselect.sql to pin down the new behavior directly: the DISTINCT/ORDER BY spellings now keep their de-duplication/sort on the antijoin path rather than being flattened like the IN spellings. Happy to post the plan trees or before/after numbers for your 2M-row case if that's useful. Regards, Ewan On Thu, Jun 11, 2026 at 6:33 AM Zsolt Parragi <[email protected]> wrote: > > Hello > > I verified that the patch is generally faster in my benchmarks, with > one exception: > anti joins with heavy duplication end up being significantly slower, > for example: > > create table ao (a int not null); > create table ai (k int not null); > insert into ao select g from generate_series(1,100000) g; > insert into ai select g % 50 from generate_series(1,2000000) g; > analyze ao; > analyze ai; > \timing on > explain (analyze, costs off, timing off, summary off) > select count(*) from ao where a not in (select distinct k from ai); > > Which seems related to parallelization, as in these scenarios the > patched version chooses a serial execution compared to the > parallelized deduplication on master, and ends up being 2-4x slower. > If I force it to use parallel workers, it ends up being faster even in > these cases. > >
v2-0001-Discard-ORDER-BY-and-DISTINCT-in-an-ANY-IN-sub-se.patch
Description: Binary data
