Thanks for benchmarking this, and for pinning down the anti-join case.

You're right.  This is exactly the risk I flagged in the original mail --
that anti joins have no JOIN_UNIQUE-style inner-dedup fallback -- where I
said I hadn't yet found a case in which the difference was more than
marginal.  You've found one.  It comes down to an asymmetry between the
two join types: the simplification is reversible for a semijoin but not
for an antijoin.

v2 therefore restricts it to the semijoin (IN/ANY) path and leaves a
null-safe NOT IN -> antijoin completely untouched, so by construction
your

select count(*) from ao where a not in (select distinct k from ai);
plans exactly as on master -- the regression simply cannot arise.  I
confirmed this directly: on the same data, v2 produces a plan identical
to master's for your query (the same parallel Unique over a
HashAggregate), so there is no performance change versus master.  Only
the semijoin cases, where the consistent wins were, are changed from v1.

The reason for the asymmetry: a semijoin is algebraically equivalent to
an inner join whose inner side has been made unique on the join key,

A SEMI B  ==  A JOIN unique(B)
and that equivalence is precisely what JOIN_UNIQUE_INNER implements.  So
dropping the sub-select's DISTINCT in the semijoin case loses nothing:
the planner can reconstruct the de-duplication itself (by unique-ifying
the inner) whenever that is cheaper, so the removal just turns a forced
de-dup into a cost-based one.

An antijoin has no comparable equivalence -- "no matching row exists"
cannot be rephrased as an inner join over any transform of the inner
side -- which is exactly why there is no JOIN_UNIQUE path for it
(joinrels.c calls create_unique_paths() only for JOIN_SEMI).  So once a
DISTINCT in a NOT IN sub-select is dropped, the de-duplication is gone
for good, and the planner is forced to push the full, heavily-duplicated
inner relation through the join -- which is where the parallel de-dup on
master was winning in your test.

The further point you raise -- that the serial-vs-parallel choice for
this antijoin looks suboptimal in its own right -- seems to me a
pre-existing parallel-costing matter independent of this patch, so I
haven't touched it here; might be worth a separate look.

v2 is attached.  make check passes, and I extended the NOT IN coverage in
subselect.sql to pin down the new behavior directly: the DISTINCT/ORDER BY
spellings now keep their de-duplication/sort on the antijoin path rather
than being flattened like the IN spellings.

Happy to post the plan trees or before/after numbers for your 2M-row case
if that's useful.

Regards,
Ewan

On Thu, Jun 11, 2026 at 6:33 AM Zsolt Parragi <[email protected]> wrote:
>
> Hello
>
> I verified that the patch is generally faster in my benchmarks, with
> one exception:
> anti joins with heavy duplication end up being significantly slower,
> for example:
>
> create table ao (a int not null);
> create table ai (k int not null);
> insert into ao select g from generate_series(1,100000) g;
> insert into ai select g % 50 from generate_series(1,2000000) g;
> analyze ao;
> analyze ai;
> \timing on
> explain (analyze, costs off, timing off, summary off)
> select count(*) from ao where a not in (select distinct k from ai);
>
> Which seems related to parallelization, as in these scenarios the
> patched version chooses a serial execution compared to the
> parallelized deduplication on master, and ends up being 2-4x slower.
> If I force it to use parallel workers, it ends up being faster even in
> these cases.
>
>

Attachment: v2-0001-Discard-ORDER-BY-and-DISTINCT-in-an-ANY-IN-sub-se.patch
Description: Binary data

Reply via email to