Re: Hash-based MCV matching for large IN-lists

Tatsuya Kawata Fri, 06 Feb 2026 23:43:01 -0800

Hi,

Thank you for this patch.
I've been studying how PostgreSQL handles selectivity estimation, and this
optimization for large IN-lists looks very useful.


I ran some tests for the special cases David mentioned:

- NULL + ALL: correctly returns selectivity ≈ 0 (rows=1)
- isInequality: <> ALL estimates match NOT IN
- Cross-type: int = ANY(bigint[]) works correctly
- Duplicate values: IN (1,1,1,2,2,3) preserves existing behavior

I noticed a few minor points:

1. The comment in MCVInHashEntry struct contains a typo:
   "number of occurrences od current value" -> "of"

2. The ALL + NULL early-return logic appears in two places (lines 2579-2591
and 2644-2656). I initially considered consolidating this by checking for
NULL elements before building the hash table, but realized this would add
an extra loop in the common case where there are no NULLs.
   Perhaps a brief comment explaining why this check is duplicated (to
avoid the overhead of a separate NULL-scanning loop) would help future
readers understand the design choice?

3. Minor style suggestion: adding a brief SQL example in the header comment
(e.g., "WHERE x IN (1,2,3,...)" or "WHERE x = ANY(ARRAY[...])") might help
future readers quickly understand the use case.

Thanks again for working on this optimization. It's been very educational
to follow the discussion and understand how selectivity estimation works in
PostgreSQL.

Regards,
Tatsuya Kawata

Re: Hash-based MCV matching for large IN-lists

Reply via email to