Hi, Thank you for this patch. I've been studying how PostgreSQL handles selectivity estimation, and this optimization for large IN-lists looks very useful.
I ran some tests for the special cases David mentioned: - NULL + ALL: correctly returns selectivity ≈ 0 (rows=1) - isInequality: <> ALL estimates match NOT IN - Cross-type: int = ANY(bigint[]) works correctly - Duplicate values: IN (1,1,1,2,2,3) preserves existing behavior I noticed a few minor points: 1. The comment in MCVInHashEntry struct contains a typo: "number of occurrences od current value" -> "of" 2. The ALL + NULL early-return logic appears in two places (lines 2579-2591 and 2644-2656). I initially considered consolidating this by checking for NULL elements before building the hash table, but realized this would add an extra loop in the common case where there are no NULLs. Perhaps a brief comment explaining why this check is duplicated (to avoid the overhead of a separate NULL-scanning loop) would help future readers understand the design choice? 3. Minor style suggestion: adding a brief SQL example in the header comment (e.g., "WHERE x IN (1,2,3,...)" or "WHERE x = ANY(ARRAY[...])") might help future readers quickly understand the use case. Thanks again for working on this optimization. It's been very educational to follow the discussion and understand how selectivity estimation works in PostgreSQL. Regards, Tatsuya Kawata
