> On Jan 14, 2026, at 18:19, Ilia Evdokimov <[email protected]> > wrote: > I’ve attached v2 of the patch. It currently uses two fairly large helper > functions for the Const and ArrayExpr cases; this is intentional to keep the > logic explicit and reviewable, even though these will likely need refactoring > or consolidation later.
Thanks for working on this. I had previously reviewed the v2 patch and wrote up some comments, but didn’t get a chance to send them before v3 was posted. I haven’t yet had time to review v3 in detail, so I’m not sure whether the issues below have already been addressed there. I’m posting my earlier review notes first and will follow up with comments on v3 once I’ve had a chance to look at it. * Treat NULL array elements as zero selectivity for ALL: In `scalararray_mcv_hash_match_const()` (and similarly `scalararray_mcv_hash_match_expr()`), NULL array elements are currently handled by simply continuing the loop (e.g. `if (elem_nulls[i]) continue;`), effectively ignoring them. This behavior is only correct for ANY/OR semantics. For ALL/AND (`useOr = false`), a single NULL array element causes the `ScalarArrayOpExpr` to never return TRUE for strict operators (as assumed by the surrounding code and comments). In that case, the correct selectivity estimate should be 0.0, but the current code path can return a non-zero selectivity. * Fix cross-type equality argument order in `mcvs_in_equal`: `mcvs_in_equal()` always invokes the equality function as `(key0, key1)`. However, `simplehash` provides `key0` from the hash table and `key1` as the probe key. In the branch where the hash table is built over IN-list values and probed with MCVs (the `sslot.nvalues > num_elems` path), this reverses the operator’s argument order for cross-type equality operators. This risks incorrect match decisions and may misinterpret Datums compared to the operator’s declared signature. * Include non-MCV IN-list constants in non-disjoint selectivity: In the `sslot.nvalues > num_elems` path of `scalararray_mcv_hash_match_const()` and `scalararray_mcv_hash_match_expr()`, non-MCV constant elements currently only contribute via `disjoint_sel`. For cases where disjoint-probability estimation is not used (e.g. ALL, `<> ANY`, or when `disjoint_sel` is out of range), the code leaves the selectivity based solely on MCV matches. This effectively treats non-MCV constants as having probability 1.0, leading to overestimation of selectivity. * Avoid double-negating inequality estimates for non-Const elements: In the `scalararray_mcv_hash_match_expr()` `sslot.nvalues > num_elems` branch, non-Const elements are handled via `var_eq_non_const(..., negate = isInequality)` and then later adjusted again with `if (isInequality) s1 = 1.0 - s1 - nullfrac;` This results in a double negation for inequality cases, effectively turning the estimate back into an equality selectivity. -- Best regards, Chengpeng Yan
