Hello, the attached patch improves the array selectivity estimation for = ANY and <> ALL, hence for the IN/NOT IN operators, to avoid the shortcoming described in <http://archives.postgresql.org/pgsql-performance/2012-03/msg00006.php>.
Cheers, -- Daniele
From d10e60dd3dec340b96b372792e4a0650ec10da92 Mon Sep 17 00:00:00 2001 From: Daniele Varrazzo <daniele.varra...@gmail.com> Date: Sat, 3 Mar 2012 19:27:16 +0000 Subject: [PATCH] Improve array selectivity estimation for = ANY and <> ALL Assume distinct array elements, hence disjoint probabilities instead of independent. Using the wrong probability model can easily lead to serious selectivity overestimation for the IN operator and underestimation for NOT IN. --- src/backend/utils/adt/selfuncs.c | 37 +++++++++++++++++++++++++++++++++---- 1 files changed, 33 insertions(+), 4 deletions(-) diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c index 0a685aa..b0816ca 100644 --- a/src/backend/utils/adt/selfuncs.c +++ b/src/backend/utils/adt/selfuncs.c @@ -1797,10 +1797,29 @@ scalararraysel(PlannerInfo *root, ObjectIdGetDatum(operator), PointerGetDatum(args), Int32GetDatum(varRelid))); + /* + * For generic operators, assume the selection probabilities as + * independent for each array element. But for the equality the + * probabilities are disjoint, so the total probability is just + * the sum of the probabilities of the single elements. This is + * true only if the array doesn't contain dups, but the check is + * expensive: just assume it, to avoid penalizing well-written + * queries in favour of poorly-written ones. + */ if (useOr) - s1 = s1 + s2 - s1 * s2; + { + if (oprselproc.fn_addr == eqsel) + s1 = s1 + s2; + else + s1 = s1 + s2 - s1 * s2; + } else - s1 = s1 * s2; + { + if (oprselproc.fn_addr == neqsel) + s1 = s1 + s2 - 1.0; + else + s1 = s1 * s2; + } } } else if (rightop && IsA(rightop, ArrayExpr) && @@ -1840,9 +1859,19 @@ scalararraysel(PlannerInfo *root, PointerGetDatum(args), Int32GetDatum(varRelid))); if (useOr) - s1 = s1 + s2 - s1 * s2; + { + if (oprselproc.fn_addr == eqsel) + s1 = s1 + s2; + else + s1 = s1 + s2 - s1 * s2; + } else - s1 = s1 * s2; + { + if (oprselproc.fn_addr == neqsel) + s1 = s1 + s2 - 1.0; + else + s1 = s1 * s2; + } } } else -- 1.7.5.4
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers