Hello,

the attached patch improves the array selectivity estimation for = ANY
and <> ALL, hence for the IN/NOT IN operators, to avoid the
shortcoming described in
<http://archives.postgresql.org/pgsql-performance/2012-03/msg00006.php>.

Cheers,

-- Daniele
From d10e60dd3dec340b96b372792e4a0650ec10da92 Mon Sep 17 00:00:00 2001
From: Daniele Varrazzo <daniele.varra...@gmail.com>
Date: Sat, 3 Mar 2012 19:27:16 +0000
Subject: [PATCH] Improve array selectivity estimation for = ANY and <> ALL

Assume distinct array elements, hence disjoint probabilities instead of
independent. Using the wrong probability model can easily lead to serious
selectivity overestimation for the IN operator and underestimation for NOT IN.
---
 src/backend/utils/adt/selfuncs.c |   37 +++++++++++++++++++++++++++++++++----
 1 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 0a685aa..b0816ca 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -1797,10 +1797,29 @@ scalararraysel(PlannerInfo *root,
 												  ObjectIdGetDatum(operator),
 												  PointerGetDatum(args),
 												  Int32GetDatum(varRelid)));
+			/*
+			 * For generic operators, assume the selection probabilities as
+			 * independent for each array element. But for the equality the
+			 * probabilities are disjoint, so the total probability is just
+			 * the sum of the probabilities of the single elements. This is
+			 * true only if the array doesn't contain dups, but the check is
+			 * expensive: just assume it, to avoid penalizing well-written
+			 * queries in favour of poorly-written ones.
+			 */
 			if (useOr)
-				s1 = s1 + s2 - s1 * s2;
+			{
+				if (oprselproc.fn_addr == eqsel)
+					s1 = s1 + s2;
+				else
+					s1 = s1 + s2 - s1 * s2;
+			}
 			else
-				s1 = s1 * s2;
+			{
+				if (oprselproc.fn_addr == neqsel)
+					s1 = s1 + s2 - 1.0;
+				else
+					s1 = s1 * s2;
+			}
 		}
 	}
 	else if (rightop && IsA(rightop, ArrayExpr) &&
@@ -1840,9 +1859,19 @@ scalararraysel(PlannerInfo *root,
 												  PointerGetDatum(args),
 												  Int32GetDatum(varRelid)));
 			if (useOr)
-				s1 = s1 + s2 - s1 * s2;
+			{
+				if (oprselproc.fn_addr == eqsel)
+					s1 = s1 + s2;
+				else
+					s1 = s1 + s2 - s1 * s2;
+			}
 			else
-				s1 = s1 * s2;
+			{
+				if (oprselproc.fn_addr == neqsel)
+					s1 = s1 + s2 - 1.0;
+				else
+					s1 = s1 * s2;
+			}
 		}
 	}
 	else
-- 
1.7.5.4

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to