On Thu, Jul 11, 2019 at 05:08:22PM +0200, Tomas Vondra wrote:
On Wed, Jul 10, 2019 at 06:48:16PM -0400, Tom Lane wrote:
Oh ... while we're piling on here, it just sunk into me that
mcv_get_match_bitmap is deciding what the semantics of an operator
are by seeing what it's using for a selectivity estimator.
That is just absolutely, completely wrong.  For starters, it
means that the whole mechanism fails for any operator that wants
to use a specialized estimator --- hardly an unreasonable thing
to do.  For another, it's going to be pretty unreliable for
extensions, because I do not think they're all careful about using
the right estimator --- a lot of 'em probably still haven't adapted
to the introduction of separate <= / >= estimators, for instance.

The right way to determine operator semantics is to look to see
whether they are in a btree opclass.  That's what the rest of the
planner does, and there is no good reason for the mcv code to
do it some other way.


Hmmm, but that will mean we're unable to estimate operators that are not
part of a btree opclass. Which is a bit annoying, because that would also
affect equalities (and thus functional dependencies), in which case the
type may easily have just hash opclass or something.

But maybe I just don't understand how the btree opclass detection works
and it'd be fine for sensibly defined operators. Can you point me to code
doing this elsewhere in the planner?

We may need to do something about code in pg10, because functional
dependencies this too (although just with F_EQSEL).  But maybe we should
leave pg10 alone, we got exactly 0 reports about it until now anyway.


Here are WIP patches addressing two of the issues:

1) determining operator semantics by matching them to btree opclasses

2) deconstructing OpExpr into Var/Const nodes

I'd appreciate some feedback particularly on (1).

For the two other issues mentioned in this thread:

a) I don't think unary-argument OpExpr are an issue, because this is
checked when determining which clauses are compatible (and the code only
allows the case with 2 arguments).

b) Const with constisnull=true - I'm not yet sure what to do about this.
The easiest fix would be to deem those clauses incompatible, but that
seems a bit too harsh. The right thing is probably passing the NULL to
the operator proc (but that means we can't use FunctionCall).

Now, looking at this code, I wonder if using DEFAULT_COLLATION_OID when
calling the operator is the right thing. We're using type->typcollation
when building the stats, so maybe we should do the same thing here.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 043b7525590278fabacd3c10ef869b1ef2b7358f Mon Sep 17 00:00:00 2001
From: Tomas Vondra <to...@2ndquadrant.com>
Date: Fri, 12 Jul 2019 22:26:20 +0200
Subject: [PATCH 1/2] Determine operator semantics using btree opclasses

---
 src/backend/statistics/dependencies.c   | 15 +++---
 src/backend/statistics/extended_stats.c | 69 +++++++++++++++++++------
 src/backend/statistics/mcv.c            | 60 ++++++++++++---------
 src/include/statistics/statistics.h     |  1 +
 4 files changed, 99 insertions(+), 46 deletions(-)

diff --git a/src/backend/statistics/dependencies.c 
b/src/backend/statistics/dependencies.c
index 66c38ce2bc..2723ca33df 100644
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
+#include "access/stratnum.h"
 #include "access/sysattr.h"
 #include "catalog/pg_operator.h"
 #include "catalog/pg_statistic_ext.h"
@@ -788,15 +789,13 @@ dependency_is_compatible_clause(Node *clause, Index 
relid, AttrNumber *attnum)
                 * If it's not an "=" operator, just ignore the clause, as it's 
not
                 * compatible with functional dependencies.
                 *
-                * This uses the function for estimating selectivity, not the 
operator
-                * directly (a bit awkward, but well ...).
-                *
-                * XXX this is pretty dubious; probably it'd be better to check 
btree
-                * or hash opclass membership, so as not to be fooled by custom
-                * selectivity functions, and to be more consistent with 
decisions
-                * elsewhere in the planner.
+                * To decide this, we need to decide the semantics of the 
operator.
+                * We do that by matching it to btree opclasses, and checking 
the
+                * strategy numbers. When there's no btree opclass, or when 
there
+                * are multiple strategy numbers, we consider the semantics to 
be
+                * unclear (i.e. not an equality operator).
                 */
-               if (get_oprrest(expr->opno) != F_EQSEL)
+               if (determine_operator_strategy(expr->opno) != 
BTEqualStrategyNumber)
                        return false;
 
                /* OK to proceed with checking "var" */
diff --git a/src/backend/statistics/extended_stats.c 
b/src/backend/statistics/extended_stats.c
index c56ed48270..2ed28c054f 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -20,6 +20,7 @@
 #include "access/htup_details.h"
 #include "access/table.h"
 #include "access/tuptoaster.h"
+#include "access/stratnum.h"
 #include "catalog/indexing.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_statistic_ext.h"
@@ -819,22 +820,14 @@ statext_is_compatible_clause_internal(PlannerInfo *root, 
Node *clause,
                 *
                 * This uses the function for estimating selectivity, not the 
operator
                 * directly (a bit awkward, but well ...).
+                *
+                * XXX We can't check this against strategies defined in 
stratnum.h,
+                * because get_op_btree_interpretation() introduces a new 
strategy
+                * number for operator '<>' and we'd miss that. So we simply 
check
+                * if the result is (-1) which means strategy is unknown.
                 */
-               switch (get_oprrest(expr->opno))
-               {
-                       case F_EQSEL:
-                       case F_NEQSEL:
-                       case F_SCALARLTSEL:
-                       case F_SCALARLESEL:
-                       case F_SCALARGTSEL:
-                       case F_SCALARGESEL:
-                               /* supported, will continue with inspection of 
the Var */
-                               break;
-
-                       default:
-                               /* other estimators are considered 
unknown/unsupported */
-                               return false;
-               }
+               if (determine_operator_strategy(expr->opno) == -1)
+                       return false;
 
                /*
                 * If there are any securityQuals on the RTE from security 
barrier
@@ -1196,3 +1189,49 @@ statext_clauselist_selectivity(PlannerInfo *root, List 
*clauses, int varRelid,
 
        return sel;
 }
+
+/*
+ * determine_operator_strategy
+ *             Determine operator semantics by matching it to btree opclasses.
+ *
+ * We look at all the btree opclasses referencing the supplied operator, and
+ * check if the strategy number is the same in all of them. If yes, we consider
+ * that to be the semantics of the operator (equality, less-than, ...) and
+ * return that strategy number.
+ *
+ * If there are no matching btree opclasses, or when the operator has multiple
+ * strategy numbers in various opclasses, we consider the operator semantics to
+ * be unclear/undefined and return -1.
+ *
+ * Note: This only looks at btree opclasses, so for data types with only a hash
+ * opclass, this always returns (-1). That ultimately means we won't be able
+ * to use extended statistics for them.
+ *
+ * Note: The result strategy number may not be defined in stratnum.h, so you
+ * need to be careful when checking the result. This is because this calls
+ * get_op_btree_interpretation which may introduce new stragies that are not
+ * actually part of btree definition (e.g. '<>')
+ */
+int
+determine_operator_strategy(Oid opno)
+{
+       ListCell   *lc;
+       int                     strat;
+       List       *opinfos;
+       Bitmapset  *strats = NULL;
+
+       opinfos = get_op_btree_interpretation(opno);
+
+       foreach(lc, opinfos)
+       {
+               OpBtreeInterpretation *opinfo = lfirst(lc);
+
+               strats = bms_add_member(strats, opinfo->strategy);
+       }
+
+       /* multiple (or multiple) strategies means unclear semantics */
+       if (!bms_get_singleton_member(strats, &strat))
+               return -1;
+
+       return strat;
+}
diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index 913a72ff67..1e919c82f9 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -1565,9 +1565,6 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
                        bool            ok;
                        FmgrInfo        opproc;
 
-                       /* get procedure computing operator selectivity */
-                       RegProcedure oprrest = get_oprrest(expr->opno);
-
                        fmgr_info(get_opcode(expr->opno), &opproc);
 
                        ok = (NumRelids(clause) == 1) &&
@@ -1583,6 +1580,7 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
                                Const      *cst;
                                bool            isgt;
                                int                     idx;
+                               int                     strategy;
 
                                /* extract the var and const from the 
expression */
                                var = (varonleft) ? linitial(expr->args) : 
lsecond(expr->args);
@@ -1600,6 +1598,16 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
                                typecache = lookup_type_cache(var->vartype, 
TYPECACHE_GT_OPR);
                                fmgr_info(get_opcode(typecache->gt_opr), 
&gtproc);
 
+                               /* determine operator semantics from btree 
opclasses */
+                               strategy = 
determine_operator_strategy(expr->opno);
+
+                               /*
+                                * At this point all the expressions must have 
a meaningful
+                                * btree opclass interpretation (thanks to 
passing through
+                                * statext_is_compatible_clause, which enforces 
that).
+                                */
+                               Assert(strategy > 0);
+
                                /*
                                 * Walk through the MCV items and evaluate the 
current clause.
                                 * We can skip items that were already ruled 
out, and
@@ -1625,27 +1633,12 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
                                        else if (is_or && (matches[i] == true))
                                                continue;
 
-                                       switch (oprrest)
+                                       switch (strategy)
                                        {
-                                               case F_EQSEL:
-                                               case F_NEQSEL:
-
-                                                       /*
-                                                        * We don't care about 
isgt in equality, because
-                                                        * it does not matter 
whether it's (var op const)
-                                                        * or (const op var).
-                                                        */
-                                                       mismatch = 
!DatumGetBool(FunctionCall2Coll(&opproc,
-                                                                               
                                                           
DEFAULT_COLLATION_OID,
-                                                                               
                                                           cst->constvalue,
-                                                                               
                                                           item->values[idx]));
-
-                                                       break;
-
-                                               case F_SCALARLTSEL: /* column < 
constant */
-                                               case F_SCALARLESEL: /* column 
<= constant */
-                                               case F_SCALARGTSEL: /* column > 
constant */
-                                               case F_SCALARGESEL: /* column 
>= constant */
+                                               case BTLessStrategyNumber:      
   /* column < constant */
+                                               case BTLessEqualStrategyNumber: 
   /* column <= constant */
+                                               case 
BTGreaterEqualStrategyNumber: /* column > constant */
+                                               case BTGreaterStrategyNumber:   
   /* column >= constant */
 
                                                        /*
                                                         * First check whether 
the constant is below the
@@ -1664,6 +1657,27 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
                                                                                
                                                                   
cst->constvalue));
 
                                                        break;
+
+                                               default:
+
+                                                       /*
+                                                        * This does handle 
equality and inequality, but we handle
+                                                        * that as a default 
case because inequality operator '<>'
+                                                        * does not actually 
have a btree opclass strategy, it's
+                                                        * added by 
get_op_btree_interpretation.
+                                                        */
+
+                                                       /*
+                                                        * We don't care about 
isgt in equality, because
+                                                        * it does not matter 
whether it's (var op const)
+                                                        * or (const op var).
+                                                        */
+                                                       mismatch = 
!DatumGetBool(FunctionCall2Coll(&opproc,
+                                                                               
                                                           
DEFAULT_COLLATION_OID,
+                                                                               
                                                           cst->constvalue,
+                                                                               
                                                           item->values[idx]));
+
+                                                       break;
                                        }
 
                                        /*
diff --git a/src/include/statistics/statistics.h 
b/src/include/statistics/statistics.h
index cb7bc630e9..82e2d5c619 100644
--- a/src/include/statistics/statistics.h
+++ b/src/include/statistics/statistics.h
@@ -118,5 +118,6 @@ extern Selectivity 
statext_clauselist_selectivity(PlannerInfo *root,
 extern bool has_stats_of_kind(List *stats, char requiredkind);
 extern StatisticExtInfo *choose_best_statistics(List *stats,
                                                                                
                Bitmapset *attnums, char requiredkind);
+extern int determine_operator_strategy(Oid opno);
 
 #endif                                                 /* STATISTICS_H */
-- 
2.20.1

>From 354191c27867d9dc5672f1abad14941e29e3d595 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <to...@2ndquadrant.com>
Date: Sat, 13 Jul 2019 00:12:16 +0200
Subject: [PATCH 2/2] Fix processing of OpExpr in extended statistics

---
 src/backend/statistics/extended_stats.c | 64 ++++++++++++++++++++-----
 src/backend/statistics/mcv.c            | 23 +++------
 src/include/statistics/statistics.h     |  1 +
 3 files changed, 60 insertions(+), 28 deletions(-)

diff --git a/src/backend/statistics/extended_stats.c 
b/src/backend/statistics/extended_stats.c
index 2ed28c054f..1c7ca07661 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -797,21 +797,13 @@ statext_is_compatible_clause_internal(PlannerInfo *root, 
Node *clause,
                RangeTblEntry *rte = root->simple_rte_array[relid];
                OpExpr     *expr = (OpExpr *) clause;
                Var                *var;
-               bool            varonleft = true;
-               bool            ok;
 
                /* Only expressions with two arguments are considered 
compatible. */
                if (list_length(expr->args) != 2)
                        return false;
 
-               /* see if it actually has the right shape (one Var, one Const) 
*/
-               ok = (NumRelids((Node *) expr) == 1) &&
-                       (is_pseudo_constant_clause(lsecond(expr->args)) ||
-                        (varonleft = false,
-                         is_pseudo_constant_clause(linitial(expr->args))));
-
-               /* unsupported structure (two variables or so) */
-               if (!ok)
+               /* Check if the expression the right shape (one Var, one Const) 
*/
+               if (!deconstruct_opexpr(expr, &var, NULL, NULL))
                        return false;
 
                /*
@@ -843,8 +835,6 @@ statext_is_compatible_clause_internal(PlannerInfo *root, 
Node *clause,
                        !get_func_leakproof(get_opcode(expr->opno)))
                        return false;
 
-               var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
-
                return statext_is_compatible_clause_internal(root, (Node *) var,
                                                                                
                         relid, attnums);
        }
@@ -1235,3 +1225,53 @@ determine_operator_strategy(Oid opno)
 
        return strat;
 }
+
+/*
+ * deconstruct_opexpr
+ *             Split expression into Var and Const parts.
+ *
+ * Attempts to match the arguments to either (Var op Const) or (Const op Var),
+ * possibly with a RelabelType on top of the Var node. When successful, returns
+ * true, otherwise returns false.
+ *
+ * Optionally returns pointers to the extracted elements, when passed non-null
+ * pointers (varp, cstp and isgtp).
+ */
+bool
+deconstruct_opexpr(OpExpr *expr, Var **varp, Const **cstp, bool *isgtp)
+{
+       Var        *var;
+       Const  *cst;
+       bool    isgt;
+
+       /* enforced by statext_is_compatible_clause_internal */
+       Assert(list_length(expr->args) == 2);
+
+       if ((IsA(linitial(expr->args), Var) || IsA(linitial(expr->args), 
RelabelType)) &&
+               IsA(lsecond(expr->args), Const))
+       {
+               var = linitial(expr->args);
+               cst = lsecond(expr->args);
+               isgt = false;
+       }
+       else if (IsA(linitial(expr->args), Const) &&
+                        (IsA(lsecond(expr->args), Var) || 
IsA(lsecond(expr->args), RelabelType)))
+       {
+               var = lsecond(expr->args);
+               cst = linitial(expr->args);
+               isgt = true;
+       }
+       else
+               return false;
+
+       if (varp)
+               *varp = var;
+
+       if (cstp)
+               *cstp = cst;
+
+       if (isgtp)
+               *isgtp = isgt;
+
+       return true;
+}
diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index 1e919c82f9..0d7f6969ee 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -1561,32 +1561,23 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
                if (is_opclause(clause))
                {
                        OpExpr     *expr = (OpExpr *) clause;
-                       bool            varonleft = true;
-                       bool            ok;
                        FmgrInfo        opproc;
 
-                       fmgr_info(get_opcode(expr->opno), &opproc);
+                       /* valid only after deconstruct_opexpr returns true  */
+                       Var                *var;
+                       Const      *cst;
+                       bool            isgt;
 
-                       ok = (NumRelids(clause) == 1) &&
-                               (is_pseudo_constant_clause(lsecond(expr->args)) 
||
-                                (varonleft = false,
-                                 
is_pseudo_constant_clause(linitial(expr->args))));
+                       fmgr_info(get_opcode(expr->opno), &opproc);
 
-                       if (ok)
+                       /* extract the var and const from the expression */
+                       if (deconstruct_opexpr(expr, &var, &cst, &isgt))
                        {
                                TypeCacheEntry *typecache;
                                FmgrInfo        gtproc;
-                               Var                *var;
-                               Const      *cst;
-                               bool            isgt;
                                int                     idx;
                                int                     strategy;
 
-                               /* extract the var and const from the 
expression */
-                               var = (varonleft) ? linitial(expr->args) : 
lsecond(expr->args);
-                               cst = (varonleft) ? lsecond(expr->args) : 
linitial(expr->args);
-                               isgt = (!varonleft);
-
                                /* strip binary-compatible relabeling */
                                if (IsA(var, RelabelType))
                                        var = (Var *) ((RelabelType *) 
var)->arg;
diff --git a/src/include/statistics/statistics.h 
b/src/include/statistics/statistics.h
index 82e2d5c619..f5120d0663 100644
--- a/src/include/statistics/statistics.h
+++ b/src/include/statistics/statistics.h
@@ -119,5 +119,6 @@ extern bool has_stats_of_kind(List *stats, char 
requiredkind);
 extern StatisticExtInfo *choose_best_statistics(List *stats,
                                                                                
                Bitmapset *attnums, char requiredkind);
 extern int determine_operator_strategy(Oid opno);
+extern bool deconstruct_opexpr(OpExpr *expr, Var **varp, Const **cstp, bool 
*isgtp);
 
 #endif                                                 /* STATISTICS_H */
-- 
2.20.1

Reply via email to