On Thu, Jul 18, 2019 at 11:16:08AM -0400, Tom Lane wrote:
Tomas Vondra <tomas.von...@2ndquadrant.com> writes:
I've pushed the fixes listed in the previous message, with the exception
of the collation part, because I had some doubts about that.

Sorry for being slow on this.

Now, for the collation part - after some more thought and looking at code
I think the fix I shared before is OK. It uses per-column collations
consistently both when building the stats and estimating clauses, which
makes it consistent. I was not quite sure if using Var->varcollid is the
same thing as pg_attribute.attcollation, but it seems it is - at least for
Vars pointing to regular columns (which for extended stats should always
be the case).

I think you are right, but it could use some comments in the code.
The important point here is that if we coerce a Var's collation to
something else, that will be represented as a separate CollateExpr
(which will be a RelabelType by the time it gets here, I believe).
We don't just replace varcollid, the way eval_const_expressions will
do to a Const.


OK, thanks. I've added a comment about that into mcv_get_match_bitmap (not
all the details about RelabelType, because that gets stripped while
examining the opexpr, but generally about the collations).


While I'm looking at the code --- I don't find this at all convincing:

                           /*
                            * We don't care about isgt in equality, because
                            * it does not matter whether it's (var op const)
                            * or (const op var).
                            */
                           match = DatumGetBool(FunctionCall2Coll(&opproc,
                                                                  
DEFAULT_COLLATION_OID,
                                                                  
cst->constvalue,
                                                                  
item->values[idx]));

It *will* matter if the operator is cross-type.  I think there is no
good reason to have different code paths for the equality and inequality
cases --- just look at isgt and swap or don't swap the arguments.

BTW, "isgt" seems like a completely misleading name for that variable.
AFAICS, what that is actually telling is whether the var is on the left
or right side of the OpExpr.  Everywhere else in the planner with a
need for that uses "bool varonleft", and I think you should do likewise
here (though note that that isgt, as coded, is more like "varonright").


Yes, you're right in both cases. I've fixed the first issue with equality
by simply using the same code as for all operators (which means we don't
need to check the oprrest at all in this code, making it simpler).

And I've reworked the examine_opclause_expression() to use varonleft
naming - I agree it's a much better name. This was one of the remnants of
the code I initially copied from somewhere in selfuncs.c and massaged
it until it did what I needed.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

>From 14dc49ad7eefd6e67b751e74ce7253cdc85ca5ad Mon Sep 17 00:00:00 2001
From: Tomas Vondra <to...@2ndquadrant.com>
Date: Fri, 19 Jul 2019 16:28:28 +0200
Subject: [PATCH 1/2] Rework examine_opclause_expression to use varonleft

The examine_opclause_expression function needs to return information on
which side of the operator we found the Var, but the variable was called
"isgt" which is rather misleading (it assumes the operator is either
less-than or greater-than, but it may be equality or something else).
Other places in the planner use a variable called "varonleft" for this
purpose, so just adopt the same convention here.

The code also assumed we don't care about this flag for equality, as
(Var = Const) and (Const = Var) should be the same thing. But that does
not work for cross-type operators, in which case we need to pass the
parameters to the procedure in the right order. So just use the same
code for all types of expressions.

This means we don't need to care about the selectivity estimation
function anymore, at least not in this code. We should only get the
supported cases here (thanks to statext_is_compatible_clause).

Discussion: https://postgr.es/m/8736jdhbhc.fsf%40ansel.ydns.eu
Backpatch-to: 12
---
 src/backend/statistics/extended_stats.c       | 16 ++---
 src/backend/statistics/mcv.c                  | 62 +++++--------------
 .../statistics/extended_stats_internal.h      |  2 +-
 3 files changed, 26 insertions(+), 54 deletions(-)

diff --git a/src/backend/statistics/extended_stats.c 
b/src/backend/statistics/extended_stats.c
index d2346cbe02..23c74f7d90 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -1196,15 +1196,15 @@ statext_clauselist_selectivity(PlannerInfo *root, List 
*clauses, int varRelid,
  * returns true, otherwise returns false.
  *
  * Optionally returns pointers to the extracted Var/Const nodes, when passed
- * non-null pointers (varp, cstp and isgtp). The isgt flag specifies whether
- * the Var node is on the left (false) or right (true) side of the operator.
+ * non-null pointers (varp, cstp and varonleftp). The varonleftp flag specifies
+ * on which side of the operator we found the Var node.
  */
 bool
-examine_opclause_expression(OpExpr *expr, Var **varp, Const **cstp, bool 
*isgtp)
+examine_opclause_expression(OpExpr *expr, Var **varp, Const **cstp, bool 
*varonleftp)
 {
        Var        *var;
        Const  *cst;
-       bool    isgt;
+       bool    varonleft;
        Node   *leftop,
                   *rightop;
 
@@ -1225,13 +1225,13 @@ examine_opclause_expression(OpExpr *expr, Var **varp, 
Const **cstp, bool *isgtp)
        {
                var = (Var *) leftop;
                cst = (Const *) rightop;
-               isgt = false;
+               varonleft = true;
        }
        else if (IsA(leftop, Const) && IsA(rightop, Var))
        {
                var = (Var *) rightop;
                cst = (Const *) leftop;
-               isgt = true;
+               varonleft = false;
        }
        else
                return false;
@@ -1243,8 +1243,8 @@ examine_opclause_expression(OpExpr *expr, Var **varp, 
Const **cstp, bool *isgtp)
        if (cstp)
                *cstp = cst;
 
-       if (isgtp)
-               *isgtp = isgt;
+       if (varonleftp)
+               *varonleftp = varonleft;
 
        return true;
 }
diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index e5a4e86c5d..2b685ec67a 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -1581,18 +1581,15 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
                        OpExpr     *expr = (OpExpr *) clause;
                        FmgrInfo        opproc;
 
-                       /* get procedure computing operator selectivity */
-                       RegProcedure oprrest = get_oprrest(expr->opno);
-
                        /* valid only after examine_opclause_expression returns 
true */
                        Var                *var;
                        Const      *cst;
-                       bool            isgt;
+                       bool            varonleft;
 
                        fmgr_info(get_opcode(expr->opno), &opproc);
 
                        /* extract the var and const from the expression */
-                       if (examine_opclause_expression(expr, &var, &cst, 
&isgt))
+                       if (examine_opclause_expression(expr, &var, &cst, 
&varonleft))
                        {
                                int                     idx;
 
@@ -1629,46 +1626,21 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
                                        if (RESULT_IS_FINAL(matches[i], is_or))
                                                continue;
 
-                                       switch (oprrest)
-                                       {
-                                               case F_EQSEL:
-                                               case F_NEQSEL:
-
-                                                       /*
-                                                        * We don't care about 
isgt in equality, because
-                                                        * it does not matter 
whether it's (var op const)
-                                                        * or (const op var).
-                                                        */
-                                                       match = 
DatumGetBool(FunctionCall2Coll(&opproc,
-                                                                               
                                                   DEFAULT_COLLATION_OID,
-                                                                               
                                                   cst->constvalue,
-                                                                               
                                                   item->values[idx]));
-
-                                                       break;
-
-                                               case F_SCALARLTSEL: /* column < 
constant */
-                                               case F_SCALARLESEL: /* column 
<= constant */
-                                               case F_SCALARGTSEL: /* column > 
constant */
-                                               case F_SCALARGESEL: /* column 
>= constant */
-
-                                                       /*
-                                                        * First check whether 
the constant is below the
-                                                        * lower boundary (in 
that case we can skip the
-                                                        * bucket, because 
there's no overlap).
-                                                        */
-                                                       if (isgt)
-                                                               match = 
DatumGetBool(FunctionCall2Coll(&opproc,
-                                                                               
                                                           
DEFAULT_COLLATION_OID,
-                                                                               
                                                           cst->constvalue,
-                                                                               
                                                           item->values[idx]));
-                                                       else
-                                                               match = 
DatumGetBool(FunctionCall2Coll(&opproc,
-                                                                               
                                                           
DEFAULT_COLLATION_OID,
-                                                                               
                                                           item->values[idx],
-                                                                               
                                                           cst->constvalue));
-
-                                                       break;
-                                       }
+                                       /*
+                                        * First check whether the constant is 
below the lower
+                                        * boundary (in that case we can skip 
the bucket, because
+                                        * there's no overlap).
+                                        */
+                                       if (varonleft)
+                                               match = 
DatumGetBool(FunctionCall2Coll(&opproc,
+                                                                               
                                           DEFAULT_COLLATION_OID,
+                                                                               
                                           item->values[idx],
+                                                                               
                                           cst->constvalue));
+                                       else
+                                               match = 
DatumGetBool(FunctionCall2Coll(&opproc,
+                                                                               
                                           DEFAULT_COLLATION_OID,
+                                                                               
                                           cst->constvalue,
+                                                                               
                                           item->values[idx]));
 
                                        /* update the match bitmap with the 
result */
                                        matches[i] = RESULT_MERGE(matches[i], 
is_or, match);
diff --git a/src/include/statistics/extended_stats_internal.h 
b/src/include/statistics/extended_stats_internal.h
index c7f01d4edc..8433c34f6d 100644
--- a/src/include/statistics/extended_stats_internal.h
+++ b/src/include/statistics/extended_stats_internal.h
@@ -98,7 +98,7 @@ extern SortItem *build_sorted_items(int numrows, int *nitems, 
HeapTuple *rows,
                                                                        int 
numattrs, AttrNumber *attnums);
 
 extern bool examine_opclause_expression(OpExpr *expr, Var **varp,
-                                                                               
Const **cstp, bool *isgtp);
+                                                                               
Const **cstp, bool *varonleftp);
 
 extern Selectivity mcv_clauselist_selectivity(PlannerInfo *root,
                                                                                
          StatisticExtInfo *stat,
-- 
2.21.0

>From 409b0243f14e91b922a08ab2e1e5fca71284d3e3 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <to...@2ndquadrant.com>
Date: Thu, 18 Jul 2019 12:28:16 +0200
Subject: [PATCH 2/2] Use column collation for extended statistics

The current extended statistics code was a bit confused which collation
to use.  When building the statistics, the collations defined as default
for the data types were used (since commit 5e0928005).  The MCV code was
however using the column collations for MCV serialization, and then
DEFAULT_COLLATION_OID when computing estimates. So overall the code was
using all three possible options, inconsistently.

This uses the column colation everywhere - this makes it consistent with
what 5e0928005 did for regular stats.  We however do not track the
collations in a catalog, because we can derive them from column-level
information.  This may need to change in the future, e.g. after allowing
statistics on expressions.

Discussion: https://postgr.es/m/8736jdhbhc.fsf%40ansel.ydns.eu
Backpatch-to: 12
---
 src/backend/commands/statscmds.c      |  4 ++++
 src/backend/statistics/dependencies.c |  2 +-
 src/backend/statistics/mcv.c          | 15 +++++++++++----
 src/backend/statistics/mvdistinct.c   |  2 +-
 4 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index cf406f6f96..34d11c2a98 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -485,6 +485,10 @@ RemoveStatisticsById(Oid statsOid)
  *
  * For MCV lists that's not the case, as those statistics store the datums
  * internally. In this case we simply reset the statistics value to NULL.
+ *
+ * Note that "type change" includes collation change, which means we can rely
+ * on the MCV list being consistent with the collation info in pg_attribute
+ * during estimation.
  */
 void
 UpdateStatisticsForTypeChange(Oid statsOid, Oid relationOid, int attnum,
diff --git a/src/backend/statistics/dependencies.c 
b/src/backend/statistics/dependencies.c
index 66c38ce2bc..585cad2ad9 100644
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -273,7 +273,7 @@ dependency_degree(int numrows, HeapTuple *rows, int k, 
AttrNumber *dependency,
                                 colstat->attrtypid);
 
                /* prepare the sort function for this dimension */
-               multi_sort_add_dimension(mss, i, type->lt_opr, 
type->typcollation);
+               multi_sort_add_dimension(mss, i, type->lt_opr, 
colstat->attrcollid);
        }
 
        /*
diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index 2b685ec67a..cec06f8c44 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -366,7 +366,7 @@ build_mss(VacAttrStats **stats, int numattrs)
                        elog(ERROR, "cache lookup failed for ordering operator 
for type %u",
                                 colstat->attrtypid);
 
-               multi_sort_add_dimension(mss, i, type->lt_opr, 
type->typcollation);
+               multi_sort_add_dimension(mss, i, type->lt_opr, 
colstat->attrcollid);
        }
 
        return mss;
@@ -686,7 +686,7 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats 
**stats)
 
                /* sort and deduplicate the data */
                ssup[dim].ssup_cxt = CurrentMemoryContext;
-               ssup[dim].ssup_collation = DEFAULT_COLLATION_OID;
+               ssup[dim].ssup_collation = stats[dim]->attrcollid;
                ssup[dim].ssup_nulls_first = false;
 
                PrepareSortSupportFromOrderingOp(typentry->lt_opr, &ssup[dim]);
@@ -1630,15 +1630,22 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
                                         * First check whether the constant is 
below the lower
                                         * boundary (in that case we can skip 
the bucket, because
                                         * there's no overlap).
+                                        *
+                                        * We don't store collations used to 
build the statistics,
+                                        * but we can use the collation for the 
attribute itself,
+                                        * as stored in varcollid. We do reset 
the statistics after
+                                        * a type change (including collation 
change), so this is
+                                        * OK. We may need to relax this after 
allowing extended
+                                        * statistics on expressions.
                                         */
                                        if (varonleft)
                                                match = 
DatumGetBool(FunctionCall2Coll(&opproc,
-                                                                               
                                           DEFAULT_COLLATION_OID,
+                                                                               
                                           var->varcollid,
                                                                                
                                           item->values[idx],
                                                                                
                                           cst->constvalue));
                                        else
                                                match = 
DatumGetBool(FunctionCall2Coll(&opproc,
-                                                                               
                                           DEFAULT_COLLATION_OID,
+                                                                               
                                           var->varcollid,
                                                                                
                                           cst->constvalue,
                                                                                
                                           item->values[idx]));
 
diff --git a/src/backend/statistics/mvdistinct.c 
b/src/backend/statistics/mvdistinct.c
index 536605b83d..f3383c05d9 100644
--- a/src/backend/statistics/mvdistinct.c
+++ b/src/backend/statistics/mvdistinct.c
@@ -477,7 +477,7 @@ ndistinct_for_combination(double totalrows, int numrows, 
HeapTuple *rows,
                                 colstat->attrtypid);
 
                /* prepare the sort function for this dimension */
-               multi_sort_add_dimension(mss, i, type->lt_opr, 
type->typcollation);
+               multi_sort_add_dimension(mss, i, type->lt_opr, 
colstat->attrcollid);
 
                /* accumulate all the data for this dimension into the arrays */
                for (j = 0; j < numrows; j++)
-- 
2.21.0

Reply via email to