On 8/16/21 3:32 AM, Justin Pryzby wrote:
On Mon, Dec 07, 2020 at 03:15:17PM +0100, Tomas Vondra wrote:
Looking at the current behaviour, there are a couple of things that
seem a little odd, even though they are understandable. For example,
the fact that

   CREATE STATISTICS s (expressions) ON (expr), col FROM tbl;

fails, but

   CREATE STATISTICS s (expressions, mcv) ON (expr), col FROM tbl;

succeeds and creates both "expressions" and "mcv" statistics. Also, the syntax

   CREATE STATISTICS s (expressions) ON (expr1), (expr2) FROM tbl;

tends to suggest that it's going to create statistics on the pair of
expressions, describing their correlation, when actually it builds 2
independent statistics. Also, this error text isn't entirely accurate:

   CREATE STATISTICS s ON col FROM tbl;
   ERROR:  extended statistics require at least 2 columns

because extended statistics don't always require 2 columns, they can
also just have an expression, or multiple expressions and 0 or 1
columns.

I think a lot of this stems from treating "expressions" in the same
way as the other (multi-column) stats kinds, and it might actually be
neater to have separate documented syntaxes for single- and
multi-column statistics:

   CREATE STATISTICS [ IF NOT EXISTS ] statistics_name
     ON (expression)
     FROM table_name

   CREATE STATISTICS [ IF NOT EXISTS ] statistics_name
     [ ( statistics_kind [, ... ] ) ]
     ON { column_name | (expression) } , { column_name | (expression) } [, ...]
     FROM table_name

The first syntax would create single-column stats, and wouldn't accept
a statistics_kind argument, because there is only one kind of
single-column statistic. Maybe that might change in the future, but if
so, it's likely that the kinds of single-column stats will be
different from the kinds of multi-column stats.

In the second syntax, the only accepted kinds would be the current
multi-column stats kinds (ndistinct, dependencies, and mcv), and it
would always build stats describing the correlations between the
columns listed. It would continue to build standard/expression stats
on any expressions in the list, but that's more of an implementation
detail.

It would no longer be possible to do "CREATE STATISTICS s
(expressions) ON (expr1), (expr2) FROM tbl". Instead, you'd have to
issue 2 separate "CREATE STATISTICS" commands, but that seems more
logical, because they're independent stats.

The parsing code might not change much, but some of the errors would
be different. For example, the errors "building only extended
expression statistics on simple columns not allowed" and "extended
expression statistics require at least one expression" would go away,
and the error "extended statistics require at least 2 columns" might
become more specific, depending on the stats kind.

This still seems odd:

postgres=# CREATE STATISTICS asf ON i FROM t;
ERROR:  extended statistics require at least 2 columns
postgres=# CREATE STATISTICS asf ON (i) FROM t;
CREATE STATISTICS

It seems wrong that the command works with added parens, but builds expression
stats on a simple column (which is redundant with what analyze does without
extended stats).


Well, yeah. But I think this is a behavior that was discussed somewhere in this thread, and the agreement was that it's not worth the complexity, as this comment explains

  * XXX We do only the bare minimum to separate simple attribute and
  * complex expressions - for example "(a)" will be treated as a complex
  * expression. No matter how elaborate the check is, there'll always be
  * a way around it, if the user is determined (consider e.g. "(a+0)"),
  * so it's not worth protecting against it.

Of course, maybe that wasn't the right decision - it's a bit weird that

  CREATE INDEX on t ((a), (b))

actually "extracts" the column references and stores that in indkeys, instead of treating that as expressions.

Patch 0001 fixes the "double parens" issue discussed elsewhere in this thread, and patch 0002 tweaks CREATE STATISTICS to treat "(a)" as a simple column reference.

But I'm not sure 0002 is something we can do without catversion bump. What if someone created such "bogus" statistics? It's mostly harmless, because the statistics is useless anyway (AFAICS we'll just use the regular one we have for the column), but if they do pg_dump, that'll fail because of this new restriction.

OTOH we're still "only" in beta, and IIRC the rule is not to bump catversion after rc1.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
>From 4428ee5b46fe1ce45331c355e1646520ca769991 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.von...@postgresql.org>
Date: Mon, 16 Aug 2021 16:40:43 +0200
Subject: [PATCH 1/2] fix: don't print double parens

---
 src/backend/utils/adt/ruleutils.c             |   2 +-
 .../regress/expected/create_table_like.out    |   6 +-
 src/test/regress/expected/stats_ext.out       | 110 +++++++++---------
 3 files changed, 59 insertions(+), 59 deletions(-)

diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 4df8cc5abf..a762d664fb 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -1712,7 +1712,7 @@ pg_get_statisticsobj_worker(Oid statextid, bool columns_only, bool missing_ok)
 	{
 		Node	   *expr = (Node *) lfirst(lc);
 		char	   *str;
-		int			prettyFlags = PRETTYFLAG_INDENT;
+		int			prettyFlags = PRETTYFLAG_PAREN;
 
 		str = deparse_expression_pretty(expr, context, false, false,
 										prettyFlags, 0);
diff --git a/src/test/regress/expected/create_table_like.out b/src/test/regress/expected/create_table_like.out
index 7ad5fafe93..d4bd4d341d 100644
--- a/src/test/regress/expected/create_table_like.out
+++ b/src/test/regress/expected/create_table_like.out
@@ -417,7 +417,7 @@ Check constraints:
     "ctlt1_a_check" CHECK (length(a) > 2)
 Statistics objects:
     "public"."ctlt_all_a_b_stat" ON a, b FROM ctlt_all
-    "public"."ctlt_all_expr_stat" ON ((a || b)) FROM ctlt_all
+    "public"."ctlt_all_expr_stat" ON (a || b) FROM ctlt_all
 
 SELECT c.relname, objsubid, description FROM pg_description, pg_index i, pg_class c WHERE classoid = 'pg_class'::regclass AND objoid = i.indexrelid AND c.oid = i.indexrelid AND i.indrelid = 'ctlt_all'::regclass ORDER BY c.relname, objsubid;
     relname     | objsubid | description 
@@ -457,7 +457,7 @@ Check constraints:
     "ctlt1_a_check" CHECK (length(a) > 2)
 Statistics objects:
     "public"."pg_attrdef_a_b_stat" ON a, b FROM public.pg_attrdef
-    "public"."pg_attrdef_expr_stat" ON ((a || b)) FROM public.pg_attrdef
+    "public"."pg_attrdef_expr_stat" ON (a || b) FROM public.pg_attrdef
 
 DROP TABLE public.pg_attrdef;
 -- Check that LIKE isn't confused when new table masks the old, either
@@ -479,7 +479,7 @@ Check constraints:
     "ctlt1_a_check" CHECK (length(a) > 2)
 Statistics objects:
     "ctl_schema"."ctlt1_a_b_stat" ON a, b FROM ctlt1
-    "ctl_schema"."ctlt1_expr_stat" ON ((a || b)) FROM ctlt1
+    "ctl_schema"."ctlt1_expr_stat" ON (a || b) FROM ctlt1
 
 ROLLBACK;
 DROP TABLE ctlt1, ctlt2, ctlt3, ctlt4, ctlt12_storage, ctlt12_comments, ctlt1_inh, ctlt13_inh, ctlt13_like, ctlt_all, ctla, ctlb CASCADE;
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index 7fb54de53d..4e52910df2 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -2993,21 +2993,21 @@ insert into stts_t1 select i,i from generate_series(1,100) i;
 analyze stts_t1;
 set search_path to public, stts_s1, stts_s2, tststats;
 \dX
-                                                           List of extended statistics
-  Schema  |          Name          |                               Definition                               | Ndistinct | Dependencies |   MCV   
-----------+------------------------+------------------------------------------------------------------------+-----------+--------------+---------
- public   | func_deps_stat         | ((a * 2)), upper(b), ((c + (1)::numeric)) FROM functional_dependencies |           | defined      | 
- public   | mcv_lists_arrays_stats | a, b, c FROM mcv_lists_arrays                                          |           |              | defined
- public   | mcv_lists_bool_stats   | a, b, c FROM mcv_lists_bool                                            |           |              | defined
- public   | mcv_lists_stats        | a, b, d FROM mcv_lists                                                 |           |              | defined
- public   | stts_1                 | a, b FROM stts_t1                                                      | defined   |              | 
- public   | stts_2                 | a, b FROM stts_t1                                                      | defined   | defined      | 
- public   | stts_3                 | a, b FROM stts_t1                                                      | defined   | defined      | defined
- public   | stts_4                 | b, c FROM stts_t2                                                      | defined   | defined      | defined
- public   | stts_hoge              | col1, col2, col3 FROM stts_t3                                          | defined   | defined      | defined
- stts_s1  | stts_foo               | col1, col2 FROM stts_t3                                                | defined   | defined      | defined
- stts_s2  | stts_yama              | col1, col3 FROM stts_t3                                                |           | defined      | defined
- tststats | priv_test_stats        | a, b FROM priv_test_tbl                                                |           |              | defined
+                                                        List of extended statistics
+  Schema  |          Name          |                            Definition                            | Ndistinct | Dependencies |   MCV   
+----------+------------------------+------------------------------------------------------------------+-----------+--------------+---------
+ public   | func_deps_stat         | (a * 2), upper(b), (c + 1::numeric) FROM functional_dependencies |           | defined      | 
+ public   | mcv_lists_arrays_stats | a, b, c FROM mcv_lists_arrays                                    |           |              | defined
+ public   | mcv_lists_bool_stats   | a, b, c FROM mcv_lists_bool                                      |           |              | defined
+ public   | mcv_lists_stats        | a, b, d FROM mcv_lists                                           |           |              | defined
+ public   | stts_1                 | a, b FROM stts_t1                                                | defined   |              | 
+ public   | stts_2                 | a, b FROM stts_t1                                                | defined   | defined      | 
+ public   | stts_3                 | a, b FROM stts_t1                                                | defined   | defined      | defined
+ public   | stts_4                 | b, c FROM stts_t2                                                | defined   | defined      | defined
+ public   | stts_hoge              | col1, col2, col3 FROM stts_t3                                    | defined   | defined      | defined
+ stts_s1  | stts_foo               | col1, col2 FROM stts_t3                                          | defined   | defined      | defined
+ stts_s2  | stts_yama              | col1, col3 FROM stts_t3                                          |           | defined      | defined
+ tststats | priv_test_stats        | a, b FROM priv_test_tbl                                          |           |              | defined
 (12 rows)
 
 \dX stts_?
@@ -3028,21 +3028,21 @@ set search_path to public, stts_s1, stts_s2, tststats;
 (1 row)
 
 \dX+
-                                                           List of extended statistics
-  Schema  |          Name          |                               Definition                               | Ndistinct | Dependencies |   MCV   
-----------+------------------------+------------------------------------------------------------------------+-----------+--------------+---------
- public   | func_deps_stat         | ((a * 2)), upper(b), ((c + (1)::numeric)) FROM functional_dependencies |           | defined      | 
- public   | mcv_lists_arrays_stats | a, b, c FROM mcv_lists_arrays                                          |           |              | defined
- public   | mcv_lists_bool_stats   | a, b, c FROM mcv_lists_bool                                            |           |              | defined
- public   | mcv_lists_stats        | a, b, d FROM mcv_lists                                                 |           |              | defined
- public   | stts_1                 | a, b FROM stts_t1                                                      | defined   |              | 
- public   | stts_2                 | a, b FROM stts_t1                                                      | defined   | defined      | 
- public   | stts_3                 | a, b FROM stts_t1                                                      | defined   | defined      | defined
- public   | stts_4                 | b, c FROM stts_t2                                                      | defined   | defined      | defined
- public   | stts_hoge              | col1, col2, col3 FROM stts_t3                                          | defined   | defined      | defined
- stts_s1  | stts_foo               | col1, col2 FROM stts_t3                                                | defined   | defined      | defined
- stts_s2  | stts_yama              | col1, col3 FROM stts_t3                                                |           | defined      | defined
- tststats | priv_test_stats        | a, b FROM priv_test_tbl                                                |           |              | defined
+                                                        List of extended statistics
+  Schema  |          Name          |                            Definition                            | Ndistinct | Dependencies |   MCV   
+----------+------------------------+------------------------------------------------------------------+-----------+--------------+---------
+ public   | func_deps_stat         | (a * 2), upper(b), (c + 1::numeric) FROM functional_dependencies |           | defined      | 
+ public   | mcv_lists_arrays_stats | a, b, c FROM mcv_lists_arrays                                    |           |              | defined
+ public   | mcv_lists_bool_stats   | a, b, c FROM mcv_lists_bool                                      |           |              | defined
+ public   | mcv_lists_stats        | a, b, d FROM mcv_lists                                           |           |              | defined
+ public   | stts_1                 | a, b FROM stts_t1                                                | defined   |              | 
+ public   | stts_2                 | a, b FROM stts_t1                                                | defined   | defined      | 
+ public   | stts_3                 | a, b FROM stts_t1                                                | defined   | defined      | defined
+ public   | stts_4                 | b, c FROM stts_t2                                                | defined   | defined      | defined
+ public   | stts_hoge              | col1, col2, col3 FROM stts_t3                                    | defined   | defined      | defined
+ stts_s1  | stts_foo               | col1, col2 FROM stts_t3                                          | defined   | defined      | defined
+ stts_s2  | stts_yama              | col1, col3 FROM stts_t3                                          |           | defined      | defined
+ tststats | priv_test_stats        | a, b FROM priv_test_tbl                                          |           |              | defined
 (12 rows)
 
 \dX+ stts_?
@@ -3071,36 +3071,36 @@ set search_path to public, stts_s1, stts_s2, tststats;
 
 set search_path to public, stts_s1;
 \dX
-                                                          List of extended statistics
- Schema  |          Name          |                               Definition                               | Ndistinct | Dependencies |   MCV   
----------+------------------------+------------------------------------------------------------------------+-----------+--------------+---------
- public  | func_deps_stat         | ((a * 2)), upper(b), ((c + (1)::numeric)) FROM functional_dependencies |           | defined      | 
- public  | mcv_lists_arrays_stats | a, b, c FROM mcv_lists_arrays                                          |           |              | defined
- public  | mcv_lists_bool_stats   | a, b, c FROM mcv_lists_bool                                            |           |              | defined
- public  | mcv_lists_stats        | a, b, d FROM mcv_lists                                                 |           |              | defined
- public  | stts_1                 | a, b FROM stts_t1                                                      | defined   |              | 
- public  | stts_2                 | a, b FROM stts_t1                                                      | defined   | defined      | 
- public  | stts_3                 | a, b FROM stts_t1                                                      | defined   | defined      | defined
- public  | stts_4                 | b, c FROM stts_t2                                                      | defined   | defined      | defined
- public  | stts_hoge              | col1, col2, col3 FROM stts_t3                                          | defined   | defined      | defined
- stts_s1 | stts_foo               | col1, col2 FROM stts_t3                                                | defined   | defined      | defined
+                                                       List of extended statistics
+ Schema  |          Name          |                            Definition                            | Ndistinct | Dependencies |   MCV   
+---------+------------------------+------------------------------------------------------------------+-----------+--------------+---------
+ public  | func_deps_stat         | (a * 2), upper(b), (c + 1::numeric) FROM functional_dependencies |           | defined      | 
+ public  | mcv_lists_arrays_stats | a, b, c FROM mcv_lists_arrays                                    |           |              | defined
+ public  | mcv_lists_bool_stats   | a, b, c FROM mcv_lists_bool                                      |           |              | defined
+ public  | mcv_lists_stats        | a, b, d FROM mcv_lists                                           |           |              | defined
+ public  | stts_1                 | a, b FROM stts_t1                                                | defined   |              | 
+ public  | stts_2                 | a, b FROM stts_t1                                                | defined   | defined      | 
+ public  | stts_3                 | a, b FROM stts_t1                                                | defined   | defined      | defined
+ public  | stts_4                 | b, c FROM stts_t2                                                | defined   | defined      | defined
+ public  | stts_hoge              | col1, col2, col3 FROM stts_t3                                    | defined   | defined      | defined
+ stts_s1 | stts_foo               | col1, col2 FROM stts_t3                                          | defined   | defined      | defined
 (10 rows)
 
 create role regress_stats_ext nosuperuser;
 set role regress_stats_ext;
 \dX
-                                                          List of extended statistics
- Schema |          Name          |                               Definition                               | Ndistinct | Dependencies |   MCV   
---------+------------------------+------------------------------------------------------------------------+-----------+--------------+---------
- public | func_deps_stat         | ((a * 2)), upper(b), ((c + (1)::numeric)) FROM functional_dependencies |           | defined      | 
- public | mcv_lists_arrays_stats | a, b, c FROM mcv_lists_arrays                                          |           |              | defined
- public | mcv_lists_bool_stats   | a, b, c FROM mcv_lists_bool                                            |           |              | defined
- public | mcv_lists_stats        | a, b, d FROM mcv_lists                                                 |           |              | defined
- public | stts_1                 | a, b FROM stts_t1                                                      | defined   |              | 
- public | stts_2                 | a, b FROM stts_t1                                                      | defined   | defined      | 
- public | stts_3                 | a, b FROM stts_t1                                                      | defined   | defined      | defined
- public | stts_4                 | b, c FROM stts_t2                                                      | defined   | defined      | defined
- public | stts_hoge              | col1, col2, col3 FROM stts_t3                                          | defined   | defined      | defined
+                                                       List of extended statistics
+ Schema |          Name          |                            Definition                            | Ndistinct | Dependencies |   MCV   
+--------+------------------------+------------------------------------------------------------------+-----------+--------------+---------
+ public | func_deps_stat         | (a * 2), upper(b), (c + 1::numeric) FROM functional_dependencies |           | defined      | 
+ public | mcv_lists_arrays_stats | a, b, c FROM mcv_lists_arrays                                    |           |              | defined
+ public | mcv_lists_bool_stats   | a, b, c FROM mcv_lists_bool                                      |           |              | defined
+ public | mcv_lists_stats        | a, b, d FROM mcv_lists                                           |           |              | defined
+ public | stts_1                 | a, b FROM stts_t1                                                | defined   |              | 
+ public | stts_2                 | a, b FROM stts_t1                                                | defined   | defined      | 
+ public | stts_3                 | a, b FROM stts_t1                                                | defined   | defined      | defined
+ public | stts_4                 | b, c FROM stts_t2                                                | defined   | defined      | defined
+ public | stts_hoge              | col1, col2, col3 FROM stts_t3                                    | defined   | defined      | defined
 (9 rows)
 
 reset role;
-- 
2.31.1

>From c367fc69bfbd7e515cf26d4499483b56319f05db Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.von...@postgresql.org>
Date: Mon, 16 Aug 2021 17:19:33 +0200
Subject: [PATCH 2/2] fix: identify single-attribute references

---
 src/backend/commands/statscmds.c        | 23 +++++++++++++++++++++++
 src/bin/pg_dump/t/002_pg_dump.pl        |  2 +-
 src/test/regress/expected/stats_ext.out |  2 ++
 src/test/regress/sql/stats_ext.sql      |  1 +
 4 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index 4856f4b41d..63c7529635 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -33,6 +33,7 @@
 #include "optimizer/optimizer.h"
 #include "statistics/statistics.h"
 #include "utils/builtins.h"
+#include "utils/lsyscache.h"
 #include "utils/fmgroids.h"
 #include "utils/inval.h"
 #include "utils/memutils.h"
@@ -258,6 +259,28 @@ CreateStatistics(CreateStatsStmt *stmt)
 			nattnums++;
 			ReleaseSysCache(atttuple);
 		}
+		else if (IsA(selem->expr, Var))
+		{
+			Var *var = (Var *) selem->expr;
+			TypeCacheEntry *type;
+
+			/* Disallow use of system attributes in extended stats */
+			if (var->varattno <= 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("statistics creation on system columns is not supported")));
+
+			/* Disallow data types without a less-than operator */
+			type = lookup_type_cache(var->vartype, TYPECACHE_LT_OPR);
+			if (type->lt_opr == InvalidOid)
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("column \"%s\" cannot be used in statistics because its type %s has no default btree operator class",
+								get_attname(relid, var->varattno, false), format_type_be(var->vartype))));
+
+			attnums[nattnums] = var->varattno;
+			nattnums++;
+		}
 		else					/* expression */
 		{
 			Node	   *expr = selem->expr;
diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index a4ee54d516..be1f3a5175 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -2811,7 +2811,7 @@ my %tests = (
 		create_sql   => 'CREATE STATISTICS dump_test.test_ext_stats_expr
 							ON (2 * col1) FROM dump_test.test_fifth_table',
 		regexp => qr/^
-			\QCREATE STATISTICS dump_test.test_ext_stats_expr ON ((2 * col1)) FROM dump_test.test_fifth_table;\E
+			\QCREATE STATISTICS dump_test.test_ext_stats_expr ON (2 * col1) FROM dump_test.test_fifth_table;\E
 		    /xms,
 		like =>
 		  { %full_runs, %dump_test_schema_runs, section_post_data => 1, },
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index 4e52910df2..2d7d05057f 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -55,6 +55,8 @@ ERROR:  duplicate expression in statistics definition
 CREATE STATISTICS tst (unrecognized) ON x, y FROM ext_stats_test;
 ERROR:  unrecognized statistics kind "unrecognized"
 -- incorrect expressions
+CREATE STATISTICS tst ON (y) FROM ext_stats_test; -- single column reference
+ERROR:  extended statistics require at least 2 columns
 CREATE STATISTICS tst ON y + z FROM ext_stats_test; -- missing parentheses
 ERROR:  syntax error at or near "+"
 LINE 1: CREATE STATISTICS tst ON y + z FROM ext_stats_test;
diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql
index d563c4654c..9cdc34e82c 100644
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -41,6 +41,7 @@ CREATE STATISTICS tst ON (x || 'x'), (x || 'x'), (y + 1), (x || 'x'), (x || 'x')
 CREATE STATISTICS tst ON (x || 'x'), (x || 'x'), y FROM ext_stats_test;
 CREATE STATISTICS tst (unrecognized) ON x, y FROM ext_stats_test;
 -- incorrect expressions
+CREATE STATISTICS tst ON (y) FROM ext_stats_test; -- single column reference
 CREATE STATISTICS tst ON y + z FROM ext_stats_test; -- missing parentheses
 CREATE STATISTICS tst ON (x, y) FROM ext_stats_test; -- tuple expression
 DROP TABLE ext_stats_test;
-- 
2.31.1

Reply via email to