On 8/24/21 3:13 PM, Justin Pryzby wrote:
On Mon, Aug 16, 2021 at 05:41:57PM +0200, Tomas Vondra wrote:
This still seems odd:

postgres=# CREATE STATISTICS asf ON i FROM t;
ERROR:  extended statistics require at least 2 columns
postgres=# CREATE STATISTICS asf ON (i) FROM t;
CREATE STATISTICS

It seems wrong that the command works with added parens, but builds expression
stats on a simple column (which is redundant with what analyze does without
extended stats).

Well, yeah. But I think this is a behavior that was discussed somewhere in
this thread, and the agreement was that it's not worth the complexity, as
this comment explains

   * XXX We do only the bare minimum to separate simple attribute and
   * complex expressions - for example "(a)" will be treated as a complex
   * expression. No matter how elaborate the check is, there'll always be
   * a way around it, if the user is determined (consider e.g. "(a+0)"),
   * so it's not worth protecting against it.

Patch 0001 fixes the "double parens" issue discussed elsewhere in this
thread, and patch 0002 tweaks CREATE STATISTICS to treat "(a)" as a simple
column reference.

0002 refuses to create expressional stats on a simple column reference like
(a), which I think is helps to avoid a user accidentally creating useless ext
stats objects (which are redundant with the table's column stats).

0002 does not attempt to refuse cases like (a+0), which I think is fine:
we don't try to reject useless cases if someone insists on it.
See 240971675, 701fd0bbc.

So I am +1 to apply both patches.

I added this as an Opened Item for increased visibility.


I've pushed both fixes, so the open item should be resolved.

However while polishing the second patch, I realized we're allowing statistics on expressions referencing system attributes. So this fails;

CREATE STATISTICS s ON ctid, x FROM t;

but this passes:

CREATE STATISTICS s ON (ctid::text), x FROM t;

IMO we should reject such expressions, just like we reject direct references to system attributes - patch attached.

Furthermore, I wonder if we should reject expressions without any Vars? This works now:

CREATE STATISTICS s ON (11:text) FROM t;

but it seems rather silly / useless, so maybe we should reject it.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
>From a3ade67b8b12cdbfa585bf351ca5569599977b41 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.von...@postgresql.org>
Date: Wed, 1 Sep 2021 17:28:21 +0200
Subject: [PATCH] Reject extended statistics referencing system attributes

Extended statistics are not allowed to reference system attributes, but
we only enforced that for simple columns. For expressions, it was
possible to define the statistics on an expression defining the system
attribute, e.g.

    CREATE STATISTICS s ON (ctid::text) FROM t;

which seems strange. This adds a check rejection expressions referencing
system attributes, just like we do for simple columns.

Backpatch to 14, where extended statistics on expressions were added.

Backpath-through: 14
---
 src/backend/commands/statscmds.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index 59369f8736..bf01840d8a 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -288,9 +288,24 @@ CreateStatistics(CreateStatsStmt *stmt)
 			Node	   *expr = selem->expr;
 			Oid			atttype;
 			TypeCacheEntry *type;
+			Bitmapset  *attnums = NULL;
+			int			k;
 
 			Assert(expr != NULL);
 
+			/* Disallow expressions referencing system attributes. */
+			pull_varattnos(expr, 1, &attnums);
+
+			k = -1;
+			while ((k = bms_next_member(attnums, k)) >= 0)
+			{
+				AttrNumber	attnum = k + FirstLowInvalidHeapAttributeNumber;
+				if (attnum <= 0)
+					ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("statistics creation on system columns is not supported")));
+			}
+
 			/*
 			 * Disallow data types without a less-than operator.
 			 *
-- 
2.31.1

Reply via email to