Re: Poor row estimates from planner, stat `most_common_elems` sometimes missing for a text[] column

2025-07-18 Thread Tom Lane
I wrote:
> Well, we don't have a most common element in this scenario --- the
> whole point is that the occurrence counts resulting from the lossy
> counting algorithm are too low to be trustworthy.  However, what we
> do have is the cutoff frequency, and it seems to me that we could use
> that as the estimate of the maximum frequency of the non-MCEs.

Here's a less crude patch for that.  The array_typanalyze.c changes
are the same as before, but I reconsidered what to do about this
stumbling block:

> Assuming we like this direction, the main thing that makes this a hack
> not a polished patch is that I had to strongarm the code into storing
> a zero-length values array.  What update_attstats would normally do
> is leave the values column of the MCELEM stats slot NULL, which then
> causes get_attstatsslot to throw a that-shouldn't-be-null error.
> An alternative answer is to change get_attstatsslot to allow a null,
> but I'm not sure that that's any cleaner.  Either way it seems like
> there's a hazard of breaking some code that isn't expecting the case.

I concluded that letting get_attstatsslot accept nulls is too risky;
there is probably code that assumes that get_attstatsslot would've
rejected that.  Instead, this version changes update_attstats' rule
for when to store an array rather than a null.  Now it will do so
if the passed-in stavalues pointer isn't null, even if numvalues
is zero.  I think that this doesn't change the outcome for any
existing analyze routines because they wouldn't pass a data pointer
if they have no values; and even if it does, storing an empty array
not a null seems like it should be pretty harmless.

> An alternative that feels cleaner but a good bit more invasive
> is to get rid of the convention of storing the min/max/nulls
> frequencies as extra entries in the MCELEM numbers entry ---
> which surely is a hack too --- and put them into some new slot
> type.  I'm not sure if that's enough nicer to be worth the
> conversion pain.

A year ago I would have seriously considered doing it that way.
But now that we have code to dump-n-restore stats, that code would
have to be adjusted to convert the old representation.  It's not
worth it for this case.

Hence, v1 attached, now with a commit message.

regards, tom lane

From ac1f22903594ce3613d615fe5fba09c77f216769 Mon Sep 17 00:00:00 2001
From: Tom Lane 
Date: Fri, 18 Jul 2025 17:43:21 -0400
Subject: [PATCH v1] Track the maximum possible frequency of non-MCE array
 elements.

The lossy-counting algorithm that ANALYZE uses to identify
most-common array elements may decide that none of the values
are common enough to be called MCEs.  In such cases we stored
nothing in the STATISTIC_KIND_MCELEM pg_statistic slot, which
resulted in array_selfuncs.c falling back to default estimates.
However, we do in fact have valuable information to convey:
we know that none of the array elements are as common as the
cutoff frequency that lossy counting reported.  If we report
that as the minimum MCE frequency, array_selfuncs.c will
arrive at significantly-better-than-default estimates.

So we want to construct a STATISTIC_KIND_MCELEM entry that contains
no "values" but does have "numbers", to wit the three extra numbers
that the MCELEM entry type defines.  A small obstacle is that
update_attstats() has traditionally stored a null, not an empty array,
when passed zero "values" for a slot.  That gives rise to an MCELEM
entry that get_attstatsslot() will spit up on.  The least risky
solution seems to be to adjust update_attstats() so that it will emit
a non-null (but possibly empty) array when the passed stavalues array
pointer isn't NULL, rather than conditioning that on numvalues > 0.
In other existing cases I don't believe that that changes anything.
For consistency, handle the stanumbers array the same way.

Reported-by: Mark Frost 
Author: Tom Lane 
Discussion: https://postgr.es/m/ph3ppf1c905d6e6f24a5c1a1a1d8345b593e1...@ph3ppf1c905d6e6.namprd15.prod.outlook.com
---
 src/backend/commands/analyze.c   |  7 +++
 src/backend/utils/adt/array_typanalyze.c | 15 ++-
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 7111d5d5334..a47e23eaa8c 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1712,10 +1712,9 @@ update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats)
 		i = Anum_pg_statistic_stanumbers1 - 1;
 		for (k = 0; k < STATISTIC_NUM_SLOTS; k++)
 		{
-			int			nnum = stats->numnumbers[k];
-
-			if (nnum > 0)
+			if (stats->stanumbers[k] != NULL)
 			{
+int			nnum = stats->numnumbers[k];
 Datum	   *numdatums = (Datum *) palloc(nnum * sizeof(Datum));
 ArrayType  *arry;
 
@@ -1733,7 +1732,7 @@ update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats)
 		i = Anum_pg_statistic_stavalues1 - 1;
 		for (k = 0; k < STATISTIC_NUM_SLOTS; k++)
 		{

Re: Is there a way to identify a plan generated by GECO?

2025-07-18 Thread Jerry Brenner
We just jdbc and bind variables, so we are using PreparedStatements.
plan_cache_mode is set to auto

So, it sounds like there could be plan caching. (I wasn't aware of that.)
Is there any kind of running counter in a system view that tracks the
number of executions of cached plans?
We are capturing the plans via auto_explain and limited to the explain
options available with that path. Is there anything in the plan that would
tell us if the execution used a cached plan? (My manual explains does not
use a prepare.)

Thanks, Jerry

On Thu, Jul 17, 2025 at 8:10 PM Tom Lane  wrote:

> Jerry Brenner  writes:
> > I don't have any background with the randomized search.  Does the
> repeated
> > pattern with the same plan being executed multiple times in a time range
> > and then the plan changes, never to change back, match the expectation
> with
> > the randomization?
>
> [ shrug... ]  Insufficient information.  There could be some plan
> caching going on that contributes to this effect, though.
>
> regards, tom lane
>
>