Sorry for the quick follow-up.

> > This is conceptually similar to vacuumdb --missing-stats-only, but exposed
> > at the SQL ANALYZE level for interactive and scripted use.
>
> What is being proposed in v3 is not similar in behavior.
>
> I am wondering if we should take the current SQL used by vacuumdb to
> find missing stats and perform direct syscache lookups in C?
>
> The benefit there is we can also change vacuumdb to use the ANALYZE
> (MISSING_STATS) option
> directly rather than deriving the SQL, and the MISSING_STATS behavior
> will be the same
> if we do this type of analyze from vacuumdb or manually via ANALYZE command.
>
> Attached is the query used by vacuumdb on HEAD.
>
> What do you think?

After looking at this a bit more, it occurred to me there are 2 routines in
analyze.c and extended_stats.c that are called examine_attribute(), and
their purpose is to check if an attribute should be analyzed and if so, return
a VacAttrStats.

```
/*
* examine_attribute -- pre-analysis of a single column
*
* Determine whether the column is analyzable; if so, create and initialize
* a VacAttrStats struct for it. If not, return NULL.
*/
static VacAttrStats *
examine_attribute(Node *expr)
```

```
/*
* examine_attribute -- pre-analysis of a single column
*
* Determine whether the column is analyzable; if so, create and initialize
* a VacAttrStats struct for it. If not, return NULL.
*
* If index_expr isn't NULL, then we're trying to analyze an expression index,
* and index_expr is the expression tree representing the column's data.
*/
static VacAttrStats *
examine_attribute(Relation onerel, int attnum, Node *index_expr)
```

So, we may just need to implement 2 new helpers that check if column
stats or extended
stats are missing from pg_statistic/pg_statistic_ext, and we can then
check the following

1/ if missing_stats_only flag is set to TRUE
2/ and examine_attribute returns VacAttrStats ( attribute is analyzable )
3/ and we do not have stats in pg_statististic or pg_statistic_ext
for the attribute ( the 2 routines can determine this )

The attribute should then be analyzed.

This should also work without much more effort if specific columns are
specified in the
ANALYZE command:

```
analyze table1 (col1, col2);
```

Another comment I have:

> - Whether this behavior and naming align with expectations.

I think the option name should be called MISSING_STATS_ONLY,
which is both clearer in intention and matches the option in vacuumdb.

--
Sami Imseih
Amazon Web Services (AWS)


Reply via email to