Sorry for the quick follow-up. > > This is conceptually similar to vacuumdb --missing-stats-only, but exposed > > at the SQL ANALYZE level for interactive and scripted use. > > What is being proposed in v3 is not similar in behavior. > > I am wondering if we should take the current SQL used by vacuumdb to > find missing stats and perform direct syscache lookups in C? > > The benefit there is we can also change vacuumdb to use the ANALYZE > (MISSING_STATS) option > directly rather than deriving the SQL, and the MISSING_STATS behavior > will be the same > if we do this type of analyze from vacuumdb or manually via ANALYZE command. > > Attached is the query used by vacuumdb on HEAD. > > What do you think?
After looking at this a bit more, it occurred to me there are 2 routines in analyze.c and extended_stats.c that are called examine_attribute(), and their purpose is to check if an attribute should be analyzed and if so, return a VacAttrStats. ``` /* * examine_attribute -- pre-analysis of a single column * * Determine whether the column is analyzable; if so, create and initialize * a VacAttrStats struct for it. If not, return NULL. */ static VacAttrStats * examine_attribute(Node *expr) ``` ``` /* * examine_attribute -- pre-analysis of a single column * * Determine whether the column is analyzable; if so, create and initialize * a VacAttrStats struct for it. If not, return NULL. * * If index_expr isn't NULL, then we're trying to analyze an expression index, * and index_expr is the expression tree representing the column's data. */ static VacAttrStats * examine_attribute(Relation onerel, int attnum, Node *index_expr) ``` So, we may just need to implement 2 new helpers that check if column stats or extended stats are missing from pg_statistic/pg_statistic_ext, and we can then check the following 1/ if missing_stats_only flag is set to TRUE 2/ and examine_attribute returns VacAttrStats ( attribute is analyzable ) 3/ and we do not have stats in pg_statististic or pg_statistic_ext for the attribute ( the 2 routines can determine this ) The attribute should then be analyzed. This should also work without much more effort if specific columns are specified in the ANALYZE command: ``` analyze table1 (col1, col2); ``` Another comment I have: > - Whether this behavior and naming align with expectations. I think the option name should be called MISSING_STATS_ONLY, which is both clearer in intention and matches the option in vacuumdb. -- Sami Imseih Amazon Web Services (AWS)
