Hi all,
I am back with a new patch.
As a follow-up to the previous discussion, I’m posting a revised v3 patch
that introduces an opt-in ANALYZE (MISSING_STATS) option.
The goal is to reduce unnecessary work when running manual ANALYZE over
many relations, while keeping the default behavior unchanged.
Overview:
ANALYZE (MISSING_STATS) analyzes only relations that currently have no
statistics entries in pg_statistic. Relations that already have statistics
are skipped.
This is conceptually similar to vacuumdb --missing-stats-only, but exposed
at the SQL ANALYZE level for interactive and scripted use.
The decision is intentionally table-level and conservative:
- If a relation has at least one pg_statistic entry, it is considered to
“have stats” and may be skipped.
- If no pg_statistic rows exist (new table, stats removed, crash reset),
the relation is analyzed.
No thresholds or modification counters are used in this option.
Behavior summary for ANALYZE(MISSING_STATS);
New empty table------------------------------->analyzed
Re-run on empty table------------------------> analyzed
Table with data but no stats-----------------> analyzed
Re-run after stats exist-----------------------> skipped
Add new column after ANALYZE----------> analyzed
Re-run after column stats exist-------------> skipped
Statistics manually deleted (pg_statistic) -> analyzed
Statistics lost after crash recovery-----------> analyzed
Regular ANALYZE -------------------------------> unchanged behavior
This ensures that ANALYZE (MISSING_STATS) converges toward a no-op on
subsequent runs, while still recovering from missing or invalid statistics.
Scope and limitations:
- Applies only to regular relations.
- Uses pg_statistic directly (not pg_stats or pg_stat views).
- Does not consider modification thresholds or autovacuum heuristics.
- Partitioned tables, inheritance, and extended statistics are not handled
yet and can be considered separately.
I would appreciate feedback on:
- Whether this behavior and naming align with expectations.
- Any edge cases I may have missed.
- Whether this is a reasonable first step before considering more advanced
options (e.g., modified-stats thresholds).
While testing i have noted this :
analyze_test=# ALTER TABLE ms1 ADD COLUMN b int;
ALTER TABLE
Time: 44.665 ms
analyze_test=# ANALYZE (MISSING_STATS);
DEBUG: ANALYZE processing relation "ms1" (OID 32791)
analyze_test=# SELECT attname
FROM pg_statistic s
JOIN pg_class c ON c.oid = s.starelid
JOIN pg_attribute a
ON a.attrelid = c.oid AND a.attnum = s.staattnum
WHERE c.relname = 'ms1'
ORDER BY attname;
attname
---------
a
b
(2 rows)
Time: 1.390 ms
analyze_test=# ANALYZE (MISSING_STATS);
DEBUG: ANALYZE (MISSING_STATS): skipping relation "ms1" (OID 32791)
analyze_test=# SELECT
a.attname,
s.stanullfrac,
s.stadistinct
FROM pg_statistic s
JOIN pg_class c ON c.oid = s.starelid
JOIN pg_attribute a
ON a.attrelid = c.oid AND a.attnum = s.staattnum
WHERE c.relname = 'ms1';
attname | stanullfrac | stadistinct
---------+-------------+-------------
a | 0 | -1
b | 1 | 0
(2 rows)
Time: 0.733 ms
Note:
[1] The empty table[if a table has 0 rows] --> no pg_statistic rows at all
[2] If a table has >= 1 row then postgres creates pg_statistic rows for
every user column,even if the
- Column is entirely null
- The column was added later
- The column has never had a non-null values
Thanks for your time and review, I will post the next patch for
modified_stats shortly.
*Vasuki MC-DAC,Chennai*
From 1ad50bbefbab0fa1e75aa696187066e1b11896f4 Mon Sep 17 00:00:00 2001
From: Vasuki M <[email protected]>
Date: Thu, 29 Jan 2026 17:25:26 +0530
Subject: [PATCH] ANALYZE: add MISSING_STATS option to skip relations with
existing statistics
Introduce an opt-in ANALYZE (MISSING_STATS) mode that analyzes only
relations which currently have no entries in pg_statistic.
This allows repeated manual ANALYZE runs to avoid unnecessary work on
relations whose statistics already exist, while still analyzing:
- newly created tables,
- tables that have gained data but lack statistics,
- relations whose statistics were lost or reset.
The default ANALYZE behavior is unchanged.
Regression tests are included.
---
src/backend/commands/analyze.c | 48 +++++++++++++++++
src/backend/commands/vacuum.c | 6 ++-
src/include/commands/vacuum.h | 1 +
.../expected/analyze_missing_stats.out | 54 +++++++++++++++++++
src/test/regress/parallel_schedule | 1 +
.../regress/sql/analyze_missing_stats.sql | 52 ++++++++++++++++++
6 files changed, 161 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/analyze_missing_stats.out
create mode 100644 src/test/regress/sql/analyze_missing_stats.sql
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index a4834241..9d846cf2 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -46,6 +46,7 @@
#include "storage/procarray.h"
#include "utils/attoptcache.h"
#include "utils/datum.h"
+#include "utils/fmgroids.h"
#include "utils/guc.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -140,6 +141,53 @@ analyze_rel(Oid relid, RangeVar *relation,
onerel = vacuum_open_relation(relid, relation, params.options & ~(VACOPT_VACUUM),
params.log_analyze_min_duration >= 0,
ShareUpdateExclusiveLock);
+ /*
+ * ANALYZE (MISSING_STATS):
+ * Skip relations that already have statistics in pg_statistic.
+ * This is table-level, not column-level.
+ */
+ if ((params.options & VACOPT_MISSING_STATS) &&
+ onerel->rd_rel->relkind == RELKIND_RELATION)
+ {
+ Relation pgstatistic;
+ SysScanDesc scan;
+ ScanKeyData key;
+ HeapTuple tup;
+ bool has_stats = false;
+
+ pgstatistic = table_open(StatisticRelationId, AccessShareLock);
+
+ ScanKeyInit(&key,
+ Anum_pg_statistic_starelid,
+ BTEqualStrategyNumber,
+ F_OIDEQ,
+ ObjectIdGetDatum(RelationGetRelid(onerel)));
+
+ scan = systable_beginscan(pgstatistic,
+ InvalidOid,
+ false,
+ NULL,
+ 1,
+ &key);
+
+ tup = systable_getnext(scan);
+ if (HeapTupleIsValid(tup))
+ has_stats = true;
+
+ systable_endscan(scan);
+ table_close(pgstatistic, AccessShareLock);
+
+ if (has_stats)
+ {
+ elog(DEBUG1,
+ "ANALYZE (MISSING_STATS): skipping relation \"%s\"",
+ RelationGetRelationName(onerel));
+
+ table_close(onerel, ShareUpdateExclusiveLock);
+ return;
+ }
+ }
+
/* leave if relation could not be opened or locked */
if (!onerel)
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 03932f45..3cf3f210 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -165,6 +165,7 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
BufferAccessStrategy bstrategy = NULL;
bool verbose = false;
bool skip_locked = false;
+ bool missing_stats = false;
bool analyze = false;
bool freeze = false;
bool full = false;
@@ -229,6 +230,8 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
ring_size = result;
}
+ else if (strcmp(opt->defname, "missing_stats") == 0)
+ missing_stats = defGetBoolean(opt);
else if (!vacstmt->is_vacuumcmd)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -305,6 +308,7 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
(vacstmt->is_vacuumcmd ? VACOPT_VACUUM : VACOPT_ANALYZE) |
(verbose ? VACOPT_VERBOSE : 0) |
(skip_locked ? VACOPT_SKIP_LOCKED : 0) |
+ (missing_stats ? VACOPT_MISSING_STATS :0) |
(analyze ? VACOPT_ANALYZE : 0) |
(freeze ? VACOPT_FREEZE : 0) |
(full ? VACOPT_FULL : 0) |
@@ -315,7 +319,7 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
(only_database_stats ? VACOPT_ONLY_DATABASE_STATS : 0);
/* sanity checks on options */
- Assert(params.options & (VACOPT_VACUUM | VACOPT_ANALYZE));
+ Assert(params.options & (VACOPT_VACUUM | VACOPT_ANALYZE | VACOPT_MISSING_STATS));
Assert((params.options & VACOPT_VACUUM) ||
!(params.options & (VACOPT_FULL | VACOPT_FREEZE)));
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index e885a4b9..55c87958 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -188,6 +188,7 @@ typedef struct VacAttrStats
#define VACOPT_DISABLE_PAGE_SKIPPING 0x100 /* don't skip any pages */
#define VACOPT_SKIP_DATABASE_STATS 0x200 /* skip vac_update_datfrozenxid() */
#define VACOPT_ONLY_DATABASE_STATS 0x400 /* only vac_update_datfrozenxid() */
+#define VACOPT_MISSING_STATS 0x08 /* ANALYZE if stats are missing */
/*
* Values used by index_cleanup and truncate params.
diff --git a/src/test/regress/expected/analyze_missing_stats.out b/src/test/regress/expected/analyze_missing_stats.out
new file mode 100644
index 00000000..b39a82ca
--- /dev/null
+++ b/src/test/regress/expected/analyze_missing_stats.out
@@ -0,0 +1,54 @@
+--
+-- ANALYZE (MISSING_STATS) regression test
+--
+-- 1. Brand new empty table
+CREATE TABLE ms1 (a int);
+-- No statistics yet → should analyze
+ANALYZE (MISSING_STATS);
+-- pg_statistic should still be empty (table has no rows)
+SELECT EXISTS (
+ SELECT 1
+ FROM pg_statistic
+ WHERE starelid = 'ms1'::regclass
+) AS has_stats;
+ has_stats
+-----------
+ f
+(1 row)
+
+-- 2. Re-run → still no stats → should analyze again
+ANALYZE (MISSING_STATS);
+-- 3. Insert data
+INSERT INTO ms1 SELECT generate_series(1,10);
+-- Stats still missing → should analyze
+ANALYZE (MISSING_STATS);
+-- Now stats must exist
+SELECT EXISTS (
+ SELECT 1
+ FROM pg_statistic
+ WHERE starelid = 'ms1'::regclass
+) AS has_stats;
+ has_stats
+-----------
+ t
+(1 row)
+
+-- 4. Re-run → stats exist → should skip
+ANALYZE (MISSING_STATS);
+-- 5. Simulate stats loss (crash/reset)
+DELETE FROM pg_statistic
+WHERE starelid = 'ms1'::regclass;
+-- Stats missing again → should analyze
+ANALYZE (MISSING_STATS);
+-- Stats restored
+SELECT EXISTS (
+ SELECT 1
+ FROM pg_statistic
+ WHERE starelid = 'ms1'::regclass
+) AS has_stats;
+ has_stats
+-----------
+ t
+(1 row)
+
+DROP TABLE ms1;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 021d57f6..001def94 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -90,6 +90,7 @@ test: rules psql psql_crosstab psql_pipeline amutils stats_ext collate.linux.utf
test: select_parallel
test: write_parallel
test: vacuum_parallel
+test: analyze_missing_stats
# Run this alone, because concurrent DROP TABLE would make non-superuser
# "ANALYZE;" fail with "relation with OID $n does not exist".
diff --git a/src/test/regress/sql/analyze_missing_stats.sql b/src/test/regress/sql/analyze_missing_stats.sql
new file mode 100644
index 00000000..9457ff04
--- /dev/null
+++ b/src/test/regress/sql/analyze_missing_stats.sql
@@ -0,0 +1,52 @@
+--
+-- ANALYZE (MISSING_STATS) regression test
+--
+
+-- 1. Brand new empty table
+CREATE TABLE ms1 (a int);
+
+-- No statistics yet → should analyze
+ANALYZE (MISSING_STATS);
+
+-- pg_statistic should still be empty (table has no rows)
+SELECT EXISTS (
+ SELECT 1
+ FROM pg_statistic
+ WHERE starelid = 'ms1'::regclass
+) AS has_stats;
+
+-- 2. Re-run → still no stats → should analyze again
+ANALYZE (MISSING_STATS);
+
+-- 3. Insert data
+INSERT INTO ms1 SELECT generate_series(1,10);
+
+-- Stats still missing → should analyze
+ANALYZE (MISSING_STATS);
+
+-- Now stats must exist
+SELECT EXISTS (
+ SELECT 1
+ FROM pg_statistic
+ WHERE starelid = 'ms1'::regclass
+) AS has_stats;
+
+-- 4. Re-run → stats exist → should skip
+ANALYZE (MISSING_STATS);
+
+-- 5. Simulate stats loss (crash/reset)
+DELETE FROM pg_statistic
+WHERE starelid = 'ms1'::regclass;
+
+-- Stats missing again → should analyze
+ANALYZE (MISSING_STATS);
+
+-- Stats restored
+SELECT EXISTS (
+ SELECT 1
+ FROM pg_statistic
+ WHERE starelid = 'ms1'::regclass
+) AS has_stats;
+
+DROP TABLE ms1;
+
--
2.43.0