Hi Andreas,

Thank you for raising this — it’s a very good design question.

You’re right that in many practical cases, a user invoking something like
ANALYZE (MODIFIED_STATS) would also want to include relations that
currently have no statistics. From an operational perspective, “missing
stats” and “modified stats” can overlap.

In my earlier prototype, I did attempt to handle both concerns together.
However, during the previous discussion in the thread, it became clear that
combining the semantics made the behavior less predictable and harder to
reason about. That led to splitting the functionality into two more clearly
defined options:

MISSING_STATS_ONLY → analyze relations lacking statistics.

MODIFIED_STATS (proposed) → analyze relations whose statistics may be stale
due to modifications.

The motivation for separation was semantic clarity:

MISSING_STATS_ONLY is catalog-based and persistent (derived from
pg_statistic / pg_statistic_ext).

MODIFIED_STATS would likely depend on modification counters or thresholds
(similar to autoanalyze logic), which are transient and do not survive a
crash.

Keeping them distinct allows each option to have a well-defined and
predictable contract.
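
To make the contrast concrete, here is a rough Python sketch of the two predicates. It is only an illustration, not code from the patch. The Relation class and both function names are hypothetical, and has_stats simplifies the real catalog check to a single flag. The threshold formula mirrors the documented autoanalyze rule (base threshold plus scale factor times reltuples), using the default values of autovacuum_analyze_threshold (50) and autovacuum_analyze_scale_factor (0.1).

```python
from dataclasses import dataclass


@dataclass
class Relation:
    """Hypothetical, simplified view of one relation."""
    name: str
    reltuples: float         # row estimate; negative means "never analyzed"
    has_stats: bool          # persistent: statistics rows exist in the catalog
    mods_since_analyze: int  # transient: modification counter, lost on a crash


def is_missing_stats(rel: Relation) -> bool:
    """MISSING_STATS_ONLY-style check: depends only on catalog contents."""
    return not rel.has_stats


def is_modified(rel: Relation,
                base_threshold: int = 50,
                scale_factor: float = 0.1) -> bool:
    """MODIFIED_STATS-style check, assuming an autoanalyze-like threshold."""
    # Treat an unknown (negative) row estimate as zero rows.
    reltuples = max(rel.reltuples, 0.0)
    threshold = base_threshold + scale_factor * reltuples
    return rel.mods_since_analyze > threshold
```

Under these assumptions, a table with 1000 rows and 500 modifications counts as modified, but the same table reports as unmodified once its counter has been reset by a crash, while the missing-stats check returns the same answer before and after.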

That said, your naming suggestion is interesting. A name such as
SKIP_UNMODIFIED does express the behavior from the inverse perspective and
may indeed be clearer. Another possible direction could be:

ANALYZE (MISSING_STATS_ONLY)

ANALYZE (SKIP_UNMODIFIED)

Alternatively, both options could be allowed together, if that proves
semantically consistent.
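
One possible reading of "both options together" is a union: a relation is analyzed if it satisfies either filter. The Python sketch below shows that reading only as an assumption for discussion. The function and its parameters are hypothetical, and an intersection would be an equally valid interpretation that needs to be ruled in or out.

```python
def should_analyze(has_stats: bool,
                   is_modified: bool,
                   *,
                   missing_stats_only: bool = False,
                   skip_unmodified: bool = False) -> bool:
    """Hypothetical union semantics for combining the two options."""
    if not missing_stats_only and not skip_unmodified:
        return True   # plain ANALYZE: no filtering at all
    if missing_stats_only and not has_stats:
        return True   # selected because statistics are missing
    if skip_unmodified and is_modified:
        return True   # selected because the relation was modified
    return False      # filtered out by every requested option
```

With this reading, a relation that has no statistics and no recorded modifications is skipped by SKIP_UNMODIFIED alone but is picked up when both options are given, which matches the case you raised.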

I’m very open to adjusting the naming and/or semantics if the consensus is
that a combined approach would be more practical.

Thank you again for the thoughtful feedback — it’s very helpful in refining
the direction of this work.

Best regards,
Vasuki M
C-DAC, Chennai
