Hi Andreas,

Thank you for raising this; it’s a very good design question.
You’re right that in many practical cases, a user invoking something like ANALYZE (MODIFIED_STATS) would also want to include relations that currently have no statistics. From an operational perspective, “missing stats” and “modified stats” can overlap.

In my earlier prototype, I did attempt to handle both concerns together. However, during the previous discussion in the thread, it became clear that combining the semantics made the behavior less predictable and harder to reason about. That led to splitting the functionality into two more clearly defined options:

- MISSING_STATS_ONLY → analyze relations lacking statistics.
- MODIFIED_STATS (proposed) → analyze relations whose statistics may be stale due to modifications.

The motivation for separation was semantic clarity:

- MISSING_STATS_ONLY is catalog-based and persistent (derived from pg_statistic / pg_statistic_ext).
- MODIFIED_STATS would likely depend on modification counters or thresholds (similar to autoanalyze logic), which are transient and not crash-persistent.

Keeping them distinct allows each option to have a well-defined and predictable contract.

That said, your naming suggestion is interesting. A name such as SKIP_UNMODIFIED does express the behavior from the inverse perspective and may indeed be clearer.

Another possible direction could be:

    ANALYZE (MISSING_STATS_ONLY)
    ANALYZE (SKIP_UNMODIFIED)

Or potentially allowing both options together, if that proves semantically consistent.

I’m very open to adjusting the naming and/or semantics if the consensus is that a combined approach would be more practical.

Thank you again for the thoughtful feedback; it’s very helpful in refining the direction of this work.

Best regards,
Vasuki M
C-DAC, Chennai
