Thanks for the detailed summary!

It is important to point out that this feature is trying to do two distinct
things in one command. It runs ANALYZE when either of these conditions
is true:

1/ Table has not been analyzed yet.
2/ Table has been modified.

> Thanks a lot for the detailed feedback — this has been very helpful.Answering 
> to all mails in one.
>
> A few clarifications on intent and scope, and how this relates to the points 
> raised:
>
> Autovacuum overlap
> I agree there is some conceptual overlap with autovacuum’s analyze decision 
> logic.
> The intent here is not to replace or duplicate autovacuum heuristics, but to 
> reduce

Yes, I agree with this.

> I agree that n_mod_since_analyze == 0 is a very simple condition
> and not “smart” in the general sense. That is intentional for now.
> This option is not trying to answer when statistics should be refreshed 
> optimally,
> but only to skip relations that are known to be unchanged since the last 
> analyze.
> If even a single tuple is modified, SMART ANALYZE will still re-run, 
> preserving
> conservative behavior.

Yes, this is my concern. Why would I want to analyze if 1 row or a negligible
number of rows are modified? I understand that this feature is trying to
keep the decision-making very simple, but I think it's too simple to actually
be helpful in addressing the wasted effort of an ANALYZE command.
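
To make the concern concrete, here is a rough sketch in Python (not
PostgreSQL code) comparing the proposed rule with an autovacuum-style
threshold. The function names are made up for illustration, and the defaults
of 50 and 0.1 are the stock values of autovacuum_analyze_threshold and
autovacuum_analyze_scale_factor. It ignores per-table settings and other
details, so treat it as an approximation only.

```python
def proposed_needs_analyze(n_mod_since_analyze, ever_analyzed):
    """Rule as described in this thread: analyze if the table was never
    analyzed, or if any tuple changed since the last analyze."""
    return (not ever_analyzed) or n_mod_since_analyze > 0


def threshold_needs_analyze(n_mod_since_analyze, reltuples,
                            base_threshold=50, scale_factor=0.1):
    """Autovacuum-style rule (simplified): analyze only once the number of
    changed tuples exceeds base_threshold + scale_factor * reltuples."""
    threshold = base_threshold + scale_factor * reltuples
    return n_mod_since_analyze > threshold


# One modified row in a million-row table:
one_row_proposed = proposed_needs_analyze(1, ever_analyzed=True)
one_row_threshold = threshold_needs_analyze(1, reltuples=1_000_000)
```

With one modified row in a million-row table, the proposed rule still
re-analyzes, while the threshold-based rule skips the table.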

> Tables never analyzed
> As Christoph and Ilia pointed out earlier, skipping tables that were never 
> analyzed would be incorrect.
> The current logic explicitly avoids that by requiring last_analyze or 
> last_autoanalyze to be present
> before skipping. Tables without prior statistics are always analyzed.

I agree with this, but I think it's more than just tables that have
not been analyzed.
What if a new column is added after the last (auto)analyze? Would we not
want to trigger an analyze in that case?
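
A rough sketch of the kind of check I have in mind, in Python rather than
PostgreSQL code. The function name and inputs are hypothetical. In a real
implementation the column list would come from pg_attribute and the set of
columns with statistics from pg_statistic. The point is that a table can have
last_analyze set and no modified tuples, yet still have a column without
statistics.

```python
def columns_missing_stats(table_columns, columns_with_stats):
    """Return the columns that have no statistics entry, preserving the
    table's column order. Both arguments are hypothetical inputs that a
    real implementation would read from the system catalogs."""
    return [col for col in table_columns if col not in columns_with_stats]


# Table analyzed earlier, then a column was added afterwards:
missing = columns_missing_stats(
    ["id", "name", "added_later"],   # current columns
    {"id", "name"},                  # columns that have statistics
)
needs_analyze = bool(missing)
```

Here the table would still need an ANALYZE because "added_later" has no
statistics, even if no rows changed.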

> Relation to vacuumdb --missing-stats-only
> I agree this is related but slightly different in intent. --missing-stats-only
> answers “does this table have any statistics at all?”, while SMART ANALYZE
> answers “has this table changed since the last statistics collection?”. Both 
> seem
> useful, but they target different use cases. I see SMART ANALYZE primarily
> as a performance optimization for repeated manual ANALYZE runs on 
> mostly-static schemas.

SMART ANALYZE is trying to answer two questions, "which tables do not
have any statistics at all?" and "has this table changed since the last
statistics collection?", right?

So, maybe they need to be two separate options.

> Although as sami said this SMART is not smart enough as it should be ,
> I will change name accordingly in the further patches

Yup, I am not too fond of SMART in the name. Also, the name itself
is vague. SKIP_LOCKED and BUFFER_USAGE_LIMIT, on the other
hand, tell you exactly what they're used for.

--
Sami Imseih
Amazon Web Services (AWS)

