Thanks for the detailed summary! It is important to point out that this feature is trying to do 2 distinct things in 1 command. run analyze under when either one of these conditions is true:
1/ Table has not been analyzed yet. 2/ Table has been modified. > Thanks a lot for the detailed feedback — this has been very helpful.Answering > to all mails in one. > > A few clarifications on intent and scope, and how this relates to the points > raised: > > Autovacuum overlap > I agree there is some conceptual overlap with autovacuum’s analyze decision > logic. > The intent here is not to replace or duplicate autovacuum heuristics, but to > reduce Yes, I agree with this. > I agree that n_mod_since_analyze == 0 is a very simple condition > and not “smart” in the general sense. That is intentional for now. > This option is not trying to answer when statistics should be refreshed > optimally, > but only to skip relations that are known to be unchanged since the last > analyze. > If even a single tuple is modified, SMART ANALYZE will still re-run, > preserving > conservative behavior. Yes, this is my concern. Why would I want to analyze if 1 row or a negligible amount of rows are modified? I understand that this feature is trying to keep the decision making very simple, but I think it's too simple to actually be helpful in addressing the wasted effort of an ANALYZE command. > Tables never analyzed > As Christoph and Ilia pointed out earlier, skipping tables that were never > analyzed would be incorrect. > The current logic explicitly avoids that by requiring last_analyze or > last_autoanalyze to be present > before skipping. Tables without prior statistics are always analyzed. I agree with this, but I think it's more than just tables that have not been analyzed. What if a new column is added after the last (auto)analyze. Would we not want to trigger an analyze in that case? > Relation to vacuumdb --missing-stats-only > I agree this is related but slightly different in intent. --missing-stats-only > answers “does this table have any statistics at all?”, while SMART ANALYZE > answers “has this table changed since the last statistics collection?”. Both > seem > useful, but they target different use cases. I see SMART ANALYZE primarily > as a performance optimization for repeated manual ANALYZE runs on > mostly-static schemas. SMART ANALYZE is trying to answer 2 questions "which table does not have any statistics at all" and "has this table changed since the last statistics collection?”, right? So, maybe they need to be 2 separate options. > Although as sami said this SMART is not smart enough as it should be , > I will change name accordingly in the further patches Yup, I am not too fond of SMART in the name. Also, then name itself is vague. SKIP_LOCKED and BUFFER_USAGE_LIMIT on the other hand tell you exactly what they[re used for. -- Sami Imseih Amazon Web Services (AWS)
