On 20.01.2011 04:36, Robert Haas wrote:
... Even better, the
code changes would be confined to ANALYZE rather than spread out all
over the system, which has positive implications for robustness and
likelihood of commit.

Keep in mind that the administrator can already override the ndistinct estimate with ALTER TABLE. If he needs to manually run a special ANALYZE command to make it scan the whole table, he might as well just use ALTER TABLE to tell the system what the real (or good enough) value is. A DBA should have a pretty good feeling of what the distribution of his data is like.

And how good does the estimate need to be? For a single-column, it's usually not that critical, because if the column has only a few distinct values then we'll already estimate that pretty well, and OTOH if ndistinct is large, it doesn't usually affect the plans much if it's 10% of the number of rows or 90%.

It seems that the suggested multi-column selectivity estimator would be more sensitive to ndistinct of the individual columns. Is that correct? How is it biased? If we routinely under-estimate ndistinct of individual columns, for example, does the bias accumulate or cancel itself in the multi-column estimate?

I'd like to see some testing of the suggested selectivity estimator with the ndistinct estimates we have. Who knows, maybe it works fine in practice.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to