Re: [HACKERS] estimating # of distinct values

Heikki Linnakangas Thu, 20 Jan 2011 00:10:54 -0800

On 20.01.2011 04:36, Robert Haas wrote:

> ... Even better, the
> code changes would be confined to ANALYZE rather than spread out all
> over the system, which has positive implications for robustness and
> likelihood of commit.

Keep in mind that the administrator can already override the ndistinct estimate with ALTER TABLE. If he needs to manually run a special ANALYZE command to make it scan the whole table, he might as well just use ALTER TABLE to tell the system what the real (or good enough) value is. A DBA should have a pretty good feeling of what the distribution of his data is like.

And how good does the estimate need to be? For a single column, it's usually not that critical, because if the column has only a few distinct values then we'll already estimate that pretty well, and OTOH if ndistinct is large, it doesn't usually affect the plans much whether it's 10% of the number of rows or 90%.
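
To put rough numbers on that, here is a small sketch. It assumes the simplest possible model, where an equality predicate on the column is expected to match total_rows / ndistinct rows (values spread uniformly, no most-common-values list). The function name and the one-million-row table are made up for illustration; this is not the planner's actual code.

```python
def estimated_rows(total_rows: int, ndistinct: float) -> float:
    """Rows expected to match `col = const`, ASSUMING uniformly
    distributed values (hypothetical simplification, not planner code)."""
    return total_rows / ndistinct


total_rows = 1_000_000  # hypothetical table size

# Large ndistinct: 10% vs. 90% of the row count.
for ndistinct in (100_000, 900_000):
    rows = estimated_rows(total_rows, ndistinct)
    print(f"ndistinct={ndistinct:>7}: ~{rows:.1f} matching rows")

# Small ndistinct: the same relative error moves the estimate a lot.
for ndistinct in (5, 45):
    rows = estimated_rows(total_rows, ndistinct)
    print(f"ndistinct={ndistinct:>7}: ~{rows:.0f} matching rows")
```

Under this assumption, both of the large-ndistinct cases come out as a handful of rows, so the same kind of plan would likely be chosen either way, while a ninefold error on a column with only a few distinct values changes the estimate by hundreds of thousands of rows.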

It seems that the suggested multi-column selectivity estimator would be more sensitive to ndistinct of the individual columns. Is that correct? How is it biased? If we routinely under-estimate ndistinct of individual columns, for example, does the bias accumulate or cancel itself in the multi-column estimate?

I'd like to see some testing of the suggested selectivity estimator with the ndistinct estimates we have. Who knows, maybe it works fine in practice.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
