Hi,

It is well known that in some cases PostgreSQL's estimate of the number of distinct values in a column can be quite far from reality. This tends to push the planner towards unsavory plans (read: seqscans), because it estimates that part of the query returns many more rows than it really does.

The "good" solution would be to fix the estimator, but there have already been long discussions on this topic over the past years and apparently no consensus was reached: the alternatives proposed either "fixed" some cases where the current estimator is wrong while getting into trouble in others, or required quite a bit more CPU, memory or disk I/O to achieve their results (correct me if I'm wrong).

There is a "simple" way to override this, which is to change the value stored in pg_statistic. However, it will be overwritten the next time ANALYZE (or VACUUM ANALYZE) is run. The value therefore has to be updated again before every query that might be fooled by it, which is cumbersome, and it does not make the value easy to keep up to date (especially when stadistinct is positive, i.e. an absolute count rather than a fraction of the row count).
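
To make the point about positive values concrete, here is a minimal Python sketch of how stadistinct is interpreted, as I understand the pg_statistic documentation: a positive value is an absolute number of distinct values, a negative value is minus a fraction of the row count, and 0 means unknown. The function name and the sample numbers are mine, purely for illustration; this is not the planner's actual code.

```python
from typing import Optional


def estimated_distinct(stadistinct: float, reltuples: float) -> Optional[float]:
    """Turn a stadistinct value into an absolute number of distinct values.

    Hypothetical helper for illustration only (not PostgreSQL source code).
    """
    if stadistinct > 0:
        # Positive: an absolute count, independent of the table size.
        return stadistinct
    if stadistinct < 0:
        # Negative: minus a multiplier applied to the current row count.
        return -stadistinct * reltuples
    # Zero: the number of distinct values is unknown.
    return None


# Made-up example: a table growing from 1M to 10M rows.
small, large = 1_000_000, 10_000_000

# A forced positive value stays frozen while the table grows.
fixed_small = estimated_distinct(250_000, small)
fixed_large = estimated_distinct(250_000, large)

# A forced negative value scales with the row count.
scaled_small = estimated_distinct(-0.25, small)
scaled_large = estimated_distinct(-0.25, large)
```

With these numbers both forms agree at 1M rows, but only the negative one follows the table as it grows, which is why a hand-set positive value needs to be revisited more often.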

It seems to me it would be a good idea to be able to store a forced value for stadistinct in pg_attribute, optionally with clauses to set, change or reset it in CREATE TABLE, ALTER TABLE ADD COLUMN and ALTER TABLE ALTER COLUMN, in a way similar to the STATISTICS clauses.

Alternatively, it could be a simple boolean to just say "don't update stadistinct".

Or did I miss something and this already exists somewhere?

If not, are there any comments or suggestions regarding implementing this?

Thanks,

Jacques.

