>> Maybe what should be done about this is to have separate sizes for the
>> MCV list and the histogram, where the MCV list is automatically sized
>> during ANALYZE.
It's been suggested multiple times that we should base our sample size on a
percentage of the table, or at least offer that as an option. I've pointed
out (with math; Simon wrote a prototype) that doing block-based sampling
instead of random-row sampling would let us collect, say, 2% of a very large
table without more I/O than we're doing now: random-row sampling pays roughly
one block read per sampled row on a big table, while block sampling keeps
every row on each block it reads. (Toy sketch below, after my sig.)

Nathan Boley has also shown that we could get tremendously better estimates
without additional sampling if our statistics collector recognized common
patterns such as normal, linear, and geometric distributions. Right now our
whole stats system assumes a completely random distribution. (Second sketch
below.)

So I think we could easily be quite a bit smarter than just increasing the
size of the MCV list. Although that might be a nice start.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
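P.S. To put rough numbers on the block-sampling claim, here's a toy Python
sketch. Every figure in it is an illustrative assumption (an 8 KB block
size, ~100 tuples per block, and a 30,000-row sample, which is what ANALYZE
takes at default_statistics_target = 100), not planner code:

    # Back-of-the-envelope: random-row vs. block-based sampling.
    # All numbers are assumptions for illustration only.
    import random

    TABLE_BLOCKS = 1_000_000      # ~8 GB table at 8 KB per block
    ROWS_PER_BLOCK = 100          # assumed average tuples per block
    TOTAL_ROWS = TABLE_BLOCKS * ROWS_PER_BLOCK

    # Random-row sampling: on a big table nearly every sampled row
    # lands on a distinct block, so ~30,000 rows costs ~30,000 reads.
    sample_rows = 30_000
    blocks_read = len({random.randrange(TABLE_BLOCKS)
                       for _ in range(sample_rows)})

    # Block sampling: read fewer blocks, but keep every row on them.
    # 20,000 blocks * 100 rows = 2,000,000 rows, i.e. 2% of the
    # table for *less* I/O than the row sample above.
    sample_blocks = 20_000
    rows_collected = sample_blocks * ROWS_PER_BLOCK

    print(f"row sampling:   ~{blocks_read:,} blocks for {sample_rows:,} rows")
    print(f"block sampling: {sample_blocks:,} blocks for "
          f"{rows_collected:,} rows "
          f"({rows_collected / TOTAL_ROWS:.0%} of the table)")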
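P.P.S. And a second toy sketch of what distribution recognition could buy
us. This is emphatically not Nathan's code, just the shape of the idea: if
the sample fits a normal distribution, two fitted parameters give a better
range-selectivity estimate than assuming nothing about the shape:

    # Toy distribution-aware selectivity estimate, illustration only.
    import random
    import statistics
    from math import erf, sqrt

    random.seed(1)
    population = [random.gauss(500, 50) for _ in range(1_000_000)]
    sample = random.sample(population, 1_000)

    # "Fit" a normal distribution to the sample: just mean and stddev.
    mu = statistics.mean(sample)
    sigma = statistics.stdev(sample)

    def normal_cdf(x):
        return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

    # Selectivity of "col BETWEEN 450 AND 550", estimated from the two
    # fitted parameters vs. computed exactly over the whole table.
    est = normal_cdf(550) - normal_cdf(450)
    actual = sum(450 <= v <= 550 for v in population) / len(population)
    print(f"estimated: {est:.4f}   actual: {actual:.4f}")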