> Why? Afaict this has been suggested multiple times by people who don't
> justify it in any way except with handwavy -- larger samples are
> better. The sample size is picked based on what sample statistics
> tells us we need to achieve a given 95th percentile confidence
> interval for the bucket size given.

I also just realized that I confused myself ... we don't really want
more MCVs. What we want is more *samples* to derive a small number of
MCVs. Right now the number of samples and the number of MCVs are
inextricably bound, and they shouldn't be. On larger tables, you're
correct that we don't necessarily want more MCVs; we just need more
samples to figure out those MCVs accurately.

> Can you explain when this would and wouldn't bias the sample for the
> users so they can decide whether to use it or not?

Sure. There's some good math in various ACM papers for this. The
basics are that block-based sampling should be accompanied by an
increased sample size, or you are lowering your confidence level. But
since block-based sampling allows you to increase your sample size
without increasing I/O or RAM usage, you *can* take a larger sample ...
a *much* larger sample if you have small rows.

The algorithms for deriving stats from a block-based sample are a bit
more complex, because the code needs to determine the level of physical
correlation in the blocks sampled and skew the stats based on that. So
there would be an increase in CPU time. As a result, we'd probably give
some advice like "random sampling for small tables, block-based for
large ones".

> I think increasing the MCV is too simplistic since we don't really
> have any basis for any particular value. I think what we need are some
> statistics nerds to come along and say here's this nice tool from
> which you can make the following predictions and understand how
> increasing or decreasing the data set size affects the accuracy of the
> predictions.

Agreed. Nathan?

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
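
As a rough illustration of the block-sampling point above, here is a toy
simulation sketch. It is not PostgreSQL's ANALYZE code. The table layout,
the 100-rows-per-block figure, the 10% value frequency, and every function
name are assumptions made up for the example. It estimates the frequency
of one common value from repeated samples and reports how much the
estimates scatter, for row-level sampling versus whole-block sampling, on
a shuffled table and on a physically clustered one.

```python
import random
import statistics

ROWS_PER_BLOCK = 100  # assumption: small rows, so many rows fit in a block


def make_table(n_blocks, freq, clustered, rng):
    """Toy table as a list of blocks; a row is True if it holds the common value."""
    n_rows = n_blocks * ROWS_PER_BLOCK
    n_hot = int(n_rows * freq)
    rows = [True] * n_hot + [False] * (n_rows - n_hot)
    if not clustered:
        rng.shuffle(rows)  # no physical correlation between neighbouring rows
    # If clustered, the common value stays packed into the first blocks.
    return [rows[i:i + ROWS_PER_BLOCK] for i in range(0, n_rows, ROWS_PER_BLOCK)]


def row_sample_estimate(table, n_rows_sampled, rng):
    """Estimate the value's frequency from individually chosen random rows."""
    total = len(table) * ROWS_PER_BLOCK
    picks = rng.sample(range(total), n_rows_sampled)
    hits = sum(table[p // ROWS_PER_BLOCK][p % ROWS_PER_BLOCK] for p in picks)
    return hits / n_rows_sampled


def block_sample_estimate(table, n_blocks_sampled, rng):
    """Estimate the value's frequency from every row of randomly chosen blocks."""
    blocks = rng.sample(table, n_blocks_sampled)
    hits = sum(sum(block) for block in blocks)
    return hits / (n_blocks_sampled * ROWS_PER_BLOCK)


def spread(estimator, table, size, rng, trials=200):
    """Standard deviation of the estimate over repeated independent samples."""
    return statistics.pstdev([estimator(table, size, rng) for _ in range(trials)])


def demo():
    rng = random.Random(0)
    for clustered in (False, True):
        table = make_table(1000, 0.10, clustered, rng)
        print("clustered table" if clustered else "shuffled table")
        print("  row sample, 3000 rows:    ", spread(row_sample_estimate, table, 3000, rng))
        print("  block sample, 30 blocks:  ", spread(block_sample_estimate, table, 30, rng))
        print("  block sample, 300 blocks: ", spread(block_sample_estimate, table, 300, rng))


if __name__ == "__main__":
    demo()
```

On the shuffled table the two methods should scatter about equally for the
same number of rows. On the clustered table the block-sample estimates
should scatter far more widely, which is the reason a block-based sample
needs to be larger and needs stats code that accounts for physical
correlation.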