On Wed, Oct 20, 2010 at 6:03 PM, Josh Berkus <j...@agliodbs.com> wrote:
> I also just realized that I confused myself ... we don't really want
> more MCVs. What we want it more *samples* to derive a small number of
> MCVs. Right now # of samples and number of MCVs is inexorably bound,
> and they shouldn't be. On larger tables, you're correct that we don't
> necessarily want more MCVs, we just need more samples to figure out
> those MCVs accurately.

I don't see why the MCVs would need a particularly large sample size to calculate accurately. Have you done any tests on the accuracy of the MCV list?

Robert explained why having more MCVs might be useful: we use the frequency of the least common MCV as an upper bound on the frequency of any value not in the MCV list. That seems logical, but it's all about the number of MCV entries, not their accuracy. And mostly what it tells me is that we need a robust statistical method, and the data structures it requires, for estimating the frequency of a single value.

Binding the length of the MCV list to the size of the histogram is arbitrary, but so would any other value be, and I haven't seen anyone propose a rationale for any particular value. The only rationale I can see is that we probably want to take roughly the same amount of space as the existing stats, which means we probably want it to be roughly the same size.

--
greg
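
For anyone who wants to try the accuracy question above, here is a minimal sketch, not PostgreSQL's ANALYZE code. It draws samples of several sizes from a made-up skewed column, builds a fixed-length MCV list from each, and reports the worst frequency error among the MCVs along with the frequency of the least common MCV (the cap for values outside the list). The function names, the 1/(k+1) distribution, the sample sizes, and the list length of 10 are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def estimate_mcvs(sample, num_mcvs):
    """Return [(value, estimated_frequency)] for the most common sampled values."""
    counts = Counter(sample)
    n = len(sample)
    return [(value, count / n) for value, count in counts.most_common(num_mcvs)]


def non_mcv_upper_bound(mcvs):
    """Frequency of the least common MCV, used as a cap for values not in the list."""
    return min(freq for _, freq in mcvs)


def demo(seed=0, num_mcvs=10, sample_sizes=(300, 3000, 30000)):
    """Compare MCV estimates against known frequencies at several sample sizes."""
    rng = random.Random(seed)
    # Hypothetical skewed column: value k occurs with weight 1/(k+1).
    values = list(range(1000))
    weights = [1.0 / (k + 1) for k in values]
    total = sum(weights)
    true_freq = {v: w / total for v, w in zip(values, weights)}

    results = {}
    for size in sample_sizes:
        sample = rng.choices(values, weights=weights, k=size)
        mcvs = estimate_mcvs(sample, num_mcvs)
        # Worst absolute error between estimated and true frequency of any MCV.
        max_err = max(abs(freq - true_freq[v]) for v, freq in mcvs)
        results[size] = (max_err, non_mcv_upper_bound(mcvs))
    return results


if __name__ == "__main__":
    for size, (max_err, bound) in demo().items():
        print(f"sample={size:6d}  max MCV error={max_err:.4f}  non-MCV cap={bound:.4f}")
```

Varying num_mcvs and sample_sizes independently in demo() is one way to look at the two effects separately: how many entries the list has versus how accurately each entry is estimated.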