On Apr 1, 2016 23:14, "Tom Lane" <t...@sss.pgh.pa.us> wrote:
>
> "Shulgin, Oleksandr" <oleksandr.shul...@zalando.de> writes:
> > Alright. I'm attaching the latest version of this patch split in two
> > parts: the first one is NULLs-related bugfix and the second is the
> > "improvement" part, which applies on top of the first one.
>
> I've applied the first of these patches,

Great news, thank you!

> broken into two parts first
> because it seemed like there were two issues and second because Tomas
> deserved primary credit for one part, ie realizing we were using the
> Haas-Stokes formula wrong.
>
> As for the other part, I committed it with one non-cosmetic change:
> I do not think it is right to omit "too wide" values when considering
> the threshold for MCVs. As submitted, the patch was inconsistent on
> that point anyway since it did it differently in compute_distinct_stats
> and compute_scalar_stats. But the larger picture here is that we define
> the MCV population to exclude nulls, so it's reasonable to consider a
> value as an MCV even if it's greatly outnumbered by nulls. There is
> no such exclusion for "too wide" values; those things are just an
> implementation limitation in analyze.c, not something that is part of
> the pg_statistic definition. If there are a lot of "too wide" values
> in the sample, we don't know whether any of them are duplicates, but
> we do know that the frequencies of the normal-width values have to be
> discounted appropriately.

Okay.

> Haven't looked at 0002 yet.

[crosses fingers] hope you'll have a chance to do that before feature freeze for 9.6…

--
Alex