Tom Lane wrote:
Gregory Stark <[EMAIL PROTECTED]> writes:
No, actually the stats table keeps the n most common values and their
frequency (usually in percentage). So really a target of 2 ought to be enough
for boolean values. In fact that's all I see in pg_statistic; I'm assuming
there's a full histogram somewhere but I don't see it. Where would it be?
It's not going to be there. The histogram only covers values that are
not in the most-frequent-values list, and therefore it won't exist for a
column that is completely describable by most-frequent-values.
However the target also dictates how large a sample of the table to take. A
target of two represents a very small sample. So the estimations could be
quite far off.
Right. The real point of stats target for such columns is that it
determines how many rows to sample, and thereby indirectly implies
the accuracy of the statistics. For a heavily skewed boolean column
you'd want a high target so that the number of occurrences of the
infrequent value would be estimated accurately.
It's also worth noting that the number of rows sampled is driven by the
largest per-column stats target in the table, and so reducing stats
target to 2 for a boolean column will save *zero* effort unless all the
columns in the table are booleans.
Thank you all, now I have more clear how it works.
Btw last time I was thinking: why during an explain analyze we can not use
the information on about the really extracted rows vs the extimated rows ?
Now I'm reading an article, written by the same author that ispired the magic "300"
on analyze.c, about "Self-tuning Histograms". If this is implemented, I understood
we can take rid of "vacuum analyze" for mantain up to date the statistics.
Have someone in his plans to implement it ?
After all the idea is simple: compare during normal selects the extimated rows and
the actual extracted rows then use this "free" information to refine the histograms.
Regards
Gaetano Mendola
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]