Josh Berkus <josh@agliodbs.com> writes:
> Overall, our formula is inherently conservative of n_distinct.   That is, I 
> believe that it is actually computing the *smallest* number of distinct 
> values which would reasonably produce the given sample, rather than the 
> *median* one.  This is contrary to the notes in analyze.c, which seem to 
> think that we're *overestimating* n_distinct.  

Well, the notes are there because the early tests I ran on that formula
did show it overestimating n_distinct more often than not.  Greg is
correct that this is inherently a hard problem :-(

I have nothing against adopting a different formula, if you can find
something with a comparable amount of math behind it ... but I fear
it'd only shift the failure cases around.

                        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

               http://archives.postgresql.org

Reply via email to