Re: [HACKERS] Cross-column statistics revisited

Greg Stark Thu, 16 Oct 2008 10:32:56 -0700

[sorry for top posting - damn phone]

It's pretty straightforward to do a chi-squared test on all the pairs. But that only tells you that the product is more likely to be wrong. It doesn't tell you whether it's going to be too high or too low...
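
A minimal sketch of that limitation, using made-up counts for a hypothetical pair of clauses (the 2x2 tables and the helper name chi_squared_2x2 are illustrative assumptions, not anything in PostgreSQL). Two tables with opposite correlation give the same chi-squared statistic, yet the independence product is too low for one and too high for the other:

```python
def chi_squared_2x2(table):
    """Pearson chi-squared statistic for a 2x2 table of row counts."""
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if the two clauses were independent
            expected = row_sums[i] * col_sums[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


def product_vs_actual(table):
    """Return (independence estimate, true fraction) for 'both clauses hold'."""
    total = sum(sum(row) for row in table)
    frac_a = sum(table[0]) / total                  # fraction with a < 5
    frac_b = (table[0][0] + table[1][0]) / total    # fraction with b > 10
    return frac_a * frac_b, table[0][0] / total


# Rows: a < 5 true/false. Columns: b > 10 true/false. Counts are invented.
positive = [[40, 10], [10, 40]]   # clauses tend to hold together
negative = [[10, 40], [40, 10]]   # clauses tend to exclude each other

for name, table in (("positive", positive), ("negative", negative)):
    estimate, actual = product_vs_actual(table)
    print(name, chi_squared_2x2(table), estimate, actual)
```

Both tables score 36.0, while the product estimate of 0.25 sits below the true 0.40 in the first case and above the true 0.10 in the second.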


greg

On 16 Oct 2008, at 07:20 PM, Tom Lane <[EMAIL PROTECTED]> wrote:

Martijn van Oosterhout <[EMAIL PROTECTED]> writes:

I think you need to go a step back: how are you going to use this data?


The fundamental issue as the planner sees it is not having to assume
independence of WHERE clauses.  For instance, given

   WHERE a < 5 AND b > 10

our current approach is to estimate the fraction of rows with a < 5
(using stats for a), likewise estimate the fraction with b > 10
(using stats for b), and then multiply these fractions together.
This is correct if a and b are independent, but can be very bad if
they aren't.  So if we had joint statistics on a and b, we'd want to
somehow match that up to clauses for a and b and properly derive
the joint probability.
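
To make the multiplication concrete, here is a small sketch over a hypothetical table in which b is always within 2 of a (the data and the selectivity helper are assumptions for illustration, not planner code):

```python
def selectivity(rows, predicate):
    """Fraction of rows satisfying the predicate."""
    return sum(1 for row in rows if predicate(row)) / len(rows)


# Hypothetical table of (a, b) pairs where b = a + 0, 1 or 2,
# so the two columns are strongly correlated.
rows = [(a, a + offset) for a in range(20) for offset in range(3)]

sel_a = selectivity(rows, lambda r: r[0] < 5)     # per-column estimate for a < 5
sel_b = selectivity(rows, lambda r: r[1] > 10)    # per-column estimate for b > 10

# Multiplying the fractions is only right if a and b are independent.
independent_estimate = sel_a * sel_b

# What joint statistics would ideally let us derive.
actual = selectivity(rows, lambda r: r[0] < 5 and r[1] > 10)

print(sel_a, sel_b, independent_estimate, actual)
```

The per-column fractions are 0.25 and 0.5, so the product predicts 0.125 of the rows, while no row in this table actually satisfies both clauses.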

(I'm not certain of how to do that efficiently, even if we had the
right stats :-()

           regards, tom lane

--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


