> Yes, or to phrase that another way: What kinds of queries are being
> poorly optimized now and why?
Well, we have two different correlation problems.

The first is dependent correlation between columns, such as the
(effectively) 1.0 correlation between ZIP and CITY fields, which is a
common problem. This could in fact be fixed, I believe, via a linear
math calculation based on the sampled level of correlation, assuming we
have enough samples (see the sketch below). And it's really only an
issue if the correlation is > 0.5.

The second type of correlation issue we have is correlating values in a
parent table with *rows* in a child table (i.e. FK joins). Currently,
the planner assumes that all rows in the child table are evenly
distributed against keys in the parent table. But many real-world
databases have this kind of distribution:

    A            B
    1            10000 rows
    2            10000 rows
    3            1000 rows
    4 .. 1000    0 to 1 rows

For queries which cover values between 4..1000 on A, the misestimate
won't be much of a real execution problem. But for values 1, 2, and 3,
the query will bomb. (The second sketch below puts numbers on this.)

> The other half of this is that bad selectivity estimates only matter
> if they're bad enough to change the plan, and I'm not sure whether
> cases like this are actually a problem in practice.

My experience is that any estimate which is more than 5x wrong in either
direction (i.e. an estimate/actual ratio < 0.2 or > 5.0) usually causes
problems, and one that is 3x wrong sometimes causes problems.
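
To put the linear-calculation idea in concrete terms, here is a minimal
Python sketch. The interpolation formula and all of the numbers here are
my assumptions for illustration, not anything the planner currently
does: it just blends the independence estimate (sel_a * sel_b) with the
fully-dependent estimate (min(sel_a, sel_b)) using the sampled
correlation.

    def combined_selectivity(sel_a, sel_b, corr):
        # corr = 0.0: columns independent, use the plain product
        #             (today's planner behavior).
        # corr = 1.0: fully dependent (ZIP implies CITY), use min().
        # In between: interpolate linearly on the sampled correlation.
        independent = sel_a * sel_b
        dependent = min(sel_a, sel_b)
        return independent + corr * (dependent - independent)

    # Hypothetical counts: ~42000 distinct ZIPs, ~20000 distinct cities.
    naive = combined_selectivity(1.0/42000, 1.0/20000, 0.0)
    fixed = combined_selectivity(1.0/42000, 1.0/20000, 1.0)
    print(naive, fixed)   # ~1.2e-09 vs. ~2.4e-05

With perfect correlation, the naive product is roughly 20000x too low,
which dwarfs the 5x threshold mentioned above.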
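
And here is a toy illustration of the FK-join problem, using the numbers
from the table above. The "flat" estimate mimics the even-distribution
assumption (total child rows divided by distinct keys, roughly what the
planner derives from n_distinct); this is a simplification for
illustration, not the planner's actual code:

    # Skewed child table: keys 1-2 have 10000 rows each, key 3 has
    # 1000, and keys 4..1000 have at most one row each.
    child_rows_per_key = {1: 10000, 2: 10000, 3: 1000}
    child_rows_per_key.update({k: 1 for k in range(4, 1001)})

    total_child_rows = sum(child_rows_per_key.values())   # 21997
    distinct_keys = len(child_rows_per_key)               # 1000
    flat_estimate = total_child_rows / distinct_keys      # ~22 rows/key

    for key in (1, 3, 500):
        actual = child_rows_per_key[key]
        ratio = max(actual, flat_estimate) / min(actual, flat_estimate)
        print(f"key={key}: estimated ~{flat_estimate:.0f} rows, "
              f"actual {actual} ({ratio:.0f}x off)")

For keys 1 and 3 the flat estimate is off by roughly 455x and 45x, far
past the 5x threshold and with absolute row counts large enough to pick
the wrong join plan. For key 500 the estimate is also ~22x off, but 22
rows vs. 1 row rarely changes a plan, which is why the 4..1000 range
isn't the painful case.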

--
--Josh

Josh Berkus
PostgreSQL
San Francisco