> Obviously we run into problems when
> a) we have a poor estimate for ndistinct - but then we have
>    worse problems
> b) our length measure doesn't correspond well with ndistinct
>    in an interval

One more problem with low ndistinct values is that the condition might very well match no rows at all, yet Idea 1 will greatly overestimate the number of hits.

For example, a char(2) field has a histogram bin for 'a1' - 'b1'. ndistinct is 2, because the actual values in the bin are 'a1' and 'a2'. A query for 'a3' now gets a bogus estimate of nrowsperbin / 2, even though it matches no rows.

I think for low ndistinct values we will want to know the exact values and their counts, not a bin. So I think we will want additional stats rows that represent "value 'a1' stats".

Andreas
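
The arithmetic in the 'a1' - 'b1' example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the row counts (1000 rows in the bin, split 600/400) are made up, and the function names `bin_estimate` and `exact_estimate` are hypothetical, not planner code.

```python
def bin_estimate(rows_per_bin, ndistinct):
    """Idea 1: assume rows are spread evenly over the distinct values in the bin."""
    return rows_per_bin / ndistinct


def exact_estimate(value_counts, value):
    """Per-value stats: look up the exact count, 0 if the value was never seen."""
    return value_counts.get(value, 0)


# Hypothetical contents of the histogram bin 'a1' - 'b1'.
value_counts = {"a1": 600, "a2": 400}
rows_per_bin = sum(value_counts.values())   # 1000
ndistinct = len(value_counts)               # 2

# A query for 'a3' falls inside the bin's range but matches no rows.
print(bin_estimate(rows_per_bin, ndistinct))   # 500.0 (nrowsperbin / 2)
print(exact_estimate(value_counts, "a3"))      # 0
```

With per-value counts, the estimate for 'a3' drops to zero, and the estimates for 'a1' and 'a2' also stop being forced to the same nrowsperbin / 2.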