>> Why? What if ( somehow ) we knew that there was a 90% chance that >> query would return an empty result set on a big table with 20 non-mcv >> distinct values. Currently the planner would always choose a seq scan, >> where an index scan might be better. > > (1) On what grounds do you assert the above?
For a table with 1000000 non-mcv rows, the planner estimates a result set of cardinality 1000000/20 = 50000, not 1. > (2) What makes you think that an estimate of zero rather than one row > would change the plan? I see where the confusion is coming from. When I said >> What if ( somehow ) we knew that there was a 90% >> chance that query would return an empty result set I meant that the planner doesn't know that information. And how could it? The estimate for ndistinct is an estimate for the number of distinct values in the table, not an estimate for the number of distinct values that will be queried for. My original point was that we sometimes care about the distribution of what's being queried for and not just what's in the table. But this is all silly anyways: if this was really a concern you would write a function if values exist return values else return none > (In fact, I don't think the plan would change, in this case. The reason > for the clamp to 1 row is to avoid foolish results for join situations.) Which makes sense. My point certainly wasn't, in any way, a criticism of clamping selectivity to 1. Cheers, Nathan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers