Re: [HACKERS] Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics

Nathan Boley Tue, 10 Jun 2008 14:31:44 -0700

>> Why?  What if ( somehow ) we knew that there was a 90% chance that
>> query would return an empty result set on a big table with 20 non-mcv
>> distinct values. Currently the planner would always choose a seq scan,
>> where an index scan might be better.
>
> (1) On what grounds do you assert the above?


For a table with 1000000 non-mcv rows, the planner estimates a result
set of cardinality 1000000/20 = 50000, not 1.

> (2) What makes you think that an estimate of zero rather than one row
> would change the plan?

I see where the confusion is coming from. When I said

>> What if ( somehow ) we knew that there was a 90%
>> chance that query would return an empty result set

I meant that the planner doesn't know that information. And how could it?

The estimate for ndistinct is an estimate for the number of distinct
values in the table, not an estimate for the number of distinct values
that will be queried for.  My original point was that we sometimes
care about the distribution of what's being queried for and not just
what's in the table.

But this is all silly anyways: if this was really a concern you would
write a function

if values exist
   return values
else return none

> (In fact, I don't think the plan would change, in this case.  The reason
> for the clamp to 1 row is to avoid foolish results for join situations.)

Which makes sense. My point certainly wasn't, in any way, a criticism
of clamping selectivity to 1.

Cheers,
Nathan

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics

Reply via email to