On Wed, Feb 18, 2009 at 1:34 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> I'm interested to know whether anyone else shares my belief that
>> nested loops are the cause of most really bad plans.  What usually
>> happens to me is that the planner develops some unwarranted optimism
>> about the number of rows likely to be generated by the outer side of
>> the join and decides that it's not worth sorting the inner side or
>> building a hash table or using an index, and that the right thing to
>> do is just rescan the inner node on every pass.  When the outer side
>> returns three or four orders of magnitude more results than expected,
>> ka-pow!
>
> And then there is the other half of the world, who complain because it
> *didn't* pick a nestloop for some query that would have run in much less
> time if it had.

Well, that's my question: is that really the other half of the world,
or is it the other 5% of the world?  And how does it happen?  In my
experience, most bad plans are caused by bad selectivity estimates,
and the #1 source of those is expressions whose selectivity the
planner simply can't estimate and has to guess at.
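
As a concrete illustration of the kind of thing I mean (the table and
column names here are made up, not taken from any real report), consider:

EXPLAIN ANALYZE
SELECT *
FROM orders o
JOIN order_lines l ON l.order_id = o.id
WHERE lower(o.status) = 'pending'          -- expression, not a bare column
  AND substr(o.region_code, 1, 2) = 'EU'   -- another clause with no usable statistics
  AND mod(o.customer_id, 7) = 3;           -- and a third

None of those WHERE clauses can be estimated from column statistics, so
the planner falls back on default selectivities, multiplies them
together, and concludes the outer side will produce almost no rows.  A
nested loop over order_lines then looks cheap; when the expressions
actually match tens of thousands of orders, the inner node gets
rescanned tens of thousands of times and you get exactly the blow-up
described above.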

(Now it appears that Josh is having problems that are caused by
overestimating the cost of a page fetch, perhaps due to caching
effects.  Those are discussed upthread, and I'm still interested to
see whether we can arrive at any sort of consensus about what might be
a reasonable approach to attacking that problem.  My own experience
has been that this problem is not quite as bad, because it can throw
the cost off by a factor of 5, but not by a factor of 800,000, as in
my example of three unknown expressions with a combined selectivity of
0.1.)
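
(To put rough, invented numbers on that difference: a nested loop's cost
is approximately the outer row count times the cost of one inner rescan,
so if the outer side is estimated at 10 rows but really produces a
million, the cost estimate is off by that same factor of 100,000 no
matter how carefully each page fetch is costed.  A bad page-fetch cost,
by contrast, only scales every fetch by a roughly constant factor.)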

...Robert
