It seems to me that the root cause of some of the optimizer failures that come
up is the optimizer's attempt to assign a single "expected cost" value to every
choice. It seems it should also track a "minimum cost" and "maximum cost" in
addition to the "expected cost". Often the optimizer is faced with two possible
choices, one of which appears slightly better but in fact has a much worse
failure mode than the alternative.

For example, consider a simple join between two tables where the optimizer
thinks approximately 10% of the second table will be used. That estimate will
probably put it right at the threshold of choosing a full table scan with a
merge or hash join instead of a simple nested loop.

In fact the nested loop has only a small linear penalty (2-4 times slower even
if the *entire* table ends up being read) if it's mistakenly chosen, whereas if
the selectivity estimate is wrong and only a few records are actually needed,
the full table scan can be thousands of times slower than the nested loop.
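
To put purely illustrative numbers on it: if scanning the inner table costs
around 5,000 cost units, a mistakenly chosen nested loop might cost 10,000 to
20,000 units, a 2-4x penalty. But if only a handful of the estimated rows are
actually needed, a nested loop over an index costs on the order of 10 units
while the full scan still costs 5,000, so the merge or hash plan loses by a
factor of hundreds or thousands.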

If the optimizer calculated the min/max/expected costs of the nested loop
based on one row, the full table, and the expected number of records, it would
find that while the expected cost is slightly higher than the equivalent for
the merge join, the maximum is only about 2x higher and the minimum is
thousands of times smaller. It should then consider choosing the nested loop,
because choosing the merge join carries the greater risk.
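
As a very rough sketch of what I mean (this is not actual planner code, and
all the names, thresholds and numbers below are made up): carry a cost
interval per path, and when the expected costs are close, weigh each path's
worst-case penalty against its best-case saving.

/*
 * Minimal standalone sketch of risk-aware path choice.  Not PostgreSQL
 * code; names and numbers are purely illustrative.
 */
#include <stdio.h>

typedef struct PathCost
{
    const char *name;
    double      min_cost;       /* cost if only one row qualifies */
    double      expected_cost;  /* cost at the estimated selectivity */
    double      max_cost;       /* cost if the entire table qualifies */
} PathCost;

/* crude "roughly equal" test on expected costs */
static int
close_enough(double a, double b)
{
    double      ratio = a / b;

    return ratio > 0.8 && ratio < 1.25;
}

static const PathCost *
choose_path(const PathCost *a, const PathCost *b)
{
    if (close_enough(a->expected_cost, b->expected_cost))
    {
        /* downside: how much worse each path's worst case is than the other's */
        double      a_penalty = a->max_cost / b->max_cost;
        double      b_penalty = b->max_cost / a->max_cost;

        /* upside: how much cheaper each path's best case is than the other's */
        double      a_upside = b->min_cost / a->min_cost;
        double      b_upside = a->min_cost / b->min_cost;

        /* favor the path whose possible saving dwarfs its possible penalty */
        return (a_upside / a_penalty > b_upside / b_penalty) ? a : b;
    }

    /* expected costs clearly differ: behave like today and take the cheaper */
    return (a->expected_cost < b->expected_cost) ? a : b;
}

int
main(void)
{
    /* hypothetical numbers in planner cost units */
    PathCost    nestloop = {"nested loop", 10.0, 5500.0, 10000.0};
    PathCost    mergejoin = {"merge join", 4800.0, 5000.0, 5200.0};

    printf("chosen: %s\n", choose_path(&nestloop, &mergejoin)->name);
    return 0;
}

With those hypothetical numbers the nested loop wins, because its possible
saving (hundreds of times cheaper) dwarfs its possible penalty (about 2x).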


-- 
greg

