Viktor Leis wrote: > We have recently performed an experimental evaluation of PostgreSQL's > query optimizer. For example, we measured the contributions of > cardinality estimation and the cost model on the overall query > performance. You can download the resulting paper here: > http://www.vldb.org/pvldb/vol9/p204-leis.pdf > > Some findings: > 1. Perhaps unsurprisingly, we found that cardinality > estimation is the biggest problem in query optimization. > 2. The quality of Postgres' cardinality estimates is not generally worse > than that of the major commerical systems. > 3. It seems to me that one obvious way to avoid many bad situations > would be to disable nested loop joins when the inner relation is NOT > an index scan. > > I hope this will be of interest to some of you.
I have read the paper with great interest, and I have some comments. - The paper mentions that the "Join Order Benchmark" has high cross-table correlation, and this correlation is responsible for bad cardinality estimates that cause bad plans with all RDBMS. Wouldn't it be interesting to do the same experiment with a different real-word data sets to see if that is indeed typical and not an idiosyncrasy of that specific benchmark? - The paper suggests that sampling the base tables is preferable to using statistics because it gives better estimates, but I think that that is only a win with long running, complicated, data warehouse style queries. For the typical OLTP query it would incur intolerable planning times. Any ideas on that? - From my experience in tuning SQL queries I can confirm your one finding, namely that bad cardinality estimates are the prime source for bad plan choices. Perhaps it would be valuable to start thinking about statistics for inter-table correlation. What about something as "simple" as a factor per (joinable) attribute pair that approximates the total row count of a join on these attributes, divided by the planner's estimate? - I also can corroborate your finding that nested loop joins are often harmful, particularly when the inner loop is a sequential scan. One of the first things I do when investigating bad performance of a query whose plan has a nestend loop join is to set enable_nestloop to "off" and see if that makes a difference, and it often does. Maybe it would be a win to bias the planner against nested loop joins. This is dreaming, but it might be nice to have some number as to how reliable a certain estimate is, which is high if the estimate is, say, derived from a single filter on a base table and sinks as more conditions are involved or numbers pulled out of thin air. Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers