Viktor Leis wrote:
> We have recently performed an experimental evaluation of PostgreSQL's
> query optimizer. For example, we measured the contributions of
> cardinality estimation and the cost model on the overall query
> performance. You can download the resulting paper here:
> http://www.vldb.org/pvldb/vol9/p204-leis.pdf
> 
> Some findings:
> 1. Perhaps unsurprisingly, we found that cardinality
> estimation is the biggest problem in query optimization.
> 2. The quality of Postgres' cardinality estimates is not generally worse
> than that of the major commercial systems.
> 3. It seems to me that one obvious way to avoid many bad situations
> would be to disable nested loop joins when the inner relation is NOT
> an index scan.
> 
> I hope this will be of interest to some of you.

I have read the paper with great interest, and I have some comments.

- The paper mentions that the "Join Order Benchmark" has high cross-table
  correlation, and that this correlation is responsible for the bad
  cardinality estimates that cause bad plans in all of the tested RDBMS.
  Wouldn't it be interesting to repeat the experiment with other real-world
  data sets to see whether that is indeed typical and not an idiosyncrasy
  of that specific benchmark?
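  Just to spell out what such a cross-table correlation looks like, here
  is a minimal sketch against an IMDB-like schema (the table and column
  names are my assumption, not taken from the paper): the two filters sit
  in different tables, but they are far from independent, so multiplying
  their selectivities underestimates the size of the join result.

      -- the genre lives in movie_info, the production year in title;
      -- horror titles are not spread uniformly over the years, so the
      -- independence assumption breaks down across the join
      SELECT count(*)
      FROM   title t
      JOIN   movie_info mi ON mi.movie_id = t.id
      WHERE  t.production_year > 2000
        AND  mi.info = 'Horror';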

- The paper suggests that sampling the base tables is preferable to
  using statistics because it gives better estimates, but I think that
  this is only a win with long-running, complicated, data-warehouse-style
  queries.  For the typical OLTP query it would incur intolerable
  planning times.  Any ideas on that?
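  To illustrate what I mean by the cost, here is a rough sketch of how a
  sampling-based estimate could be obtained in plain SQL (the table and
  the predicates are made up, and a real implementation would of course
  live inside the planner rather than in SQL):

      -- scan a 1% Bernoulli sample and scale the count up; the sample
      -- scan costs far more than a lookup in pg_statistic, which is
      -- acceptable for a minutes-long data warehouse query but not for
      -- a millisecond OLTP query
      SELECT count(*) * 100 AS estimated_rows
      FROM   orders TABLESAMPLE BERNOULLI (1)
      WHERE  status = 'shipped'
        AND  total_amount > 1000;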

- From my experience in tuning SQL queries I can confirm one of your
  findings, namely that bad cardinality estimates are the prime source
  of bad plan choices.
  Perhaps it would be valuable to start thinking about statistics for
  inter-table correlation.  What about something as "simple" as a factor
  per (joinable) attribute pair: the actual row count of a join on these
  attributes, divided by the planner's estimate?
- I can also corroborate your finding that nested loop joins are often
  harmful, particularly when the inner side is a sequential scan.
  One of the first things I do when investigating bad performance of a
  query whose plan has a nested loop join is to set enable_nestloop to
  "off" and see if that makes a difference, and it often does.
  Maybe it would be a win to bias the planner against nested loop joins.
  This is dreaming, but it might be nice to have some number describing
  how reliable a certain estimate is: high if the estimate is, say,
  derived from a single filter on a base table, and sinking as more
  conditions are involved or numbers are pulled out of thin air.
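  For completeness, this is the diagnostic step I mean (the query is a
  made-up example; the point is that the setting is session-local and
  only used to compare plans, not as a permanent fix):

      EXPLAIN ANALYZE
      SELECT c.name, sum(o.total_amount)
      FROM   customers c
      JOIN   orders o ON o.customer_id = c.id
      WHERE  c.country = 'AT'
      GROUP  BY c.name;           -- plan contains a nested loop join

      SET enable_nestloop = off;  -- discourage nested loops, this session only

      -- rerun the same EXPLAIN ANALYZE and compare the execution times

      RESET enable_nestloop;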

Yours,
Laurenz Albe

