On Tue, Jul 07, 2009 at 09:31:14AM -0500, Kevin Grittner wrote:
> I don't remember any clear resolution to the wild variations in plan
> time mentioned here:
>  
> http://archives.postgresql.org/pgsql-hackers/2009-06/msg00743.php
>  
> I think it would be prudent to try to figure out why small changes in
> the query caused the large changes in the plan times Andres was
> seeing.  Has anyone else ever seen such behavior?  Can we get
> examples?  (It should be enough to get the statistics and the schema,
> since this is about planning time, not run time.)

With joins between statistically indistinguishable columns, I see planning times
change by a factor of ~4 for each join added or removed (postgres 8.3).  Varying
join_collapse_limit in the neighborhood of the actual number of joins has a
similar effect.  See attachment with annotated timings.  The example uses a
single table joined to itself, but using distinct tables with identical contents
yields the same figures.

The expontential factor seems smaller for real queries.  I have a query of
sixteen joins that takes 71s to plan deterministically; it looks like this:

SELECT 1 FROM fact JOIN dim0 ... JOIN dim6
JOIN t t0 ON fact.key = t.key AND t.x = MCV0
LEFT JOIN t t1 ON fact.key = t.key AND t.x = MCV1
JOIN t t2 ON fact.key = t.key AND t.x = MCV2
LEFT JOIN t t3 ON fact.key = t.key AND t.x = NON-MCV0
LEFT JOIN t t4 ON fact.key = t.key AND t.x = NON-MCV1
LEFT JOIN t t5 ON fact.key = t.key AND t.x = NON-MCV2
LEFT JOIN t t6 ON fact.key = t.key AND t.x = NON-MCV3
LEFT JOIN t t7 ON fact.key = t.key AND t.x = NON-MCV4

For the real query, removing one join drops plan time to 26s, and removing two
drops the time to 11s.  I don't have a good theory for the multiplier changing
from 4 for the trivial demonstration to ~2.5 for this real query.  Re-enabling
geqo drops plan time to .5s.  These tests used default_statistics_target = 1000,
but dropping that to 100 does not change anything dramatically.

> I guess the question is whether there is anyone who has had a contrary
> experience.  (There must have been some benchmarks to justify adding
> geqo at some point?)

I have queries with a few more joins (19-21), and I cancelled attempts to plan
them deterministically after 600+ seconds and 10+ GiB of memory usage.  Even
with geqo_effort = 10, they plan within 5-15s with good results.

All that being said, I've never encountered a situation where a value other than
1 or <inf> for *_collapse_limit appeared optimal.

nm
SET geqo = off;
SET join_collapse_limit = 100;
CREATE TEMP TABLE t AS SELECT * FROM generate_series(1, 1000) f(n); ANALYZE t;

--- Vary join count
-- 242.4s
EXPLAIN SELECT 1 FROM t t00 NATURAL JOIN t t01 NATURAL JOIN t t02 NATURAL JOIN t
t03 NATURAL JOIN t t04 NATURAL JOIN t t05 NATURAL JOIN t t06 NATURAL JOIN t t07
NATURAL JOIN t t08 NATURAL JOIN t t09 NATURAL JOIN t t10 NATURAL JOIN t t11
NATURAL JOIN t t12;
-- 31.2s
EXPLAIN SELECT 1 FROM t t00 NATURAL JOIN t t01 NATURAL JOIN t t02 NATURAL JOIN t
t03 NATURAL JOIN t t04 NATURAL JOIN t t05 NATURAL JOIN t t06 NATURAL JOIN t t07
NATURAL JOIN t t08 NATURAL JOIN t t09 NATURAL JOIN t t10 NATURAL JOIN t t11;
-- 8.1s
EXPLAIN SELECT 1 FROM t t00 NATURAL JOIN t t01 NATURAL JOIN t t02 NATURAL JOIN t
t03 NATURAL JOIN t t04 NATURAL JOIN t t05 NATURAL JOIN t t06 NATURAL JOIN t t07
NATURAL JOIN t t08 NATURAL JOIN t t09 NATURAL JOIN t t10;
-- 2.0s
EXPLAIN SELECT 1 FROM t t00 NATURAL JOIN t t01 NATURAL JOIN t t02 NATURAL JOIN t
t03 NATURAL JOIN t t04 NATURAL JOIN t t05 NATURAL JOIN t t06 NATURAL JOIN t t07
NATURAL JOIN t t08 NATURAL JOIN t t09;
-- 0.5s
EXPLAIN SELECT 1 FROM t t00 NATURAL JOIN t t01 NATURAL JOIN t t02 NATURAL JOIN t
t03 NATURAL JOIN t t04 NATURAL JOIN t t05 NATURAL JOIN t t06 NATURAL JOIN t t07
NATURAL JOIN t t08;

--- Vary join_collapse_limit
-- 8.1s
SET join_collapse_limit = 100;
EXPLAIN SELECT 1 FROM t t00 NATURAL JOIN t t01 NATURAL JOIN t t02 NATURAL JOIN t
t03 NATURAL JOIN t t04 NATURAL JOIN t t05 NATURAL JOIN t t06 NATURAL JOIN t t07
NATURAL JOIN t t08 NATURAL JOIN t t09 NATURAL JOIN t t10;
-- 8.0s
SET join_collapse_limit = 11;
EXPLAIN SELECT 1 FROM t t00 NATURAL JOIN t t01 NATURAL JOIN t t02 NATURAL JOIN t
t03 NATURAL JOIN t t04 NATURAL JOIN t t05 NATURAL JOIN t t06 NATURAL JOIN t t07
NATURAL JOIN t t08 NATURAL JOIN t t09 NATURAL JOIN t t10;
-- 2.2s
SET join_collapse_limit = 10;
EXPLAIN SELECT 1 FROM t t00 NATURAL JOIN t t01 NATURAL JOIN t t02 NATURAL JOIN t
t03 NATURAL JOIN t t04 NATURAL JOIN t t05 NATURAL JOIN t t06 NATURAL JOIN t t07
NATURAL JOIN t t08 NATURAL JOIN t t09 NATURAL JOIN t t10;
-- 0.5s
SET join_collapse_limit = 9;
EXPLAIN SELECT 1 FROM t t00 NATURAL JOIN t t01 NATURAL JOIN t t02 NATURAL JOIN t
t03 NATURAL JOIN t t04 NATURAL JOIN t t05 NATURAL JOIN t t06 NATURAL JOIN t t07
NATURAL JOIN t t08 NATURAL JOIN t t09 NATURAL JOIN t t10;
-- 0.1s
SET join_collapse_limit = 8;
EXPLAIN SELECT 1 FROM t t00 NATURAL JOIN t t01 NATURAL JOIN t t02 NATURAL JOIN t
t03 NATURAL JOIN t t04 NATURAL JOIN t t05 NATURAL JOIN t t06 NATURAL JOIN t t07
NATURAL JOIN t t08 NATURAL JOIN t t09 NATURAL JOIN t t10;

Attachment: pgpODtXjGVDUt.pgp
Description: PGP signature

Reply via email to