On Wed, Jul 08, 2009 at 05:23:16PM -0400, Tom Lane wrote: > Noah Misch <n...@leadboat.com> writes: > > The expontential factor seems smaller for real queries. I have a query of > > sixteen joins that takes 71s to plan deterministically; it looks like this: > > > SELECT 1 FROM fact JOIN dim0 ... JOIN dim6 > > JOIN t t0 ON fact.key = t.key AND t.x = MCV0 > > LEFT JOIN t t1 ON fact.key = t.key AND t.x = MCV1 > > JOIN t t2 ON fact.key = t.key AND t.x = MCV2 > > LEFT JOIN t t3 ON fact.key = t.key AND t.x = NON-MCV0 > > LEFT JOIN t t4 ON fact.key = t.key AND t.x = NON-MCV1 > > LEFT JOIN t t5 ON fact.key = t.key AND t.x = NON-MCV2 > > LEFT JOIN t t6 ON fact.key = t.key AND t.x = NON-MCV3 > > LEFT JOIN t t7 ON fact.key = t.key AND t.x = NON-MCV4 > > I'm confused here --- I think you must have over-anonymized your query. > Surely the ON conditions for the left joins should be referencing t3, > t4, etc?
Yes. Aside from that error, I picked the wrong features to emphasize and gloss over. Only two relations (amidst `dim0 ... dim6') resemble dimensions, having an implied relative position on the plan tree. The other fourteen relations share a join key. That explains the distinctive plan cost; this query resembles the utterly unconstrained artificial query to a larger degree than most. > > For the real query, removing one join drops plan time to 26s, and > > removing two drops the time to 11s. I don't have a good theory for > > the multiplier changing from 4 for the trivial demonstration to ~2.5 > > for this real query. > > The rule of thumb that says that an n-way join requires 2^n work is only > true if we consider every single combination of possible joins, which > normally we don't. The planner prefers join paths that follow join > clauses, and will only consider clauseless joins when it has no other > choice. I believe that real queries tend to be pretty non-flat in this > space and so the number of join paths to consider is a lot less than 2^n. > Your synthesized query, on the other hand, allows any relation to be > joined to any other --- it might not look that way, but after creating > derived equalities there will be a potential join clause linking every > relation to every other one. So I think you were testing the worst case, > and I'm not surprised that more-typical queries would show a slower > growth curve. Describing in those terms illuminates much. While the concepts do suggest 2^N worst-case planning cost, my artificial test case showed a rigid 4^N pattern; what could explain that? Thanks, nm
pgpuxZFLTTeZz.pgp
Description: PGP signature