[SQL] Optimizing Multiply Joins ???
Hi all, We are building a sophisticated and flexible database structure and thus, we have quite complicated and longish queries containing lots of joins. Using only a few test records in our structure we have performed some measures, and it is hard to interpret the results. Until we join no more than 10 tables the response time is below 0.2 s. joining the 11th table comes with a dramatic change: response time usually grows up to 5-7 s, I'we read the related pages of the documentation, and found the description of the default and the genetic optimizer too. And also found the story about the german knowledge-based system project where longer queries than 10 joins were also too slow. But I think (hope) we could have a solution, because all of our complex joins are following foreign keys. If we could give some hints to the planner about foreign keys, it should not generate plenty of unusable plans for the optimizer. Here I send an example: the query: select h.literal as division, j.literal as soatype, e.username, e.password, c.objectid from o_division as a join o_soa as b on b.divisionobjectid=a.objectid join o_soainstance as c on c.soaobjectid=b.objectid join o_staff_rdl_soainstance_role_ as d on d.soainstanceobjectid=c.objectid join o_electronic as e on e.pointerobjectid=d.objectid join o_soatype as f on f.objectid=b.soatypeobjectid join o_meaning as g on g.objectid=a.name join o_meaning_rndl_language_role_ as h on h.meaningobjectid=g.objectid and h.languageobjectid=11 join o_meaning as i on i.objectid=f.name join o_meaning_rndl_language_role_ as j on j.meaningobjectid=i.objectid and j.languageobjectid=11 join o_staff as k on k.objectid=d.staffobjectid join o_externalcontributor as l on l.pointerobjectid=k.objectid the structure behind it: [the arrows are representing the foreign keys.] a -> g <- h ^ | b -> f -> i <- j ^ | c ^ | d -> k <- l ^ | e results of this query: join from a to j takes 0.2 s a to k takes 4.8 s a to l takes 5.2 s I have examined the output of explain in all 3 cases, and I have the feeling that the planner simply forgets the best solutions in 2nd and 3rd case. If this is not enough info for the answer I can send the tables, their contents, the output of the optimizer. or whatever you need for the answer (including beer :) Attila
Re: [SQL] Optimizing Multiply Joins ???
Hi, > That's interesting; apparently the regular optimizer is faster than the > GEQO optimizer for your style of query. Try increasing the GEQO > threshold (pg_option "geqo_rels") to more than 11. Changing this option in a psql session with 'set' has really helped Thanks. the results have changed this way: nr tables plain(s): more indices(2): geqo off(s): 10: 0.2 0.3 0.3 11: 4.8 5.9 0.8 12: 5.2 7.0 1.8 But it seems, geqo_rels option is not parsed from the pg_option file. > Actually, as the 7.1 code currently stands, a query that uses explicit > JOIN operators like yours does will always be implemented in exactly > the given join order, with no searching. I haven't quite decided if > that's a bug or a feature ... Great !! If possible, it should be left in the codebase at least as an option later... Because we could have dozens(hundreds) of preoptimized queries. Attila
Re: [SQL] Optimizing Multiply Joins ???
Hi, > Actually, as the 7.1 code currently stands, a query that uses explicit > JOIN operators like yours does will always be implemented in exactly > the given join order, with no searching. I haven't quite decided if > that's a bug or a feature ... Do you mean a "linear binary tree" like this is executed? /\ /\ f /\ e /\ d /\ c a b Or can we have some variations on the graph (with the same preorder run: a,b,c,d,e,f) like this: /\ / \ / /\ /\ d /\ a /\ e f b c We could give hints on the joins with brackets this way: ((a,(b,c)),(d,(e,f))) I'm not sure which version of standards allows to bracket joins, but I know sybase accepts the above form. How difficult it looks to hack the parser to accept this form, and pass the meaning to the planner? Attila
Re: [SQL] Optimizing Multiply Joins ???
Hi, > If that's what you write, yes. You can parenthesize the JOIN clauses > any way you like, though, and the 7.1 planner will follow that structure. Thanks for the exhaustive answer. Can I test this feature in the current snapshot? Attila
[SQL] Subqueries in from clause?
Hi All, My question for this week: How far is the above subject from beeing implemented? (I've run through the archives, and found some quite old mails in this topic. They mentioned automatically generated temp tables as possibe solutions... Is this the plan even today?) Attila Ps.: Two weeks ago I had a question about multiplee joins. According to Tom's advice I translated the actual snapshot from cvs and after submitting the query with parenthesis on the right place I experienced a speedup factor of 200 (two hundred!!!) comapred to the GEQO. Great. Ps^2.: Also tested tuple toasting up to 4Mbytes of single tuples... I like it :))
[SQL] "AND", "OR" and Materialize :((((
Hi all, We've got the following 3 tables and 2 simple queries. The only difference lies in the join condition: the first uses OR, the second uses AND. I expected some difference in the performace according to the difference in the evaluation of the logical form, but not 3 magnitudes !!! So the question is: WHY SHALL IT MATERIALIZE A CONSTANT RESULT IN A LOOP OF 12 TIMES ?? [it would be enough to materialize only once, or even never, because the size of the materialized table is not larger than 1Mb... ] ps.: there are indeces on all referenced fields. atti=# explain select count(*) from _108 left join (_111 cross join _110) on (_108.objectid=_111._108objectid OR _108.objectid=_110._108objectid); NOTICE: QUERY PLAN: Aggregate (cost=5017202.06..5017202.06 rows=1 width=24) -> Nested Loop (cost=0.00..5016900.05 rows=120806 width=24) -> Seq Scan on _108 (cost=0.00..44.70 rows=1670 width=8) -> Materialize (cost=2097.79..2097.79 rows=60421 width=16) -> Nested Loop (cost=0.00..2097.79 rows=60421 width=16) -> Seq Scan on _110 (cost=0.00..1.37 rows=37 width=8) -> Seq Scan on _111 (cost=0.00..40.33 rows=1633 width=8) EXPLAIN atti=# explain select count(*) from _108 left join (_111 cross join _110) on _108.objectid=_111._108objectid AND _108.objectid=_110._108objectid; NOTICE: QUERY PLAN: Aggregate (cost=7965.68..7965.68 rows=1 width=24) -> Merge Join (cost=7030.14..7961.51 rows=1670 width=24) -> Sort (cost=134.09..134.09 rows=1670 width=8) -> Seq Scan on _108 (cost=0.00..44.70 rows=1670 width=8) -> Sort (cost=6896.05..6896.05 rows=60421 width=16) -> Nested Loop (cost=0.00..2097.79 rows=60421 width=16) -> Seq Scan on _110 (cost=0.00..1.37 rows=37 width=8) -> Seq Scan on _111 (cost=0.00..40.33 rows=1633 width=8) Attila ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [SQL] "AND", "OR" and Materialize :((((
Hi, > > WHY SHALL IT MATERIALIZE A CONSTANT RESULT IN A LOOP OF 12 TIMES ?? > > [it would be enough to materialize only once, > > Which in fact is exactly what the materialize node is for. The reported > costs are pretty bogus, but AFAICT the plan is the right thing. Thanx for the answer. I've thought the same (eg. materialize should reduce the amount of work to be done, but I haven't felt this in the result) Unfortunatelly the cost prediction in line 2 may be close to the real cost. According to some measures: time for the query with 'AND':2 sec time for the query with 'OR': 421 sec So the question is what to do? Can I speed up the second one? [vacuum analyze and indices are done, postgres version is 7.1.2] 1:Aggregate (cost=5017202.06..5017202.06 rows=1 width=24) 2: -> Nested Loop (cost=0.00..5016900.05 rows=120806 width=24) 3:-> Seq Scan on _108 (cost=0.00..44.70 rows=1670 width=8) 4:-> Materialize (cost=2097.79..2097.79 rows=60421 width=16) 5: -> Nested Loop (cost=0.00..2097.79 rows=60421 width=16) 6:-> Seq Scan on _110 (cost=0.00..1.37 rows=37 width=8) 7:-> Seq Scan on _111 (cost=0.00..40.33 rows=1633 width=8) Attila ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])