Re: [HACKERS] Re: Any optimizations to the join code in 7.1?
Thomas Lockhart [EMAIL PROTECTED] writes: But it is possible, under many circumstances, for query optimization to be a benefit for a many-table query. The docs indicate that explicit join syntax bypasses that, even for inner joins, so you may find that this syntax is a net loss in performance depending on the query and your choice of table order. Presumably we will be interested in making these two forms of inner join equivalent in behavior in a future release. Tom, what are the impediments we might encounter in doing this? I don't think there are any real technical problems in the way; it's simply an implementation choice not to treat INNER JOIN the same as an implicit join list. I did it that way in 7.1 mainly as a flyer, to see how many people would think it's a feature vs. how many think it's a bug. The votes aren't all in yet, but here we have Mike apparently pretty pleased with it, while I recall at least one other person who was not happy with the 7.1 behavior. regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
[HACKERS] Re: Any optimizations to the join code in 7.1?
... 7.1 out of the box took only 2 seconds! I was amazed and shocked at this damned impressive improvement in planning speeduntil I actually used the explicit JOIN syntax described in 11.2. Instanteous results! Instantaneous. But it is possible, under many circumstances, for query optimization to be a benefit for a many-table query. The docs indicate that explicit join syntax bypasses that, even for inner joins, so you may find that this syntax is a net loss in performance depending on the query and your choice of table order. Presumably we will be interested in making these two forms of inner join equivalent in behavior in a future release. Tom, what are the impediments we might encounter in doing this? - Thomas ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
RE: [HACKERS] Re: Any optimizations to the join code in 7.1?
What would be nice, and I don't know how it would be done or what the syntax would be, would be a feature that allows PostgreSQL to skip not only the parsing stage, but the planning stage as well. Then, when the data has changed dramatically enough to warrant it, as you point out, a command can be issued to 'refresh' the query plan. My 15-way join has expanded to a 19-way join and is still instantaneous, albeit on a very small set of data. Before 7.1, the query would simply have taken far too long, and I would have had to denormalize the database for performance purposes. With the explicit join syntax, it allows me to design the database 'the right way'. I basically used EXPLAIN SELECT... to determine the explicit join order, so as the data changes, its something I'll have to do on occassion to ensure good performance, but at least its now possible. :-) Mike Mascari [EMAIL PROTECTED] -Original Message- From: Thomas Lockhart [SMTP:[EMAIL PROTECTED]] Sent: Friday, April 27, 2001 9:49 PM To: [EMAIL PROTECTED]; 'Tom Lane' Cc: '[EMAIL PROTECTED]' Subject:[HACKERS] Re: Any optimizations to the join code in 7.1? ... 7.1 out of the box took only 2 seconds! I was amazed and shocked at this damned impressive improvement in planning speeduntil I actually used the explicit JOIN syntax described in 11.2. Instanteous results! Instantaneous. But it is possible, under many circumstances, for query optimization to be a benefit for a many-table query. The docs indicate that explicit join syntax bypasses that, even for inner joins, so you may find that this syntax is a net loss in performance depending on the query and your choice of table order. Presumably we will be interested in making these two forms of inner join equivalent in behavior in a future release. Tom, what are the impediments we might encounter in doing this? - Thomas ---(end of broadcast)- -- TIP 4: Don't 'kill -9' the postmaster ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
[HACKERS] Re: Any optimizations to the join code in 7.1?
On Wed, 25 Apr 2001, Tom Lane wrote: Mike Mascari [EMAIL PROTECTED] writes: I have a particular query which performs a 15-way join; You should read http://www.postgresql.org/devel-corner/docs/postgres/explicit-joins.html I was recently poring over this page myself, as I've been working w/some larger-than-usual queries. Two questions: 1) it appears (from my tests) that SELECT * FROM CREATE VIEW joined as SELECT p.id, p.pname, c.cname FROM p LEFT OUTER JOIN c using (id) gives the same answer as SELECT * FROM CREATE VIEW nested SELECT p.id, p.pname, (select c.cname from c where c.id = p.id) FROM p However, I often am writing VIEWs that will be used by developers in a front-end system. Usually, this view might have 30 items in the select clause, but the developer using it is likely to only as for four or five items. In this case, I often prefer the subquery form because it appears that SELECT id, pname FROM joined is more complicated than SELECT id, pname FROM nested as the first has to perform the join, and the second doesn't. Is this actually correct? 2) The explicit-joins help suggests that manual structuring and experimentation might help -- has anyone written (or could anyone write) anthing about where to start in guessing what join order might be optimal? -- Joel Burton [EMAIL PROTECTED] Director of Information Systems, Support Center of Washington ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
[HACKERS] Re: Any optimizations to the join code in 7.1?
Joel Burton [EMAIL PROTECTED] writes: 1) it appears (from my tests) that SELECT * FROM CREATE VIEW joined as SELECT p.id, p.pname, c.cname FROM p LEFT OUTER JOIN c using (id) gives the same answer as SELECT * FROM CREATE VIEW nested SELECT p.id, p.pname, (select c.cname from c where c.id = p.id) FROM p Only if c.id is a unique column (ie, there are always 0 or 1 matches in c for any given p.id). Otherwise the subselect form will fail. However, I often am writing VIEWs that will be used by developers in a front-end system. Usually, this view might have 30 items in the select clause, but the developer using it is likely to only as for four or five items. In this case, I often prefer the subquery form because it appears that SELECT id, pname FROM joined is more complicated than SELECT id, pname FROM nested as the first has to perform the join, and the second doesn't. Is this actually correct? This approach is probably reasonable if the cname field of the view result is seldom wanted at all, and never used as a WHERE constraint. You'd get a very nonoptimal plan if someone did select * from nested where cname like 'foo%' since the planner has no way to use the LIKE constraint to limit the rows fetched from p. In the JOIN format, on the other hand, I think the constraint could be exploited. Also bear in mind that the subselect form is essentially forcing the join to be done via a nested loop. If you have an index on c.id then this may not be too bad, but without one the performance will be horrid. Even with an index, nested loop with inner indexscan is not the join method of choice if you are retrieving a lot of rows. 2) The explicit-joins help suggests that manual structuring and experimentation might help -- has anyone written (or could anyone write) anthing about where to start in guessing what join order might be optimal? The obvious starting point is the plan produced by the planner from an unconstrained query. Even if you don't feel like trying to improve it, you could cut the time to reproduce the plan quite a bit --- just CROSS JOIN a few of the relation pairs that are joined first in the unconstrained plan. regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
[HACKERS] Re: Any optimizations to the join code in 7.1?
On Wed, 25 Apr 2001, Tom Lane wrote: 2) The explicit-joins help suggests that manual structuring and experimentation might help -- has anyone written (or could anyone write) anthing about where to start in guessing what join order might be optimal? The obvious starting point is the plan produced by the planner from an unconstrained query. Even if you don't feel like trying to improve it, you could cut the time to reproduce the plan quite a bit --- just CROSS JOIN a few of the relation pairs that are joined first in the unconstrained plan. In other words, let it do the work, and steal the credit for ourselves. :-) Thanks, Tom. I appreciate your answers to my questions. In other DB systems I've used, some find that for this original query: SELECT * FROM a, b WHERE a.id=b.id AND b.name = 'foo'; that this version SELECT * FROM a JOIN b USING (id) WHERE b.name = 'foo'; has slower performance than SELECT * FROM b JOIN a USING (id) WHERE b.name = 'foo'; because it can reduce b before any join. Is it safe to assume that this is a valid optimization in PostgreSQL? If this whole thing were a view, except w/o the WHERE clause, and we were querying the view w/the b.name WHERE clause, would we still see a performance boost from the right arrangement? (ie, does our criteria get pushed down early enough in the joining process?) TIA, -- Joel Burton [EMAIL PROTECTED] Director of Information Systems, Support Center of Washington ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
[HACKERS] Re: Any optimizations to the join code in 7.1?
Joel Burton [EMAIL PROTECTED] writes: In other DB systems I've used, some find that for this original query: SELECT * FROM a, b WHERE a.id=b.id AND b.name = 'foo'; that this version SELECT * FROM a JOIN b USING (id) WHERE b.name = 'foo'; has slower performance than SELECT * FROM b JOIN a USING (id) WHERE b.name = 'foo'; because it can reduce b before any join. Is it safe to assume that this is a valid optimization in PostgreSQL? In general, that'd be a waste of time --- our planner considers the same set of plans in either case. However, it could make a difference if the planner thinks that the two choices (a outer or b outer) have exactly the same cost. In that case the order you wrote them in will influence which plan actually gets picked; and if the planner's estimate is wrong --- ie, there really is a considerable difference in the costs --- then you could see a change in performance depending on which way you wrote it. That's a pretty unusual circumstance, maybe, but it just happens that I'm in the middle of looking at a planning bug wherein exactly this behavior occurs... If this whole thing were a view, except w/o the WHERE clause, and we were querying the view w/the b.name WHERE clause, would we still see a performance boost from the right arrangement? (ie, does our criteria get pushed down early enough in the joining process?) Shouldn't make a difference; AFAIK the WHERE clause will get pushed down as far as possible, independently of whether a view is involved or you wrote it out the hard way. regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html