Re: [HACKERS] Re: Any optimizations to the join code in 7.1?

2001-04-30 Thread Tom Lane

Thomas Lockhart [EMAIL PROTECTED] writes:
 But it is possible, under many circumstances, for query optimization to
 be a benefit for a many-table query. The docs indicate that explicit
 join syntax bypasses that, even for inner joins, so you may find that
 this syntax is a net loss in performance depending on the query and your
 choice of table order.

 Presumably we will be interested in making these two forms of inner join
 equivalent in behavior in a future release. Tom, what are the
 impediments we might encounter in doing this?

I don't think there are any real technical problems in the way; it's
simply an implementation choice not to treat INNER JOIN the same as an
implicit join list.  I did it that way in 7.1 mainly as a flyer, to see
how many people would think it's a feature vs. how many think it's a
bug.  The votes aren't all in yet, but here we have Mike apparently
pretty pleased with it, while I recall at least one other person who
was not happy with the 7.1 behavior.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



[HACKERS] Re: Any optimizations to the join code in 7.1?

2001-04-27 Thread Thomas Lockhart

 ... 7.1 out of the box took only 2 seconds! I was amazed
 and shocked at this damned impressive improvement in planning
 speeduntil I actually used the explicit JOIN syntax described in
 11.2. Instanteous results! Instantaneous.

But it is possible, under many circumstances, for query optimization to
be a benefit for a many-table query. The docs indicate that explicit
join syntax bypasses that, even for inner joins, so you may find that
this syntax is a net loss in performance depending on the query and your
choice of table order.

Presumably we will be interested in making these two forms of inner join
equivalent in behavior in a future release. Tom, what are the
impediments we might encounter in doing this?

  - Thomas

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



RE: [HACKERS] Re: Any optimizations to the join code in 7.1?

2001-04-27 Thread Mike Mascari

What would be nice, and I don't know how it would be done or what the 
syntax would be, would be a feature that allows PostgreSQL to skip 
not only the parsing stage, but the planning stage as well. Then, 
when the data has changed dramatically enough to warrant it, as you 
point out, a command can be issued to 'refresh' the query plan. My 
15-way join has expanded to a 19-way join and is still instantaneous, 
albeit on a very small set of data. Before 7.1, the query would 
simply have taken far too long, and I would have had to denormalize 
the database for performance purposes. With the explicit join syntax, 
it allows me to design the database 'the right way'. I basically used 
EXPLAIN SELECT... to determine the explicit join order, so as the 
data changes, its something I'll have to do on occassion to ensure 
good performance, but at least its now possible. :-)

Mike Mascari
[EMAIL PROTECTED]

-Original Message-
From:   Thomas Lockhart [SMTP:[EMAIL PROTECTED]]
Sent:   Friday, April 27, 2001 9:49 PM
To: [EMAIL PROTECTED]; 'Tom Lane'
Cc: '[EMAIL PROTECTED]'
Subject:[HACKERS] Re: Any optimizations to the join code in 7.1?

 ... 7.1 out of the box took only 2 seconds! I was amazed
 and shocked at this damned impressive improvement in planning
 speeduntil I actually used the explicit JOIN syntax described 
in
 11.2. Instanteous results! Instantaneous.

But it is possible, under many circumstances, for query optimization 
to
be a benefit for a many-table query. The docs indicate that explicit
join syntax bypasses that, even for inner joins, so you may find that
this syntax is a net loss in performance depending on the query and 
your
choice of table order.

Presumably we will be interested in making these two forms of inner 
join
equivalent in behavior in a future release. Tom, what are the
impediments we might encounter in doing this?

  - Thomas

---(end of broadcast)-  
--
TIP 4: Don't 'kill -9' the postmaster

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



[HACKERS] Re: Any optimizations to the join code in 7.1?

2001-04-25 Thread Joel Burton

On Wed, 25 Apr 2001, Tom Lane wrote:

 Mike Mascari [EMAIL PROTECTED] writes:
  I have a particular query which performs a 15-way join;
 
 You should read 
 http://www.postgresql.org/devel-corner/docs/postgres/explicit-joins.html

I was recently poring over this page myself, as I've been working w/some
larger-than-usual queries.

Two questions:

1) it appears (from my tests) that SELECT * FROM

   CREATE VIEW joined as
   SELECT p.id,
  p.pname,
  c.cname
   FROM   p
   LEFT OUTER JOIN c using (id)

   gives the same answer as SELECT * FROM

   CREATE VIEW nested
   SELECT p.id,
  p.pname,
  (select c.cname from c where c.id = p.id)
   FROM   p

   However, I often am writing VIEWs that will be used by developers
   in  a front-end system. Usually, this view might have 30 items in the
   select clause, but the developer using it is likely to only as for
   four or five items. In this case, I often prefer the
   subquery form because it appears that

   SELECT id, pname FROM joined

   is more complicated than

   SELECT id, pname FROM nested

   as the first has to perform the join, and the second doesn't.

   Is this actually correct?

2) The explicit-joins help suggests that manual structuring and
   experimentation might help -- has anyone written (or could
   anyone write) anthing about where to start in guessing what
   join order might be optimal?


-- 
Joel Burton   [EMAIL PROTECTED]
Director of Information Systems, Support Center of Washington


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



[HACKERS] Re: Any optimizations to the join code in 7.1?

2001-04-25 Thread Tom Lane

Joel Burton [EMAIL PROTECTED] writes:
 1) it appears (from my tests) that SELECT * FROM

CREATE VIEW joined as
SELECT p.id,
   p.pname,
   c.cname
FROM   p
LEFT OUTER JOIN c using (id)

gives the same answer as SELECT * FROM

CREATE VIEW nested
SELECT p.id,
   p.pname,
   (select c.cname from c where c.id = p.id)
FROM   p

Only if c.id is a unique column (ie, there are always 0 or 1 matches in
c for any given p.id).  Otherwise the subselect form will fail.

However, I often am writing VIEWs that will be used by developers
in  a front-end system. Usually, this view might have 30 items in the
select clause, but the developer using it is likely to only as for
four or five items. In this case, I often prefer the
subquery form because it appears that
SELECT id, pname FROM joined
is more complicated than
SELECT id, pname FROM nested
as the first has to perform the join, and the second doesn't.

Is this actually correct?

This approach is probably reasonable if the cname field of the view
result is seldom wanted at all, and never used as a WHERE constraint.
You'd get a very nonoptimal plan if someone did

select * from nested where cname like 'foo%'

since the planner has no way to use the LIKE constraint to limit the
rows fetched from p.  In the JOIN format, on the other hand, I think
the constraint could be exploited.

Also bear in mind that the subselect form is essentially forcing the
join to be done via a nested loop.  If you have an index on c.id then
this may not be too bad, but without one the performance will be
horrid.  Even with an index, nested loop with inner indexscan is not
the join method of choice if you are retrieving a lot of rows.

 2) The explicit-joins help suggests that manual structuring and
experimentation might help -- has anyone written (or could
anyone write) anthing about where to start in guessing what
join order might be optimal?

The obvious starting point is the plan produced by the planner from an
unconstrained query.  Even if you don't feel like trying to improve it,
you could cut the time to reproduce the plan quite a bit --- just CROSS
JOIN a few of the relation pairs that are joined first in the
unconstrained plan.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



[HACKERS] Re: Any optimizations to the join code in 7.1?

2001-04-25 Thread Joel Burton

On Wed, 25 Apr 2001, Tom Lane wrote:

  2) The explicit-joins help suggests that manual structuring and
 experimentation might help -- has anyone written (or could
 anyone write) anthing about where to start in guessing what
 join order might be optimal?
 
 The obvious starting point is the plan produced by the planner from an
 unconstrained query.  Even if you don't feel like trying to improve it,
 you could cut the time to reproduce the plan quite a bit --- just CROSS
 JOIN a few of the relation pairs that are joined first in the
 unconstrained plan.

In other words, let it do the work, and steal the credit for
ourselves. :-)

Thanks, Tom. I appreciate your answers to my questions.



In other DB systems I've used, some find that for this original query:

  SELECT * FROM a, b WHERE a.id=b.id AND b.name = 'foo';

that this version 

  SELECT * FROM a JOIN b USING (id) WHERE b.name = 'foo';

has slower performance than
 
  SELECT * FROM b JOIN a USING (id) WHERE b.name = 'foo';

because it can reduce b before any join. 

Is it safe to assume that this is a valid optimization in PostgreSQL?


If this whole thing were a view, except w/o the WHERE clause, and we were
querying the view w/the b.name WHERE clause, would we still see a
performance boost from the right arrangement? (ie, does our criteria get
pushed down early enough in the joining process?)


TIA,
-- 
Joel Burton   [EMAIL PROTECTED]
Director of Information Systems, Support Center of Washington


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



[HACKERS] Re: Any optimizations to the join code in 7.1?

2001-04-25 Thread Tom Lane

Joel Burton [EMAIL PROTECTED] writes:
 In other DB systems I've used, some find that for this original query:
   SELECT * FROM a, b WHERE a.id=b.id AND b.name = 'foo';
 that this version 
   SELECT * FROM a JOIN b USING (id) WHERE b.name = 'foo';
 has slower performance than
   SELECT * FROM b JOIN a USING (id) WHERE b.name = 'foo';
 because it can reduce b before any join. 

 Is it safe to assume that this is a valid optimization in PostgreSQL?

In general, that'd be a waste of time --- our planner considers the same
set of plans in either case.

However, it could make a difference if the planner thinks that the two
choices (a outer or b outer) have exactly the same cost.  In that case
the order you wrote them in will influence which plan actually gets
picked; and if the planner's estimate is wrong --- ie, there really is a
considerable difference in the costs --- then you could see a change in
performance depending on which way you wrote it.  That's a pretty
unusual circumstance, maybe, but it just happens that I'm in the middle
of looking at a planning bug wherein exactly this behavior occurs...

 If this whole thing were a view, except w/o the WHERE clause, and we were
 querying the view w/the b.name WHERE clause, would we still see a
 performance boost from the right arrangement? (ie, does our criteria get
 pushed down early enough in the joining process?)

Shouldn't make a difference; AFAIK the WHERE clause will get pushed down
as far as possible, independently of whether a view is involved or you
wrote it out the hard way.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html