[SQL] Optimizing Multiply Joins ???

2000-09-13 Thread Meszaros Attila

Hi all,

We are building a sophisticated and flexible database structure and thus,
we have quite complicated and longish queries containing lots of joins.

Using only a few test records in our structure we have performed some
measures, and it is hard to interpret the results.

Until we join no more than 10 tables the response time is below 0.2 s.
joining the 11th table comes with a dramatic change: response time
usually grows up to 5-7 s, 

I'we read the related pages of the documentation, and found the
description of the default and the genetic optimizer too. And also found
the story about the german knowledge-based system project where longer
queries than 10 joins were also too slow.

But I think (hope) we could have a solution, because all of our
complex joins are following foreign keys. 

If we could give some hints to the planner about foreign keys, it
should not generate plenty of unusable plans for the optimizer.


Here I send an example:

the query:
select 
h.literal as division, 
j.literal as soatype, 
e.username, 
e.password,
c.objectid 
from 
o_division as a 
join o_soa as b 
on b.divisionobjectid=a.objectid 
join o_soainstance as c 
on c.soaobjectid=b.objectid 
join o_staff_rdl_soainstance_role_ as d 
on d.soainstanceobjectid=c.objectid 
join o_electronic as e 
on e.pointerobjectid=d.objectid 
join o_soatype as f 
on f.objectid=b.soatypeobjectid 
join o_meaning as g 
on g.objectid=a.name 
join o_meaning_rndl_language_role_ as h 
on h.meaningobjectid=g.objectid 
and h.languageobjectid=11 
join o_meaning as i 
on i.objectid=f.name 
join o_meaning_rndl_language_role_ as j 
on j.meaningobjectid=i.objectid 
and j.languageobjectid=11
join o_staff as k 
on k.objectid=d.staffobjectid
join o_externalcontributor as l 
on l.pointerobjectid=k.objectid


the structure behind it:
[the arrows are representing the foreign keys.]

a -> g <- h
^
|
b -> f -> i <- j
^
|
c
^
|
d -> k <- l
^
|
e

results of this query:
join from a to j takes 0.2 s
  a to k takes 4.8 s
  a to l takes 5.2 s

I have examined the output of explain in all 3 cases, and I have
the feeling that the planner simply forgets the best solutions
in 2nd and 3rd case.

If this is not enough info for the answer I can send the tables, their
contents, the output of the optimizer. or whatever you need for the
answer (including beer :)

Attila




Re: [SQL] Optimizing Multiply Joins ???

2000-09-13 Thread Meszaros Attila

Hi, 

> That's interesting; apparently the regular optimizer is faster than the
> GEQO optimizer for your style of query.  Try increasing the GEQO
> threshold (pg_option "geqo_rels") to more than 11.
Changing this option in a psql session with 'set' has really helped
Thanks.
the results have changed this way:
nr tables plain(s):  more indices(2):  geqo off(s):
10: 0.2 0.3 0.3
11: 4.8 5.9 0.8
12: 5.2 7.0 1.8

But it seems, geqo_rels option is not parsed from the pg_option
file.

> Actually, as the 7.1 code currently stands, a query that uses explicit
> JOIN operators like yours does will always be implemented in exactly
> the given join order, with no searching.  I haven't quite decided if
> that's a bug or a feature ...
Great !!
If possible, it should be left in the codebase at least as an
option later...
Because we could have dozens(hundreds) of preoptimized queries.

Attila





Re: [SQL] Optimizing Multiply Joins ???

2000-09-14 Thread Meszaros Attila

Hi,

> Actually, as the 7.1 code currently stands, a query that uses explicit
> JOIN operators like yours does will always be implemented in exactly
> the given join order, with no searching.  I haven't quite decided if
> that's a bug or a feature ...

Do you mean a "linear binary tree" like this is executed?

 /\
/\ f
   /\ e
  /\ d
 /\ c
a  b

Or can we have some variations on the graph (with the same
preorder run: a,b,c,d,e,f) like this:

/\  
   /  \ 
  /   /\
 /\  d /\   
a /\  e  f  
 b  c   

We could give hints on the joins with brackets this way:
((a,(b,c)),(d,(e,f)))

I'm not sure which version of standards allows to bracket joins,
but I know sybase accepts the above form.

How difficult it looks to hack the parser to accept this form, 
and pass the meaning to the planner?

Attila





Re: [SQL] Optimizing Multiply Joins ???

2000-09-14 Thread Meszaros Attila

Hi,

> If that's what you write, yes.  You can parenthesize the JOIN clauses
> any way you like, though, and the 7.1 planner will follow that structure.

Thanks for the exhaustive answer.

Can I test this feature in the current snapshot?

Attila




[SQL] Subqueries in from clause?

2000-09-25 Thread Meszaros Attila

Hi All,

My question for this week:
How far is the above subject from beeing implemented?

(I've run through the archives, and found some quite old mails in
this topic. They mentioned automatically generated temp tables as possibe
solutions... Is this the plan even today?)

Attila

Ps.:
Two weeks ago I had a question about multiplee joins. According to Tom's
advice I translated the actual snapshot from cvs and after submitting the
query with parenthesis on the right place I experienced a speedup factor
of 200 (two hundred!!!) comapred to the GEQO. Great.

Ps^2.:
Also tested tuple toasting up to 4Mbytes of single tuples...
I like it :))




[SQL] "AND", "OR" and Materialize :((((

2001-08-26 Thread Meszaros Attila

Hi all,

We've got the following 3 tables and 2 simple queries. The only difference
lies in the join condition: the first uses OR, the second uses AND.

I expected some difference in the performace according to the
difference in the evaluation of the logical form, but not
3 magnitudes !!!

So the question is:

WHY SHALL IT MATERIALIZE A CONSTANT RESULT IN A LOOP OF 12 TIMES ??
[it would be enough to materialize only once, or even never, because
the size of the materialized table is not larger than 1Mb... ]

ps.: there are indeces on all referenced fields.


atti=# explain select count(*) from _108 left join (_111 cross join _110) 
on (_108.objectid=_111._108objectid OR _108.objectid=_110._108objectid); 
NOTICE:  QUERY PLAN:

Aggregate  (cost=5017202.06..5017202.06 rows=1 width=24)
  ->  Nested Loop  (cost=0.00..5016900.05 rows=120806 width=24)
->  Seq Scan on _108  (cost=0.00..44.70 rows=1670 width=8)
->  Materialize  (cost=2097.79..2097.79 rows=60421 width=16)
  ->  Nested Loop  (cost=0.00..2097.79 rows=60421 width=16)
->  Seq Scan on _110  (cost=0.00..1.37 rows=37 width=8)
->  Seq Scan on _111  (cost=0.00..40.33 rows=1633 width=8)

EXPLAIN
atti=# explain select count(*) from _108 left join (_111 cross join _110) 
on _108.objectid=_111._108objectid AND _108.objectid=_110._108objectid; 
NOTICE:  QUERY PLAN:

Aggregate  (cost=7965.68..7965.68 rows=1 width=24)
  ->  Merge Join  (cost=7030.14..7961.51 rows=1670 width=24)
->  Sort  (cost=134.09..134.09 rows=1670 width=8)
  ->  Seq Scan on _108  (cost=0.00..44.70 rows=1670 width=8)
->  Sort  (cost=6896.05..6896.05 rows=60421 width=16)
  ->  Nested Loop  (cost=0.00..2097.79 rows=60421 width=16)
->  Seq Scan on _110  (cost=0.00..1.37 rows=37 width=8)
->  Seq Scan on _111  (cost=0.00..40.33 rows=1633 width=8)

Attila



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [SQL] "AND", "OR" and Materialize :((((

2001-08-26 Thread Meszaros Attila

Hi,

> > WHY SHALL IT MATERIALIZE A CONSTANT RESULT IN A LOOP OF 12 TIMES ??
> > [it would be enough to materialize only once,
> 
> Which in fact is exactly what the materialize node is for.  The reported
> costs are pretty bogus, but AFAICT the plan is the right thing.

Thanx for the answer.
I've thought the same (eg. materialize should reduce the amount
of work to be done, but I haven't felt this in the result)

Unfortunatelly the cost prediction in line 2 may be close to
the real cost. According to some measures:
time for the query with 'AND':2 sec
time for the query with 'OR':   421 sec

So the question is what to do? 
Can I speed up the second one?

[vacuum analyze and indices are done, postgres version is 7.1.2]

1:Aggregate  (cost=5017202.06..5017202.06 rows=1 width=24)
2:  ->  Nested Loop  (cost=0.00..5016900.05 rows=120806 width=24)
3:->  Seq Scan on _108  (cost=0.00..44.70 rows=1670 width=8)
4:->  Materialize  (cost=2097.79..2097.79 rows=60421 width=16)
5:  ->  Nested Loop  (cost=0.00..2097.79 rows=60421 width=16)
6:->  Seq Scan on _110  (cost=0.00..1.37 rows=37 width=8)
7:->  Seq Scan on _111  (cost=0.00..40.33 rows=1633 width=8)

Attila


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])