Re: [HACKERS] Eliminating unnecessary left joins

2007-04-21 Thread Ottó Havasvölgyi

But then what about the null values? Perhaps unique + notnull is better?

Otto


2007/4/20, Nicolas Barbier [EMAIL PROTECTED]:


2007/4/16, Ottó Havasvölgyi [EMAIL PROTECTED]:

 Eliminate the table T from the query/subquery if the following
requirements
 are satisfied:
 1. T is left joined
 2. T is referenced only in the join expression where it is left joined
 3. the left join's join expression is a simple equality expression like
 T1.C1=T2.C2; T1!=T2 and (T==T1 or T==T2)
 4. the column of T in the join exression is the primary key of T

Condition 4 should be: the column of T in the join expression is a key
of T (i.e. it doesn't need to be the PK, a UNIQUE constraint would be
enough).

This process can be done recursively (implementation doesn't have to
be recursive, of course), to eliminate whole sub-trees of the join
tree.

Nicolas

--
Nicolas Barbier
http://www.gnu.org/philosophy/no-word-attachments.html



Re: [HACKERS] Eliminating unnecessary left joins

2007-04-16 Thread Ottó Havasvölgyi

Hi,

Could you Bruce please add a TODO item for this feature?
The description could look something like this:

Eliminate the table T from the query/subquery if the following requirements
are satisfied:
1. T is left joined
2. T is referenced only in the join expression where it is left joined
3. the left join's join expression is a simple equality expression like
T1.C1=T2.C2; T1!=T2 and (T==T1 or T==T2)
4. the column of T in the join exression is the primary key of T



I hope it is comlete.
I think this is the simplest case, so we should start with this.

Thanks,
Otto


Re: [HACKERS] Eliminating unnecessary left joins

2007-04-12 Thread Ottó Havasvölgyi

Jim,

Maybe odd, but simpler to optimize this way.

Your idea would be also a very good optimization, there was already a
discussion about that here:
http://archives.postgresql.org/pgsql-performance/2006-01/msg00151.php, but
that time Tom refused it because it was too expensive and rare. Maybe now he
has a different opinion.
However, left join optimization is lot simpler and cheaper, and can
be useful not only for O/R mappers, but for efficient vertical partitioning
as Simon mentioned.

Best regards,
Otto


2007/4/12, Jim Nasby [EMAIL PROTECTED]:


I agree with others that the way that query is constructed is a bit
odd, but it does bring another optimization to mind: when doing an
inner-join between a parent and child table when RI is defined
between them, if the query only refers to the child table you can
drop the parent table from the join, because each row in the child
table must have one and only one row in the parent.

Use-case: I'll often use views to make it easier to query several
related tables, but not all queries against the views need to hit
every table. IE: if a table has several status fields that have RI to
parent tables that describe what each status is, you sometimes will
query for the status description, sometimes not.

I suspect that checking to see if tables have the right unique keys
or RI would add a noticeable amount of extra work to query planning,
so we might want a GUC to disable it.

On Apr 7, 2007, at 12:45 PM, Ottó Havasvölgyi wrote:

 Sorry, I have left out the PK requirement.
 What Nicolas wrote is right, I also use an O/R mapper and
 inheritance is solved with vertical partitioning. The tables are
 connected to each other with the PK. And the mapper defines views
 for each class with left joins. The mapper generates queries based
 on these views. A high fraction of the joins would be eliminated
 almost in every query.

 My simple example:

 Class hierarchy and fields:
 Shape (ID, X, Y)
 |
 +-Circle (ID, Radius)
 |
 +-Rectangle (ID, Width, Height)

 The mapper creates 3 tables with the columns next to the class name.
 And it creates 3 views. One of them:

 RectangleView:  SELECT r.ID as ID, s.X as X, s.Y as Y,
 r.Width as Width, r.Height as Height FROM Rectangle r
 LEFT JOIN Shape s ON ( r.ID=s.ID)

 Now if I query Rectangle object IDs, whose Width is greater than 5,
 it will generate this:

 SELECT ID FROM RectangleView WHERE Width5

 In this case I don't need to left join the Shape table, because X
 and Y columns are not used.


 The other typical situation is when I execute more complex, not-O/
 Rmapper-generated SQL commands based on these views for reporting.
 For example the average width of rectangles whose height is greater
 than 10.
 

 This optimization should be also applied to subqueries.



 Is this optimization relatively easy to introduce?

 I would gladly work on this, but unfortunately I don't know the
 codebase at all.
 I would really appreciate if someone competent implemented this
 feature in 8.4.

 Thank you in advance,
 Otto


--
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)





Re: [HACKERS] Eliminating unnecessary left joins

2007-04-08 Thread Ottó Havasvölgyi

My mapper joins the parent classes' tables to the current class' table in
the view. In the ShapeView only ID, X, and Y is selected from the shape
table, and none of the child tables are touched, opposite to your sample.
But even though all Shape objects (circles and rectangles too) are in the
resultset as Shape objects. I see this storage model quite consistent.
You are right, that this can be done with inner join too, this is an option
in the mapper. Oracle and MSSQL performs this left join optimization, so it
is usually used with left join by other mapper users. I have asked them (the
developers of the mapper) to perform this optimization at mapper level
because not all DBMSs supported this optimization, but it seemed they didn't
like this idea... And then I came here. This optimization would be useful
for every Postgres users.

To be honest I also find your sample strange, more exactly that
*sibling* child tables are left joined to the parent. Maybe because the
storage model is different than in my mapper.

In my case the left joined parent tables should be excluded by the optimizer
if possible.

Best regards,
Otto



2007/4/8, Nicolas Barbier [EMAIL PROTECTED]:


2007/4/7, Ottó Havasvölgyi [EMAIL PROTECTED]:

 My simple example:

 Class hierarchy and fields:
 Shape (ID, X, Y)
 |
 +-Circle (ID, Radius)
 |
 +-Rectangle (ID, Width, Height)

 The mapper creates 3 tables with the columns next to the class name.
 And it creates 3 views. One of them:

 RectangleView:  SELECT r.ID as ID, s.X as X, s.Y as Y,
r.Width
 as Width, r.Height as Height FROM Rectangle r LEFT JOIN Shape
s ON
 ( r.ID=s.ID)

I find this view definition a bit strange: why is there a left outer
join? I expect there to be a FK from Rectangle.ID to Shape.ID (all
rectangles are shapes), which makes the definition totally equivalent
with one in which a normal join is used (whether attributes of Shape
are used or not).

The main use case I see for the original optimization is ORMs that
join in a whole hierarchy, even when only a part of it is needed. I
guess that that is rather common. The ORM that I use does exactly
this, because the main target-DBMSs (MS-SQL and Oracle) do the
optimization for it.

Example (somewhat less contrived than my previous one):

Imagine an implementation of the typical books that are borrowed by
people n-m relationship, using three tables (Book, Borrowed,
Person). Let's find all books that have been borrowed by a certain
person.

The non-ORM version would be something like:

SELECT Book.*
FROM
Book
   JOIN Borrowed ON Borrowed.book_id = Book.id http://book.id/
WHERE Borrowed.person_id = x;

Now assume that Borrowed is a class hierarchy mapped into multiple
tables by a typical ORM. The query would probably become something
like:

SELECT Book.*
FROM
 Book
JOIN Borrowed_Parent ON Borrowed_Parent.book_id = 
Book.idhttp://book.id/
   LEFT JOIN Borrowed_Child1 ON Borrowed_Child1.id = Borrowed_Parent.id
   LEFT JOIN Borrowed_Child2 ON Borrowed_Child2.id = Borrowed_Parent.id
   (...)
WHERE Borrowed_Parent.person_id = x;

It is clear that the children of the hierarchy are needlessly joined
in (as the only attribute that is actually needed is person_id, which
is on the parent level). It is not always trivial for the ORM to find
that out, without writing stuff that looks suspiciously similar to a
DBMS optimizer.

Maybe it is debatable whether this optimization should be done by the
application (i.e. the ORM) or by the DBMS. I am personally in favor of
doing it in the DBMS.

greetings,
Nicolas

--
Nicolas Barbier
http://www.gnu.org/philosophy/no-word-attachments.html



Re: [HACKERS] Eliminating unnecessary left joins

2007-04-07 Thread Ottó Havasvölgyi

Sorry, I have left out the PK requirement.
What Nicolas wrote is right, I also use an O/R mapper and inheritance is
solved with vertical partitioning. The tables are connected to each other
with the PK. And the mapper defines views for each class with left joins.
The mapper generates queries based on these views. A high fraction of
the joins would be eliminated almost in every query.

My simple example:

Class hierarchy and fields:
Shape (ID, X, Y)
|
+-Circle (ID, Radius)
|
+-Rectangle (ID, Width, Height)

The mapper creates 3 tables with the columns next to the class name.
And it creates 3 views. One of them:

RectangleView:  SELECT r.ID as ID, s.X as X, s.Y as Y, r.Width
as Width, r.Height as Height FROM Rectangle r LEFT JOIN Shape s ON
( r.ID=s.ID)

Now if I query Rectangle object IDs, whose Width is greater than 5, it will
generate this:

SELECT ID FROM RectangleView WHERE Width5

In this case I don't need to left join the Shape table, because X and Y
columns are not used.


The other typical situation is when I execute more complex,
not-O/Rmapper-generated SQL commands based on these views for reporting. For
example the average width of rectangles whose height is greater than 10.


This optimization should be also applied to subqueries.



Is this optimization relatively easy to introduce?

I would gladly work on this, but unfortunately I don't know the codebase at
all.
I would really appreciate if someone competent implemented this feature in
8.4.

Thank you in advance,
Otto


[HACKERS] Eliminating unnecessary left joins

2007-04-06 Thread Ottó Havasvölgyi

Hi,

When using views built with left joins, and then querying against these
views, there are a lot of join in the plan that are not necessary, because I
don't select/use any column of each table in the views every time. Tables
that are left joined and never referenced anywhere else in the query  should
be removed from the plan. I think this can be done without any other
analyzation or catalog lookup, so it is a quite cheap optimization step, and
doing it won't influence the result, but the query will run faster.
This way with a complex query against these views usually the half of the
join can be eliminated, and the plan will be quite more optimal.
Why left join a table if never used/referenced in the query?

How easy is to teach Postgres to this?

I would like to help somehow to introduce this feature as soon as possible.
What should I do?

Thanks,
Otto