Re: [HACKERS] Eliminating unnecessary left joins
But then what about the null values? Perhaps unique + notnull is better? Otto 2007/4/20, Nicolas Barbier [EMAIL PROTECTED]: 2007/4/16, Ottó Havasvölgyi [EMAIL PROTECTED]: Eliminate the table T from the query/subquery if the following requirements are satisfied: 1. T is left joined 2. T is referenced only in the join expression where it is left joined 3. the left join's join expression is a simple equality expression like T1.C1=T2.C2; T1!=T2 and (T==T1 or T==T2) 4. the column of T in the join exression is the primary key of T Condition 4 should be: the column of T in the join expression is a key of T (i.e. it doesn't need to be the PK, a UNIQUE constraint would be enough). This process can be done recursively (implementation doesn't have to be recursive, of course), to eliminate whole sub-trees of the join tree. Nicolas -- Nicolas Barbier http://www.gnu.org/philosophy/no-word-attachments.html
Re: [HACKERS] Eliminating unnecessary left joins
Hi, Could you Bruce please add a TODO item for this feature? The description could look something like this: Eliminate the table T from the query/subquery if the following requirements are satisfied: 1. T is left joined 2. T is referenced only in the join expression where it is left joined 3. the left join's join expression is a simple equality expression like T1.C1=T2.C2; T1!=T2 and (T==T1 or T==T2) 4. the column of T in the join exression is the primary key of T I hope it is comlete. I think this is the simplest case, so we should start with this. Thanks, Otto
Re: [HACKERS] Eliminating unnecessary left joins
Jim, Maybe odd, but simpler to optimize this way. Your idea would be also a very good optimization, there was already a discussion about that here: http://archives.postgresql.org/pgsql-performance/2006-01/msg00151.php, but that time Tom refused it because it was too expensive and rare. Maybe now he has a different opinion. However, left join optimization is lot simpler and cheaper, and can be useful not only for O/R mappers, but for efficient vertical partitioning as Simon mentioned. Best regards, Otto 2007/4/12, Jim Nasby [EMAIL PROTECTED]: I agree with others that the way that query is constructed is a bit odd, but it does bring another optimization to mind: when doing an inner-join between a parent and child table when RI is defined between them, if the query only refers to the child table you can drop the parent table from the join, because each row in the child table must have one and only one row in the parent. Use-case: I'll often use views to make it easier to query several related tables, but not all queries against the views need to hit every table. IE: if a table has several status fields that have RI to parent tables that describe what each status is, you sometimes will query for the status description, sometimes not. I suspect that checking to see if tables have the right unique keys or RI would add a noticeable amount of extra work to query planning, so we might want a GUC to disable it. On Apr 7, 2007, at 12:45 PM, Ottó Havasvölgyi wrote: Sorry, I have left out the PK requirement. What Nicolas wrote is right, I also use an O/R mapper and inheritance is solved with vertical partitioning. The tables are connected to each other with the PK. And the mapper defines views for each class with left joins. The mapper generates queries based on these views. A high fraction of the joins would be eliminated almost in every query. My simple example: Class hierarchy and fields: Shape (ID, X, Y) | +-Circle (ID, Radius) | +-Rectangle (ID, Width, Height) The mapper creates 3 tables with the columns next to the class name. And it creates 3 views. One of them: RectangleView: SELECT r.ID as ID, s.X as X, s.Y as Y, r.Width as Width, r.Height as Height FROM Rectangle r LEFT JOIN Shape s ON ( r.ID=s.ID) Now if I query Rectangle object IDs, whose Width is greater than 5, it will generate this: SELECT ID FROM RectangleView WHERE Width5 In this case I don't need to left join the Shape table, because X and Y columns are not used. The other typical situation is when I execute more complex, not-O/ Rmapper-generated SQL commands based on these views for reporting. For example the average width of rectangles whose height is greater than 10. This optimization should be also applied to subqueries. Is this optimization relatively easy to introduce? I would gladly work on this, but unfortunately I don't know the codebase at all. I would really appreciate if someone competent implemented this feature in 8.4. Thank you in advance, Otto -- Jim Nasby[EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Re: [HACKERS] Eliminating unnecessary left joins
My mapper joins the parent classes' tables to the current class' table in the view. In the ShapeView only ID, X, and Y is selected from the shape table, and none of the child tables are touched, opposite to your sample. But even though all Shape objects (circles and rectangles too) are in the resultset as Shape objects. I see this storage model quite consistent. You are right, that this can be done with inner join too, this is an option in the mapper. Oracle and MSSQL performs this left join optimization, so it is usually used with left join by other mapper users. I have asked them (the developers of the mapper) to perform this optimization at mapper level because not all DBMSs supported this optimization, but it seemed they didn't like this idea... And then I came here. This optimization would be useful for every Postgres users. To be honest I also find your sample strange, more exactly that *sibling* child tables are left joined to the parent. Maybe because the storage model is different than in my mapper. In my case the left joined parent tables should be excluded by the optimizer if possible. Best regards, Otto 2007/4/8, Nicolas Barbier [EMAIL PROTECTED]: 2007/4/7, Ottó Havasvölgyi [EMAIL PROTECTED]: My simple example: Class hierarchy and fields: Shape (ID, X, Y) | +-Circle (ID, Radius) | +-Rectangle (ID, Width, Height) The mapper creates 3 tables with the columns next to the class name. And it creates 3 views. One of them: RectangleView: SELECT r.ID as ID, s.X as X, s.Y as Y, r.Width as Width, r.Height as Height FROM Rectangle r LEFT JOIN Shape s ON ( r.ID=s.ID) I find this view definition a bit strange: why is there a left outer join? I expect there to be a FK from Rectangle.ID to Shape.ID (all rectangles are shapes), which makes the definition totally equivalent with one in which a normal join is used (whether attributes of Shape are used or not). The main use case I see for the original optimization is ORMs that join in a whole hierarchy, even when only a part of it is needed. I guess that that is rather common. The ORM that I use does exactly this, because the main target-DBMSs (MS-SQL and Oracle) do the optimization for it. Example (somewhat less contrived than my previous one): Imagine an implementation of the typical books that are borrowed by people n-m relationship, using three tables (Book, Borrowed, Person). Let's find all books that have been borrowed by a certain person. The non-ORM version would be something like: SELECT Book.* FROM Book JOIN Borrowed ON Borrowed.book_id = Book.id http://book.id/ WHERE Borrowed.person_id = x; Now assume that Borrowed is a class hierarchy mapped into multiple tables by a typical ORM. The query would probably become something like: SELECT Book.* FROM Book JOIN Borrowed_Parent ON Borrowed_Parent.book_id = Book.idhttp://book.id/ LEFT JOIN Borrowed_Child1 ON Borrowed_Child1.id = Borrowed_Parent.id LEFT JOIN Borrowed_Child2 ON Borrowed_Child2.id = Borrowed_Parent.id (...) WHERE Borrowed_Parent.person_id = x; It is clear that the children of the hierarchy are needlessly joined in (as the only attribute that is actually needed is person_id, which is on the parent level). It is not always trivial for the ORM to find that out, without writing stuff that looks suspiciously similar to a DBMS optimizer. Maybe it is debatable whether this optimization should be done by the application (i.e. the ORM) or by the DBMS. I am personally in favor of doing it in the DBMS. greetings, Nicolas -- Nicolas Barbier http://www.gnu.org/philosophy/no-word-attachments.html
Re: [HACKERS] Eliminating unnecessary left joins
Sorry, I have left out the PK requirement. What Nicolas wrote is right, I also use an O/R mapper and inheritance is solved with vertical partitioning. The tables are connected to each other with the PK. And the mapper defines views for each class with left joins. The mapper generates queries based on these views. A high fraction of the joins would be eliminated almost in every query. My simple example: Class hierarchy and fields: Shape (ID, X, Y) | +-Circle (ID, Radius) | +-Rectangle (ID, Width, Height) The mapper creates 3 tables with the columns next to the class name. And it creates 3 views. One of them: RectangleView: SELECT r.ID as ID, s.X as X, s.Y as Y, r.Width as Width, r.Height as Height FROM Rectangle r LEFT JOIN Shape s ON ( r.ID=s.ID) Now if I query Rectangle object IDs, whose Width is greater than 5, it will generate this: SELECT ID FROM RectangleView WHERE Width5 In this case I don't need to left join the Shape table, because X and Y columns are not used. The other typical situation is when I execute more complex, not-O/Rmapper-generated SQL commands based on these views for reporting. For example the average width of rectangles whose height is greater than 10. This optimization should be also applied to subqueries. Is this optimization relatively easy to introduce? I would gladly work on this, but unfortunately I don't know the codebase at all. I would really appreciate if someone competent implemented this feature in 8.4. Thank you in advance, Otto
[HACKERS] Eliminating unnecessary left joins
Hi, When using views built with left joins, and then querying against these views, there are a lot of join in the plan that are not necessary, because I don't select/use any column of each table in the views every time. Tables that are left joined and never referenced anywhere else in the query should be removed from the plan. I think this can be done without any other analyzation or catalog lookup, so it is a quite cheap optimization step, and doing it won't influence the result, but the query will run faster. This way with a complex query against these views usually the half of the join can be eliminated, and the plan will be quite more optimal. Why left join a table if never used/referenced in the query? How easy is to teach Postgres to this? I would like to help somehow to introduce this feature as soon as possible. What should I do? Thanks, Otto