Re: [SQL] full join in view

Tambet Matiisen Tue, 14 Jan 2003 06:27:30 -0800

First I would like to say, that I'm quite amazed. You even guessed table names right! 
:) I did not expect such an in-depth analysis in such a short time. Thanks, Tomasz!


> -----Original Message-----
> From: Tomasz Myrta [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, January 14, 2003 11:51 AM
> To: Tambet Matiisen
> Cc: [EMAIL PROTECTED]
> Subject: Re: [SQL] full join in view
> 
> 
> Tambet Matiisen wrote:
> 
> > You are right. After disabling seq_scan, it uses indexes 
> just as you 
> > described. Unfortunately my view happens to use subquery:
> 
> Don't disable seq_scan - sometimes it is better than indexscan.

I just did it for testing and for one session. I never disable it on production server.

...
> 
> There is one more problem in your query - coalesce, which possibly 
> disables any indexing in your view. Try to rewrite your view 
> - subquery 
> shouldn't return dor_kst_id and dor_mat_id null.
> 
Coalesce did not pose any problems. Unless I tried to filter using one of the 
coalesce-fields, which does not use indeces of course.

> Is dor_kst_id the same as kdt_kst_id and as mat_id? After 
> some database 
> practicing I found, that using the same name in all tables is 
> much more 
> comfortably
> 
This way I can refer most columns without prefixing them with table alias. But it's 
anyway good habit to use table aliases, so this is not that important. I think in next 
project I try it in your way.

> For each material (materjalid) and koostud (koostud) you want to find 
> some current value (koostude_detaild) and compare it to some sum 
> (documentid...)?
> I'm not sure if I understand well your view, but here is my 
> version of 
> this view - without subquery:
> 

I tried to save few scans by not including "koostud" and "materjalid" in my original 
query. Based on yours, I created a new version:

CREATE OR REPLACE VIEW v_tegelikud_kulud2 AS
SELECT
    kst.kst_id,
    mat.mat_id,
    max(kdt.detaili_nr) AS detaili_nr,
    max(kdt.arv) AS arv,
    max(kdt.kulu) AS kulu,
    max(kdt.yhik) AS yhik,
    max(kdt.koefitsent) AS koefitsent,
    max(kdt.eeldatav_hind) AS eeldatav_hind,
    sum(dor.kogus * dor.koefitsent::numeric) AS kogus,
    sum(dor.kokku) AS kokku
FROM koostud kst
CROSS JOIN materjalid mat
LEFT JOIN koostude_detailid kdt
    ON kst.kst_id = kdt.kdt_kst_id AND mat.mat_id = kdt.kdt_mat_id
LEFT JOIN dokumentide_read dor
    ON kst.kst_id = dor.dor_kst_id AND mat.mat_id = dor.dor_mat_id AND EXISTS
        (
        SELECT 1
        FROM dokumendid dok 
        WHERE dor.dor_dok_id = dok.dok_id AND dok.tyyp = 30 AND dok.kinnitaja IS NOT 
NULL
        )
WHERE kdt.kdt_id IS NOT NULL AND dor.dor_id IS NOT NULL
GROUP BY kst.kst_id, mat.mat_id;

I do a cross join between "koostud" and "materjalid", because "FROM koostud kst, 
materjalid mat" gave me syntax errors. Also I had to move "dokumendid" table to EXISTS 
subquery, to get equivalent results with the original query. Just LEFT JOIN-ing it is 
not enough and subquery troubled optimizer. There is also a WHERE condition to show 
only those materials, that appear in one of the tables "koostude_detailid" or 
"dokumentide_read". Here is the execution plan:

explain select count(1) from v_tegelikud_kulud2 where kst_id = 1125;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=92.60..92.60 rows=1 width=107)
   ->  Subquery Scan v_tegelikud_kulud2  (cost=69.58..92.58 rows=8 width=107)
         ->  Aggregate  (cost=69.58..92.58 rows=8 width=107)
               ->  Group  (cost=69.58..91.00 rows=79 width=107)
                     ->  Merge Join  (cost=69.58..90.61 rows=79 width=107)
                           Merge Cond: (("outer".kst_id = "inner".dor_kst_id) AND 
("outer".mat_id = "inner".dor_mat_id))
                           Join Filter: (subplan)
                           Filter: ("inner".dor_id IS NOT NULL)
                           ->  Merge Join  (cost=17.54..18.52 rows=79 width=66)
                                 Merge Cond: (("outer".kst_id = "inner".kdt_kst_id) 
AND ("outer".mat_id = "inner".kdt_mat_id))
                                 Filter: ("inner".kdt_id IS NOT NULL)
                                 ->  Sort  (cost=10.56..10.76 rows=79 width=8)
                                       Sort Key: kst.kst_id, mat.mat_id
                                       ->  Nested Loop  (cost=0.00..8.07 rows=79 
width=8)
                                             ->  Index Scan using kst_pk on koostud 
kst  (cost=0.00..4.49 rows=1 width=4)
                                                   Index Cond: (kst_id = 1125)
                                             ->  Seq Scan on materjalid mat  
(cost=0.00..2.79 rows=79 width=4)
                                 ->  Sort  (cost=6.98..7.27 rows=113 width=58)
                                       Sort Key: kdt.kdt_kst_id, kdt.kdt_mat_id
                                       ->  Seq Scan on koostude_detailid kdt  
(cost=0.00..3.13 rows=113 width=58)
                           ->  Sort  (cost=52.03..53.93 rows=761 width=41)
                                 Sort Key: dor.dor_kst_id, dor.dor_mat_id
                                 ->  Seq Scan on dokumentide_read dor  
(cost=0.00..15.61 rows=761 width=41)
                           SubPlan
                             ->  Index Scan using dok_pk on dokumendid dok  
(cost=0.00..3.47 rows=1 width=0)
                                   Index Cond: ($0 = dok_id)
                                   Filter: ((tyyp = 30) AND (kinnitaja IS NOT NULL))
(27 rows)

But there are still few things that worry me:
1. Cross join between koostud and materjalid. Table "materjalid" may have up to 10000 
rows and only 20-30 of them are actually needed.
2. Indeces on "koostude_detailid" and "dokumentide_read" are not used. Probably my 
tables do not contain enough rows. Maybe I should generate more test data first.
3. The cost of this query is twice as big, as my original query. It seems to me, that 
SubPlan is causing this. I tried to move it to subquery, but then the optimizer chose 
a totally different execution plan and seemingly was not able to use indeces of 
"dokumentide_read" table. The version with subquery:

CREATE OR REPLACE VIEW v_tegelikud_kulud3 AS
SELECT
    kst.kst_id,
    mat.mat_id,
    max(kdt.detaili_nr) AS detaili_nr,
    max(kdt.arv) AS arv,
    max(kdt.kulu) AS kulu,
    max(kdt.yhik) AS yhik,
    max(kdt.koefitsent) AS koefitsent,
    max(kdt.eeldatav_hind) AS eeldatav_hind,
    sum(dor.kogus * dor.koefitsent::numeric) AS kogus,
    sum(dor.kokku) AS kokku
FROM koostud kst
CROSS JOIN materjalid mat
LEFT JOIN koostude_detailid kdt
    ON kst.kst_id = kdt.kdt_kst_id AND mat.mat_id = kdt.kdt_mat_id
LEFT JOIN 
    (
    SELECT dor.*
    FROM dokumentide_read dor
    JOIN dokumendid dok ON dor.dor_dok_id = dok.dok_id 
    WHERE dok.tyyp = 30 AND dok.kinnitaja IS NOT NULL
    ) dor
    ON kst.kst_id = dor.dor_kst_id AND mat.mat_id = dor.dor_mat_id
WHERE kdt.kdt_id IS NOT NULL AND dor.dor_id IS NOT NULL
GROUP BY kst.kst_id, mat.mat_id;

And execution plan:

hekotek=# explain select count(1) from v_tegelikud_kulud3 where kst_id = 1125;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=56.52..56.52 rows=1 width=111)
   ->  Subquery Scan v_tegelikud_kulud3  (cost=53.31..56.50 rows=8 width=111)
         ->  Aggregate  (cost=53.31..56.50 rows=8 width=111)
               ->  Group  (cost=53.31..54.92 rows=79 width=111)
                     ->  Merge Join  (cost=53.31..54.53 rows=79 width=111)
                           Merge Cond: (("outer".kst_id = "inner".dor_kst_id) AND 
("outer".mat_id = "inner".dor_mat_id))
                           Filter: ("inner".dor_id IS NOT NULL)
                           ->  Merge Join  (cost=17.53..18.50 rows=79 width=66)
                                 Merge Cond: (("outer".kst_id = "inner".kdt_kst_id) 
AND ("outer".mat_id = "inner".kdt_mat_id))
                                 Filter: ("inner".kdt_id IS NOT NULL)
                                 ->  Sort  (cost=10.54..10.74 rows=79 width=8)
                                       Sort Key: kst.kst_id, mat.mat_id
                                       ->  Nested Loop  (cost=0.00..8.05 rows=79 
width=8)
                                             ->  Index Scan using kst_pk on koostud 
kst  (cost=0.00..4.47 rows=1 width=4)
                                                   Index Cond: (kst_id = 1125)
                                             ->  Seq Scan on materjalid mat  
(cost=0.00..2.79 rows=79 width=4)
                                 ->  Sort  (cost=6.98..7.27 rows=113 width=58)
                                       Sort Key: kdt.kdt_kst_id, kdt.kdt_mat_id
                                       ->  Seq Scan on koostude_detailid kdt  
(cost=0.00..3.13 rows=113 width=58)
                           ->  Sort  (cost=35.78..35.99 rows=84 width=45)
                                 Sort Key: dor.dor_kst_id, dor.dor_mat_id
                                 ->  Merge Join  (cost=9.04..33.09 rows=84 width=45)
                                       Merge Cond: ("outer".dor_dok_id = 
"inner".dok_id)
                                       ->  Index Scan using dor_dok_fk_i on 
dokumentide_read dor  (cost=0.00..20.91 rows=761 width=41)
                                       ->  Sort  (cost=9.04..9.13 rows=36 width=4)
                                             Sort Key: dok.dok_id
                                             ->  Seq Scan on dokumendid dok  
(cost=0.00..8.10 rows=36 width=4)
                                                   Filter: ((tyyp = 30) AND (kinnitaja 
IS NOT NULL))
(28 rows)

Uh.. Mails are getting very lengthy.

  Tambet

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [SQL] full join in view

Reply via email to