On April 16, 2010 at 11:31, Yeb Havinga <[email protected]> wrote:
> Vitalii Tymchyshyn wrote:
>
>> Hello.
>>
>> I have a query that performs very poorly because there is a limit on the
>> join column that is not applied to the other columns:
>>
>> select * from company this_
>>   left outer join company_tag this_1_ on this_.id = this_1_.company_id
>>   left outer join company_measures companymea2_ on this_.id = companymea2_.company_id
>>   left outer join company_descr ces3_ on this_.id = ces3_.company_id
>> where this_1_.tag_id = 7 and this_.id > 50000000
>>   and this_1_.company_id > 50000000
>> order by this_.id asc limit 1000;
>>
>> (plan1.txt)
>> Total runtime: 7794.692 ms
>>
>> At the same time, if I apply the limit (>50000000) to the other columns in
>> the query itself, it works like a charm:
>>
>> select * from company this_
>>   left outer join company_tag this_1_ on this_.id = this_1_.company_id
>>   left outer join company_measures companymea2_ on this_.id = companymea2_.company_id
>>   left outer join company_descr ces3_ on this_.id = ces3_.company_id
>> where this_1_.tag_id = 7 and this_.id > 50000000
>>   and this_1_.company_id > 50000000
>>   and companymea2_.company_id > 50000000 and ces3_.company_id > 50000000
>> order by this_.id asc limit 1000;
>>
>> (plan2.txt)
>> Total runtime: 27.547 ms
>>
>> I had thought, and someone on this list told me, that this should be done
>> automatically.
>>
> Yes, if you have in a query a=b and b=c, then the optimizer figures out
> that a=c as well (a, b, and c are then members of the same equivalence class).
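The equivalence-class idea can be sketched in a few lines of Python. This is a toy illustration, not PostgreSQL's actual planner code: the hypothetical `propagate` helper merges columns linked by join equalities via union-find, then copies a range restriction on one member to every other member of the class.

```python
# Toy sketch of the planner's equivalence-class reasoning (hypothetical
# helper, not PostgreSQL internals): columns joined by '=' are merged
# into one class, and a restriction on any member applies to all.

def propagate(equalities, restriction):
    """equalities: list of (col, col) pairs; restriction: (col, op, val)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for a, b in equalities:
        union(a, b)

    col, op, val = restriction
    root = find(col)
    # Emit the restriction for every column in the same class.
    return sorted((c, op, val) for c in parent if find(c) == root)

derived = propagate(
    [("this_.id", "this_1_.company_id"),
     ("this_.id", "companymea2_.company_id"),
     ("this_.id", "ces3_.company_id")],
    ("this_.id", ">", 50000000),
)
print(derived)  # all four columns now carry the > 50000000 restriction
```

As the reply below explains, this derivation is only valid for the inner-join case; with outer joins the extra predicates can change the result.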
>
> However, the two queries are not equivalent, since the joins you're using
> are outer joins. In the first, it's possible that rows are returned for
> company records with no matching ces3_ record; the ces3_ columns are NULL
> in that case. In the second query, no NULL ces3_ rows can be returned.
>
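This difference is easy to reproduce. Here is a minimal sketch using SQLite as a stand-in, with hypothetical tiny tables: a predicate on the nullable side placed in WHERE discards the NULL-extended rows, so it is not equivalent to the plain left outer join.

```python
import sqlite3

# Hypothetical miniature versions of the tables from the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (id INTEGER PRIMARY KEY);
    CREATE TABLE company_descr (company_id INTEGER);
    INSERT INTO company VALUES (1), (2), (3);
    INSERT INTO company_descr VALUES (1), (3);   -- no row for company 2
""")

plain = conn.execute("""
    SELECT c.id, d.company_id
    FROM company c LEFT JOIN company_descr d ON c.id = d.company_id
    ORDER BY c.id
""").fetchall()

filtered = conn.execute("""
    SELECT c.id, d.company_id
    FROM company c LEFT JOIN company_descr d ON c.id = d.company_id
    WHERE d.company_id > 0
    ORDER BY c.id
""").fetchall()

print(plain)     # [(1, 1), (2, None), (3, 3)]
print(filtered)  # [(1, 1), (3, 3)] -- the WHERE filter rejects the NULL row,
                 # silently turning the outer join into an inner join
```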
OK, but when I move the limit into the join condition, the query is still fast:
select * from company this_
  left outer join company_tag this_1_ on this_.id = this_1_.company_id
  left outer join company_measures companymea2_
    on this_.id = companymea2_.company_id and companymea2_.company_id > 50000000
  left outer join company_descr ces3_
    on this_.id = ces3_.company_id and ces3_.company_id > 50000000
where this_1_.tag_id = 7 and this_.id > 50000000
  and this_1_.company_id > 50000000
order by this_.id asc limit 1000;
(plan3.txt)
Total runtime: 26.327 ms
BTW: Changing the slow query to inner joins does not make it fast.
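Moving the restriction into the ON clause is safe here because the join condition already equates the columns, so the extra predicate cannot reject any row that would otherwise match. A quick SQLite sketch (hypothetical tiny tables again) shows the rewritten join returns exactly the same rows, NULL-extended ones included:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (id INTEGER PRIMARY KEY);
    CREATE TABLE company_measures (company_id INTEGER);
    INSERT INTO company VALUES (2), (3), (4);
    INSERT INTO company_measures VALUES (2), (4);  -- nothing for company 3
""")

# Restriction duplicated into the join condition, as in the fast query.
in_on = conn.execute("""
    SELECT c.id, m.company_id
    FROM company c
    LEFT JOIN company_measures m
           ON c.id = m.company_id AND m.company_id > 1
    WHERE c.id > 1
    ORDER BY c.id
""").fetchall()

# Plain left join with only the driving-table restriction.
plain = conn.execute("""
    SELECT c.id, m.company_id
    FROM company c
    LEFT JOIN company_measures m ON c.id = m.company_id
    WHERE c.id > 1
    ORDER BY c.id
""").fetchall()

# The ON predicate is implied by c.id = m.company_id plus c.id > 1,
# so the two queries return identical results:
print(in_on == plain)  # True
print(in_on)           # [(2, 2), (3, None), (4, 4)]
```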
>
> Another thing: the estimated number of rows seems far off from the actual
> number of rows. Is the number 5000000 artificial, or are your statistics
> old, or your histograms/MCVs too small?
>
Nope, I suppose this is because of the limit. If I remove the limit, the
estimates are quite accurate. There are ~6 million rows in each table.
Limit  (cost=293.40..1028.60 rows=1000 width=1209) (actual time=0.128..24.751 rows=1000 loops=1)
  ->  Merge Left Join  (cost=293.40..4197731.11 rows=5709243 width=1209) (actual time=0.124..21.968 rows=1000 loops=1)
        Merge Cond: (this_.id = (companymea2_.company_id)::bigint)
        ->  Merge Left Join  (cost=246.59..3681230.10 rows=5709243 width=1141) (actual time=0.099..15.284 rows=1000 loops=1)
              Merge Cond: (this_.id = (ces3_.company_id)::bigint)
              ->  Merge Join  (cost=37.87..2435536.00 rows=5709243 width=639) (actual time=0.074..8.487 rows=1000 loops=1)
                    Merge Cond: (this_.id = (this_1_.company_id)::bigint)
                    ->  Index Scan using pk_comp_m on company this_  (cost=0.00..2139324.53 rows=6886598 width=623) (actual time=0.037..1.860 rows=1054 loops=1)
                          Index Cond: (id > 50000000)
                    ->  Index Scan using company_tag_company_id_idx on company_tag this_1_  (cost=0.00..207678.85 rows=5709243 width=16) (actual time=0.027..1.758 rows=1000 loops=1)
                          Index Cond: ((this_1_.company_id)::bigint > 50000000)
                          Filter: ((this_1_.tag_id)::bigint = 7)
              ->  Index Scan using cws_company_descr_unique on company_descr ces3_  (cost=0.00..1169511.92 rows=5757068 width=502) (actual time=0.020..1.788 rows=1054 loops=1)
                    Index Cond: ((ces3_.company_id)::bigint > 50000000)
        ->  Index Scan using pk_comp_ms on company_measures companymea2_  (cost=0.00..440945.79 rows=5688885 width=68) (actual time=0.019..1.729 rows=1054 loops=1)
              Index Cond: ((companymea2_.company_id)::bigint > 50000000)
Total runtime: 26.327 ms
--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance