Re: [HACKERS] Cost of sort/order by not estimated by the query planner
The table is clustered by by blog_id. So, for testing purpose, i tried an ORDER BY blog_id. limit 500 : - explain analyze SELECT * FROM _article WHERE (_article.bitfield && getbit(0)) ORDER BY _article.blog_id ASC LIMIT 500; Limit (cost=66229.90..66231.15 rows=500 width=1099) (actual time=9.368..9.580 rows=500 loops=1) -> Sort (cost=66229.90..66273.25 rows=17341 width=1099) (actual time=9.367..9.443 rows=500 loops=1) Sort Key: blog_id Sort Method: top-N heapsort Memory: 660kB -> Bitmap Heap Scan on _article (cost=138.67..65365.82 rows=17341 width=1099) (actual time=0.905..4.042 rows=6729 loops=1) Recheck Cond: (bitfield && B'1'::bit varying) -> Bitmap Index Scan on idx_article_bitfield (cost=0.00..134.33 rows=17341 width=0) (actual time=0.772..0.772 rows=6729 loops=1) Index Cond: (bitfield && B'1'::bit varying) Total runtime: 9.824 ms Limit 5 : -- explain analyze SELECT * FROM _article WHERE (_article.bitfield && getbit(0)) ORDER BY _article.blog_id ASC LIMIT 5; Limit (cost=0.00..1419.22 rows=5 width=1099) (actual time=125076.420..280419.143 rows=5 loops=1) -> Index Scan using idx_article_blog_id on _article (cost=0.00..4922126.37 rows=17341 width=1099) (actual time=125076.419..280419.137 rows=5 loops=1) Filter: (bitfield && B'1'::bit varying) Total runtime: 280419.241 ms -- Laurent "ker2x" Laborde Sysadmin & DBA at http://www.over-blog.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cost of sort/order by not estimated by the query planner
'morning ! And here is the query plan for : --- explain analyze SELECT * FROM _article WHERE (_article.bitfield && getbit(0)) ORDER BY _article.id ASC LIMIT 5; Limit (cost=0.00..2238.33 rows=5 width=1099) (actual time=17548636.326..17548837.082 rows=5 loops=1) -> Index Scan using _article_pkey on _article (cost=0.00..7762964.53 rows=17341 width=1099) (actual time=17548636.324..17548837.075 rows=5 loops=1) Filter: (bitfield && B'1'::bit varying) Total runtime: 17548837.154 ms Versus the "limit 500" query plan : --- explain analyze SELECT * FROM _article WHERE (_article.bitfield && getbit(0)) ORDER BY _article.id ASC LIMIT 500; Limit (cost=66229.90..66231.15 rows=500 width=1099) (actual time=1491.905..1492.146 rows=500 loops=1) -> Sort (cost=66229.90..66273.25 rows=17341 width=1099) (actual time=1491.904..1492.008 rows=500 loops=1) Sort Key: id Sort Method: top-N heapsort Memory: 603kB -> Bitmap Heap Scan on _article (cost=138.67..65365.82 rows=17341 width=1099) (actual time=777.489..1487.120 rows=6729 loops=1) Recheck Cond: (bitfield && B'1'::bit varying) -> Bitmap Index Scan on idx_article_bitfield (cost=0.00..134.33 rows=17341 width=0) (actual time=769.799..769.799 rows=6729 loops=1) Index Cond: (bitfield && B'1'::bit varying) Total runtime: 1630.690 ms I will read (and try to understand) all you said yesterday and reply as soon as i can :) Thank you ! -- Laurent "ker2x" Laborde Sysadmin & DBA at http://www.over-blog.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cost of sort/order by not estimated by the query planner
2009/12/1 Laurent Laborde : > The problem is in the order by, of course. > If i remove the "order by" the LIMIT 5 is faster (0.044 ms) and do an > index scan. > At limit 500 (without order) it still use an index scan and it is > slightly slower. > At limit 5000 (without order) it switch to a Bitmap Index Scan + > Bitmap Heap Scan and it's slower but acceptable (5.275 ms) > > Why, with the "QUERY 2", postgresql doesn't estimate the cost of the > Sort/ORDER BY ? > Of course, by ignoring the order, both query plan are right and the > choice for thoses differents plans totally make sense. It's because the result of IndexScan is already sorted by demanded key, whereas the one of BitmapIndexScan isn't. But I'm not sure why the query lasts more than 30 minutes... Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cost of sort/order by not estimated by the query planner
hummm Adding pgsql-perf :) On Mon, Nov 30, 2009 at 5:54 PM, Laurent Laborde wrote: > Friendly greetings ! > I use postgresql 8.3.6. > > here is a few info about the table i'm querying : > - > - select count(*) from _article : 17301610 > - select count(*) from _article WHERE (_article.bitfield && getbit(0)) : 6729 > > > Here are both request with problems : > -- > > QUERY 1 : Very fast ! > - > > explain SELECT * > FROM _article > WHERE (_article.bitfield && getbit(0)) > ORDER BY _article.id ASC > LIMIT 500; > QUERY PLAN > - > Limit (cost=66114.13..66115.38 rows=500 width=1114) > -> Sort (cost=66114.13..66157.37 rows=17296 width=1114) > Sort Key: id > -> Bitmap Heap Scan on _article (cost=138.32..65252.29 > rows=17296 width=1114) > Recheck Cond: (bitfield && B'1'::bit varying) > -> Bitmap Index Scan on idx_article_bitfield > (cost=0.00..134.00 rows=17296 width=0) > Index Cond: (bitfield && B'1'::bit varying) > > > > > QUERY 2 : Endless ... (more than 30mn... i stopped the query) > - > > explain SELECT * > FROM _article > WHERE (_article.bitfield && getbit(0)) > ORDER BY _article.id ASC > LIMIT 5; > QUERY PLAN > - > Limit (cost=0.00..2042.87 rows=5 width=1114) > -> Index Scan using _article_pkey on _article > (cost=0.00..7066684.46 rows=17296 width=1114) > Filter: (bitfield && B'1'::bit varying) > (3 rows) > > > With LIMIT 5 and LIMIT 500, the query plan are differents. > Postgresql estimate that it can do a a simple index scan to find only 5 row. > With more than LIMIT ~400 it estimate that it's faster to do a more > complex plan. > and it make sense ! > > The problem is in the order by, of course. > If i remove the "order by" the LIMIT 5 is faster (0.044 ms) and do an > index scan. > At limit 500 (without order) it still use an index scan and it is > slightly slower. > At limit 5000 (without order) it switch to a Bitmap Index Scan + > Bitmap Heap Scan and it's slower but acceptable (5.275 ms) > > Why, with the "QUERY 2", postgresql doesn't estimate the cost of the > Sort/ORDER BY ? > Of course, by ignoring the order, both query plan are right and the > choice for thoses differents plans totally make sense. > > But... if the planner would be kind enough to considerate the cost of > the order by, it would certainly choose the Bitmap Index + Bitmap Heap > scan for the limit 5. > And not an index_scan pkey ! > > I have set the statistics to 1000 for _article.bitfield, just in case > (and ran a vacuum analyze), it doesn't change anything. > > Is that a bug ? any Idea ? > > Thank you :) > > -- > Laurent "ker2x" Laborde > Sysadmin & DBA at http://www.over-blog.com/ > -- Laurent "ker2x" Laborde Sysadmin & DBA at http://www.over-blog.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Cost of sort/order by not estimated by the query planner
Friendly greetings ! I use postgresql 8.3.6. here is a few info about the table i'm querying : - - select count(*) from _article : 17301610 - select count(*) from _article WHERE (_article.bitfield && getbit(0)) : 6729 Here are both request with problems : -- QUERY 1 : Very fast ! - explain SELECT * FROM _article WHERE (_article.bitfield && getbit(0)) ORDER BY _article.id ASC LIMIT 500; QUERY PLAN - Limit (cost=66114.13..66115.38 rows=500 width=1114) -> Sort (cost=66114.13..66157.37 rows=17296 width=1114) Sort Key: id -> Bitmap Heap Scan on _article (cost=138.32..65252.29 rows=17296 width=1114) Recheck Cond: (bitfield && B'1'::bit varying) -> Bitmap Index Scan on idx_article_bitfield (cost=0.00..134.00 rows=17296 width=0) Index Cond: (bitfield && B'1'::bit varying) QUERY 2 : Endless ... (more than 30mn... i stopped the query) - explain SELECT * FROM _article WHERE (_article.bitfield && getbit(0)) ORDER BY _article.id ASC LIMIT 5; QUERY PLAN - Limit (cost=0.00..2042.87 rows=5 width=1114) -> Index Scan using _article_pkey on _article (cost=0.00..7066684.46 rows=17296 width=1114) Filter: (bitfield && B'1'::bit varying) (3 rows) With LIMIT 5 and LIMIT 500, the query plan are differents. Postgresql estimate that it can do a a simple index scan to find only 5 row. With more than LIMIT ~400 it estimate that it's faster to do a more complex plan. and it make sense ! The problem is in the order by, of course. If i remove the "order by" the LIMIT 5 is faster (0.044 ms) and do an index scan. At limit 500 (without order) it still use an index scan and it is slightly slower. At limit 5000 (without order) it switch to a Bitmap Index Scan + Bitmap Heap Scan and it's slower but acceptable (5.275 ms) Why, with the "QUERY 2", postgresql doesn't estimate the cost of the Sort/ORDER BY ? Of course, by ignoring the order, both query plan are right and the choice for thoses differents plans totally make sense. But... if the planner would be kind enough to considerate the cost of the order by, it would certainly choose the Bitmap Index + Bitmap Heap scan for the limit 5. And not an index_scan pkey ! I have set the statistics to 1000 for _article.bitfield, just in case (and ran a vacuum analyze), it doesn't change anything. Is that a bug ? any Idea ? Thank you :) -- Laurent "ker2x" Laborde Sysadmin & DBA at http://www.over-blog.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers