Re: [HACKERS] slow count in window query
2009/7/18 Hitoshi Harada umi.tan...@gmail.com:
> If I understand explain analyze correctly and it tells us the fact,
> WindowAgg without ORDER BY clause gets unreasonably slow. Let me see.

I haven't pinned down the difference between OVER () with and without an
ORDER BY clause yet, but I ran a benchmark that gives an interesting result.

$ bin/psql regression -c 'explain analyze select count(*) over() from x'
                                QUERY PLAN
 WindowAgg  (cost=0.00..2741.00 rows=10 width=0) (actual time=3725.294..4559.828 rows=10 loops=1)
   ->  Seq Scan on x  (cost=0.00..1491.00 rows=10 width=0) (actual time=0.112..310.349 rows=10 loops=1)
 Total runtime: 4811.115 ms
(3 rows)

The query is quite slow because the profiling hook function calls
gettimeofday() each time. Here is the result of counting calls to
eval_windowaggregates() and its child functions. Elapsed time is in
seconds, with the total gettimeofday() overhead subtracted.

eval_windowaggregates: Count 10  Elapse 0.588426

 Address  |Name                          |Count|Elapse(Total)
 0x8204067|initialize_windowaggregate    |    1|     0.000277
 0x8204d4a|spool_tuples                  |   12|     0.620092
 0x83dcd08|tuplestore_select_read_pointer|   11|     0.011080
 0x83dda2f|tuplestore_gettupleslot       |   11|     0.049005
 0x8204fdd|row_is_in_frame               |   10|     0.014978
 0x8204168|advance_windowaggregate       |   10|     0.025675
 0x81ead8a|ExecClearTuple                |   10|     0.022105
 0x8204462|finalize_windowaggregate      |    1|     0.15
 0x8204120|MemoryContextSwitchTo         |    2|     0.00

spool_tuples() is dominant in eval_windowaggregates(). I think it is not
needed when the query contains only a simple aggregate like count(*)
OVER (), but currently we copy all the rows from the source table into
the tuplestore. Even when the data fits in memory, the copy operation
costs too much. I am thinking about how to avoid this unnecessary copy
overhead...

Regards,

---
Hitoshi Harada

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
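The copy-versus-count distinction above can be sketched outside PostgreSQL; a minimal Python analogy (the function names here are hypothetical illustrations, not PostgreSQL internals):

```python
# count(*) OVER () only needs the number of input rows, not their contents,
# so buffering a copy of every row (what the spool_tuples()/tuplestore path
# does) is avoidable work for this particular window.

def windowed_count_buffered(rows):
    buf = [tuple(r) for r in rows]   # copy every row into a buffer first
    return [len(buf)] * len(buf)     # then emit the count once per row

def windowed_count_streaming(rows):
    n = sum(1 for _ in rows)         # just count; no per-row copies
    return [n] * n

rows = [(1,), (2,), (3,)]
print(windowed_count_buffered(iter(rows)))   # [3, 3, 3]
print(windowed_count_streaming(iter(rows)))  # [3, 3, 3]
```

Both produce the same output; the streaming variant simply never materializes the input.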
Re: [HACKERS] slow count in window query
2009/7/17 Pavel Stehule pavel.steh...@gmail.com:
> Hello
>
> look on:
>
> postgres=# explain select count(*) over () from x;
>                         QUERY PLAN
>  WindowAgg  (cost=0.00..265.00 rows=1 width=0)
>    ->  Seq Scan on x  (cost=0.00..140.00 rows=1 width=0)
> (2 rows)
>
> Time: 1,473 ms
>
> postgres=# explain select count(*) over (order by a) from x;
>                         QUERY PLAN
>  WindowAgg  (cost=0.00..556.25 rows=1 width=4)
>    ->  Index Scan using gg on x  (cost=0.00..406.25 rows=1 width=4)
> (2 rows)
>
> but query1: 160ms, query2: 72ms

Well, how about "select a from x order by a"? I wonder whether the index
scan matters more here than the WindowAgg does.

--
Hitoshi Harada
Re: [HACKERS] slow count in window query
Pavel Stehule pavel.steh...@gmail.com wrote:

> postgres=# explain select count(*) over () from x;
>  WindowAgg  (cost=0.00..265.00 rows=1 width=0)
>    ->  Seq Scan on x  (cost=0.00..140.00 rows=1 width=0)
>
> postgres=# explain select count(*) over (order by a) from x;
>  WindowAgg  (cost=0.00..556.25 rows=1 width=4)
>    ->  Index Scan using gg on x  (cost=0.00..406.25 rows=1 width=4)
>
> query1: 160ms, query2: 72ms

EXPLAIN ANALYZE is more telling than just EXPLAIN. Did you run both
several times, or flush the caches carefully between runs, to eliminate
caching effects? Is it possible that there are a lot of dead rows in the
table (from UPDATEs or DELETEs) and the table hasn't been vacuumed?
(Output from VACUUM VERBOSE on the table would show that.)

-Kevin
Re: [HACKERS] slow count in window query
2009/7/17 Kevin Grittner kevin.gritt...@wicourts.gov:
> EXPLAIN ANALYZE is more telling than just EXPLAIN.

Query 1:

                                QUERY PLAN
 Aggregate  (cost=931.50..931.51 rows=1 width=4) (actual time=274.423..274.425 rows=1 loops=1)
   ->  Subquery Scan p  (cost=0.00..931.25 rows=100 width=4) (actual time=220.220..274.388 rows=2 loops=1)
         Filter: ((p.r = ((p.rc + 1) / 2)) OR (p.r = ((p.rc + 2) / 2)))
         ->  WindowAgg  (cost=0.00..681.25 rows=1 width=4) (actual time=120.622..247.618 rows=1 loops=1)
               ->  WindowAgg  (cost=0.00..556.25 rows=1 width=4) (actual time=0.088..89.848 rows=1 loops=1)
                     ->  Index Scan using gg on x  (cost=0.00..406.25 rows=1 width=4) (actual time=0.066..33.962 rows=1 loops=1)
 Total runtime: 274.934 ms
(7 rows)

Query 2:

postgres=# explain analyze select avg(a) from (select a, row_number()
over (order by a asc) as hi, row_number() over (order by a desc) as lo
from x) s where hi in (lo-1,lo+1);
                                QUERY PLAN
 Aggregate  (cost=1595.89..1595.90 rows=1 width=4) (actual time=215.101..215.103 rows=1 loops=1)
   ->  Subquery Scan s  (cost=1220.63..1595.63 rows=100 width=4) (actual time=175.159..215.073 rows=1 loops=1)
         Filter: ((s.hi = (s.lo - 1)) OR (s.hi = (s.lo + 1)))
         ->  WindowAgg  (cost=1220.63..1395.63 rows=1 width=4) (actual time=136.985..191.231 rows=1 loops=1)
               ->  Sort  (cost=1220.63..1245.63 rows=1 width=4) (actual time=136.970..151.905 rows=1 loops=1)
                     Sort Key: x.a
                     Sort Method: quicksort  Memory: 686kB
                     ->  WindowAgg  (cost=0.00..556.25 rows=1 width=4) (actual time=0.078..106.927 rows=1 loops=1)
                           ->  Index Scan using gg on x  (cost=0.00..406.25 rows=1 width=4) (actual time=0.058..33.594 rows=1 loops=1)
 Total runtime: 215.845 ms
(10 rows)

> Did you run both several times or flush caches carefully between the
> runs to eliminate caching effects?

Yes - in both variants the data was read from the cache.

> Is it possible that there are a lot of dead rows in the table (from
> UPDATEs or DELETEs), and the table hasn't been vacuumed? (Output from
> VACUUM VERBOSE on the table would show that.)

The table was filled with random numbers and analyzed - you can simply
check it; look at the beginning of the thread. This table wasn't
updated.

Pavel
Re: [HACKERS] slow count in window query
2009/7/17 Hitoshi Harada umi.tan...@gmail.com:
> 2009/7/17 Pavel Stehule pavel.steh...@gmail.com:
>> postgres=# explain select count(*) over () from x;
>>  WindowAgg  (cost=0.00..265.00 rows=1 width=0)
>>    ->  Seq Scan on x  (cost=0.00..140.00 rows=1 width=0)
>> (2 rows)
>>
>> Time: 1,473 ms
>>
>> postgres=# explain select count(*) over (order by a) from x;
>>  WindowAgg  (cost=0.00..556.25 rows=1 width=4)
>>    ->  Index Scan using gg on x  (cost=0.00..406.25 rows=1 width=4)
>> (2 rows)
>>
>> but query1: 160ms, query2: 72ms
>
> Well, how about "select a from x order by a"? I wonder whether the
> index scan matters more than the WindowAgg.

select a from x            - 42 ms
select a from x order by a - 50 ms

All data are from the cache.
Re: [HACKERS] slow count in window query
Pavel Stehule pavel.steh...@gmail.com wrote:
> The table was filled with random numbers and analyzed - you can simply
> check it; look at the beginning of the thread. This table wasn't
> updated.

Confirmed. The ORDER BY consistently speeds up the query. Odd.

Sort speed varied with the random sequence generated, but a typical plan
and timings:

test=# explain analyze select count(*) over () from x;
 WindowAgg  (cost=0.00..229.00 rows=1 width=0) (actual time=32.435..97.448 rows=1 loops=1)
   ->  Seq Scan on x  (cost=0.00..104.00 rows=1 width=0) (actual time=0.007..14.818 rows=1 loops=1)
 Total runtime: 112.526 ms

test=# explain analyze select count(*) over (order by a) from x;
 WindowAgg  (cost=768.39..943.39 rows=1 width=4) (actual time=34.982..87.803 rows=1 loops=1)
   ->  Sort  (cost=768.39..793.39 rows=1 width=4) (actual time=34.962..49.533 rows=1 loops=1)
         Sort Key: a
         Sort Method: quicksort  Memory: 491kB
         ->  Seq Scan on x  (cost=0.00..104.00 rows=1 width=4) (actual time=0.006..14.682 rows=1 loops=1)
 Total runtime: 102.023 ms

-Kevin
Re: [HACKERS] slow count in window query
2009/7/18 Kevin Grittner kevin.gritt...@wicourts.gov:
> Confirmed. The ORDER BY consistently speeds up the query. Odd.
> Sort speed varied with the random sequence generated, but a typical
> plan and timings:

Kevin's result is quite odd. I had confirmed that using an IndexScan
looked faster in Pavel's result, but yours is with a Sort node. I found
that those results are seen only on relatively small sets; I increased
the source table up to 10 rows and the OVER (ORDER BY a) case got
slower.

What really surprised me is that in every case without an ORDER BY
clause in the window, the WindowAgg node starts quite a bit later than
the lower node finishes.

test=# explain analyze select count(*) over () from x;
 WindowAgg  (cost=0.00..229.00 rows=1 width=0) (actual time=32.435..97.448 rows=1 loops=1)
   ->  Seq Scan on x  (cost=0.00..104.00 rows=1 width=0) (actual time=0.007..14.818 rows=1 loops=1)
 Total runtime: 112.526 ms

I had expected the WindowAgg actual time to be 14.xxx..97.448, but it is
actually 32.435..97.448. The ORDER BY case returns its first result as
soon as the underlying Sort (or IndexScan) returns the first row
(actually the second), because the window frame contains only one row.
But even when the frame contains all the rows (i.e. the OVER () case),
it could return the first row not much later than the underlying node
returns its last.

If I understand explain analyze correctly and it tells us the fact,
WindowAgg without an ORDER BY clause gets unreasonably slow. Let me see.

Regards,

--
Hitoshi Harada
Re: [HACKERS] slow count in window query
On Wed, Jul 15, 2009 at 11:18 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
> postgres=# select avg(a) from (select a, row_number() over (order by a)
> as r, count(*) over () as rc from x ) p where r in ((rc+1)/2,(rc+2)/2);

How does this compare to the plain non-windowing SQL implementation:

select a from x order by a offset (select trunc(count(*)/2) from x) limit 1

(except that that only works if count(*) is odd).

Interestingly, finding the median is actually O(n) using Quickselect.
Maybe we should provide a C implementation of quickselect as a window
function. I'm not sure how to wedge in the concept that the sort is
unnecessary even though the ORDER BY is specified, though.

I'm also not sure how to handle this if the set has to be spooled to
disk. Quicksort and Quickselect do a lot of scans through the data and
wouldn't perform well on disk.

--
greg
http://mit.edu/~gsstark/resume.pdf
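For reference, the quickselect idea Greg mentions can be sketched in a few lines. This is a hedged illustration of the O(n) average-case selection algorithm, not a proposal for the C implementation:

```python
import random

def quickselect(data, k):
    """Return the k-th smallest element (0-based) of data in O(n) average
    time, without fully sorting: partition around a random pivot and
    recurse only into the partition that contains the k-th element."""
    pivot = random.choice(data)
    lo = [x for x in data if x < pivot]
    eq = [x for x in data if x == pivot]
    hi = [x for x in data if x > pivot]
    if k < len(lo):
        return quickselect(lo, k)
    if k < len(lo) + len(eq):
        return pivot
    return quickselect(hi, k - len(lo) - len(eq))

def median(data):
    """Lower median for even-length input, exact median for odd."""
    return quickselect(list(data), (len(data) - 1) // 2)

print(median([5, 1, 4, 2, 3]))  # 3
```

Each level of recursion discards on average half the data, which is where the linear expected time comes from - and also why it wants random access to the whole set, Greg's concern for the spooled-to-disk case.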
Re: [HACKERS] slow count in window query
> I'm also not sure how to handle this if the set has to be spooled to
> disk. Quicksort and Quickselect do a lot of scans through the data and
> wouldn't perform well on disk.

I think the problem is in an aggregate function being used as a window
function - or some missing optimization. When I replaced count(*) over
() with a subselect (SELECT count(*) FROM ...), I got the expected
speed.

Pavel
Re: [HACKERS] slow count in window query
2009/7/16 Greg Stark gsst...@mit.edu:
> On Wed, Jul 15, 2009 at 11:18 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
>> postgres=# select avg(a) from (select a, row_number() over (order by
>> a) as r, count(*) over () as rc from x ) p where r in
>> ((rc+1)/2,(rc+2)/2);
>
> How does this compare to the plain non-windowing SQL implementation:
>
> select a from x order by a offset (select trunc(count(*)/2) from x) limit 1
>
> (except that that only works if count(*) is odd).
>
> Interestingly, finding the median is actually O(n) using Quickselect.
> Maybe we should provide a C implementation of quickselect as a window
> function. I'm not sure how to wedge in the concept that the sort is
> unnecessary even though the ORDER BY is specified, though.

median() should be an aggregate, not a window function, shouldn't it?

> I'm also not sure how to handle this if the set has to be spooled to
> disk. Quicksort and Quickselect do a lot of scans through the data and
> wouldn't perform well on disk.

The WindowAgg spools rows into the tuplestore, which holds the data in
memory as long as it fits. Do you have any idea how it could store
millions and millions of rows without the tuplestore?

Regards,

--
Hitoshi Harada
Re: [HACKERS] slow count in window query
2009/7/16 Pavel Stehule pavel.steh...@gmail.com:
>> I'm also not sure how to handle this if the set has to be spooled to
>> disk. Quicksort and Quickselect do a lot of scans through the data
>> and wouldn't perform well on disk.
>
> I think the problem is in an aggregate function being used as a window
> function - or some missing optimization. When I replaced count(*) over
> () with a subselect (SELECT count(*) FROM ...), I got the expected
> speed.

WindowAgg always spools its input into the buffer, although (in your
case) it throws rows away one by one, so compared with a pure aggregate
it has overhead. I think this is a reasonable approach for the
large-data situation and for other types of window, but yes, we should
improve the current model:

1) There should be some kind of lightweight approach for such
small-data/simple-window situations.

2) tuplestore_puttupleslot() seems heavy to me (copy, checks, etc.)
even when the data fits in memory thanks to trimming rows. We want more
flexible temporary storage on the fly.

Regards,

--
Hitoshi Harada
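The "throws rows away one by one" behaviour for a moving frame can be illustrated with a toy buffer. A hedged Python sketch - the two-row frame is an arbitrary example chosen for illustration, not PostgreSQL's frame semantics:

```python
from collections import deque

def running_sum(rows, frame_size=2):
    """Sum over a moving frame of the last `frame_size` rows. Only rows
    still inside the frame stay buffered - older rows are trimmed as the
    frame advances, so memory is bounded by the frame size rather than
    the input size."""
    buf = deque(maxlen=frame_size)   # deque trims the oldest row for us
    out = []
    for r in rows:
        buf.append(r)
        out.append(sum(buf))
    return out

print(running_sum([1, 2, 3, 4]))  # [1, 3, 5, 7]
```

With OVER () the frame is the whole partition, so nothing can ever be trimmed - which is why that case wants the separate lightweight path discussed above.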
Re: [HACKERS] slow count in window query
2009/7/16 Hitoshi Harada umi.tan...@gmail.com:
> median() should be an aggregate, not a window function, shouldn't it?

Yes - the core of my topic is the significant slowness of queries that
use an aggregate function as a window function. This case could be
optimized simply. The case itself isn't important for me: I was simply
playing with window functions, found Celko's query, and was surprised
because it was faster than the other one - I had expected something
else.

> The WindowAgg spools rows into the tuplestore, which holds the data in
> memory as long as it fits. Do you have any idea how it could store
> millions and millions of rows without the tuplestore?
Re: [HACKERS] slow count in window query
Hello

Look at this:

postgres=# explain select count(*) over () from x;
                        QUERY PLAN
 WindowAgg  (cost=0.00..265.00 rows=1 width=0)
   ->  Seq Scan on x  (cost=0.00..140.00 rows=1 width=0)
(2 rows)

Time: 1,473 ms

postgres=# explain select count(*) over (order by a) from x;
                        QUERY PLAN
 WindowAgg  (cost=0.00..556.25 rows=1 width=4)
   ->  Index Scan using gg on x  (cost=0.00..406.25 rows=1 width=4)
(2 rows)

but query1: 160 ms, query2: 72 ms

regards
Pavel Stehule
[HACKERS] slow count in window query
Hello,

I did some tests of computing the median via window functions, and I
probably found some badly optimized code. I tried two methods - Celko's
and Itzik Ben-Gan's. Ben-Gan's method should be faster - there is one
sort less - but in practice it is two times slower.

create table x(a integer);
insert into x select (random()*1)::int from generate_series(1,1);

Celko's method:

postgres=# explain select avg(a) from (select a, row_number() over
(order by a asc) as hi, row_number() over (order by a desc) as lo
from x) s where hi in (lo-1,lo+1);
                        QUERY PLAN
 Aggregate  (cost=2144.02..2144.03 rows=1 width=4)
   ->  Subquery Scan s  (cost=1643.77..2143.77 rows=100 width=4)
         Filter: ((s.hi = (s.lo - 1)) OR (s.hi = (s.lo + 1)))
         ->  WindowAgg  (cost=1643.77..1943.77 rows=1 width=4)
               ->  WindowAgg  (cost=1643.77..1818.77 rows=1 width=4)
                     ->  Sort  (cost=1643.77..1668.77 rows=1 width=4)
                           Sort Key: x.a
                           ->  WindowAgg  (cost=804.39..979.39 rows=1 width=4)
                                 ->  Sort  (cost=804.39..829.39 rows=1 width=4)
                                       Sort Key: x.a
                                       ->  Seq Scan on x  (cost=0.00..140.00 rows=1 width=4)
(11 rows)

Ben-Gan's method:

postgres=# explain select avg(a) from (select a, row_number() over
(order by a) as r, count(*) over () as rc from x ) p where r in
((rc+1)/2,(rc+2)/2);
                        QUERY PLAN
 Aggregate  (cost=1354.64..1354.65 rows=1 width=4)
   ->  Subquery Scan p  (cost=804.39..1354.39 rows=100 width=4)
         Filter: ((p.r = ((p.rc + 1) / 2)) OR (p.r = ((p.rc + 2) / 2)))
         ->  WindowAgg  (cost=804.39..1104.39 rows=1 width=4)
               ->  WindowAgg  (cost=804.39..979.39 rows=1 width=4)
                     ->  Sort  (cost=804.39..829.39 rows=1 width=4)
                           Sort Key: x.a
                           ->  Seq Scan on x  (cost=0.00..140.00 rows=1 width=4)
(8 rows)

but:

postgres=# select avg(a) from (select a, row_number() over (order by a)
as r, count(*) over () as rc from x ) p where r in ((rc+1)/2,(rc+2)/2);
  avg
-------
 5027.
(1 row)

Time: 179,310 ms

postgres=# select avg(a) from (select a, row_number() over (order by a
asc) as hi, row_number() over (order by a desc) as lo from x) s where
hi in (lo-1,lo+1);
  avg
-------
 5027.
(1 row)

Time: 78,791 ms

When I checked the difference, I found that the problem is the count()
function: count(*) over () is very slow. Maybe because it is a standard
aggregate?

Regards
Pavel Stehule
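The filter in Ben-Gan's query reduces to simple integer arithmetic. A minimal Python sketch of the selection rule, where a sorted list stands in for row_number() OVER (ORDER BY a) and its length for count(*) OVER ():

```python
def median_ben_gan(values):
    """Ben-Gan's filter r in ((rc+1)/2, (rc+2)/2) with integer division:
    for odd n both expressions name the single middle rank, for even n
    they name the two middle ranks; avg() of the kept rows is the median."""
    s = sorted(values)                    # row_number() OVER (ORDER BY a)
    n = len(s)                            # count(*) OVER ()
    ranks = {(n + 1) // 2, (n + 2) // 2}  # the kept rank(s), 1-based
    picks = [s[r - 1] for r in ranks]
    return sum(picks) / len(picks)

print(median_ben_gan([3, 1, 2]))     # 2.0
print(median_ben_gan([4, 1, 3, 2]))  # 2.5
```

This is only the selection logic, of course - the thread's point is that evaluating count(*) OVER () itself is what costs so much.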
Re: [HACKERS] Slow count(*)
On Wed, Jan 2, 2008 at 9:29 AM, Abraham, Danny [EMAIL PROTECTED] wrote:
> We are looking for a patch that will help us count using the indexes.

As others have mentioned, that's not currently possible for a count of
all rows in a table, because there can be many versions of a row under
PostgreSQL's MVCC techniques, and the row must currently be visited to
determine whether it is visible in the context of your database
transaction.

> Our product is about 20 times slower on Postgres compared to MS SQL
> Server. Any ideas?

Again, it is best to show a particular example of a problem, because
you might be making a bad assumption about the cause of your slowness.
If you don't understand MVCC and the need for maintenance, you might
have table bloat, which could be the issue. Also, always give the exact
version of PostgreSQL, the OS, and a description of the hardware.

If you really are doing proper maintenance, and you don't need exact
counts, you might be able to use the approximation stored in the system
tables:

cc=> \timing
Timing is on.
cc=> select count(*) from "Party";
 count
--------
 135093
(1 row)

Time: 48.626 ms

cc=> select reltuples from pg_class where relname = 'Party';
 reltuples
-----------
    135091
(1 row)

Time: 9.799 ms

-Kevin

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [HACKERS] Slow count(*)
Kevin Grittner wrote:
> If you really are doing proper maintenance, and you don't need exact
> counts, you might be able to use the approximation stored in the
> system tables:

Also, if you're using count(*) as an existence test (common in Mysql
code), it's better to use exists instead. Using a table in my system, I
see:

proddb=> explain analyze select count(*) from instrument_listings where
update_date is null and delist_date is null;
                                QUERY PLAN
 Aggregate  (cost=18385.45..18385.46 rows=1 width=0) (actual time=897.799..897.801 rows=1 loops=1)
   ->  Seq Scan on instrument_listings  (cost=0.00..17973.43 rows=164807 width=0) (actual time=0.018..634.197 rows=146122 loops=1)
         Filter: ((update_date IS NULL) AND (delist_date IS NULL))
 Total runtime: 897.846 ms
(4 rows)

Time: 898.478 ms

proddb=> explain analyze select true where exists(select 1 from
instrument_listings where update_date is null and delist_date is null
limit 1);
                                QUERY PLAN
 Result  (cost=0.11..0.12 rows=1 width=0) (actual time=0.026..0.028 rows=1 loops=1)
   One-Time Filter: $0
   InitPlan
     ->  Limit  (cost=0.00..0.11 rows=1 width=0) (actual time=0.022..0.022 rows=1 loops=1)
           ->  Seq Scan on instrument_listings  (cost=0.00..17973.43 rows=164807 width=0) (actual time=0.017..0.017 rows=1 loops=1)
                 Filter: ((update_date IS NULL) AND (delist_date IS NULL))
 Total runtime: 0.063 ms
(7 rows)

Time: 0.768 ms

The exists version is over 1000x faster (and the only reason it's not
more studly is that I'm working on the table as we speak, so it's all
in memory). As a general rule in postgres, don't do count(*) unless you
really mean it.

Brian
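Brian's point generalizes beyond SQL: an existence test can stop at the first match, while a count must visit every row before it can answer. A rough Python analogy (an illustration of the principle, not of PostgreSQL internals):

```python
def exists_match(rows, pred):
    """EXISTS-style check: any() short-circuits at the first qualifying
    row, so the cost is independent of how many rows would match."""
    return any(pred(r) for r in rows)

def count_matches(rows, pred):
    """count(*)-style check: must scan the whole input before answering."""
    return sum(1 for r in rows if pred(r))

rows = range(1_000_000)
print(exists_match(rows, lambda r: r >= 0))   # True - stops after one row
print(count_matches(rows, lambda r: r >= 0))  # 1000000 - scans everything
```

The 1000x gap in Brian's timings is exactly this short-circuit: the Limit node under the InitPlan stops the seq scan at its first qualifying row.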
[HACKERS] Slow count(*)
Hi,

We are looking for a patch that will help us count using the indexes.

Our product is about 20 times slower on Postgres compared to MS SQL
Server. Any ideas?

Danny Abraham
BMC Software
CTMD Business Unit
972-52-4286-513
[EMAIL PROTECTED]
Re: [HACKERS] Slow count(*)
On 02/01/2008, Abraham, Danny [EMAIL PROTECTED] wrote:
> Hi,
>
> We are looking for a patch that will help us count using the indexes.
>
> Our product is about 20 times slower on Postgres compared to MS SQL
> Server. Any ideas?

There is no such patch, and there won't be. Use materialized views or
similar techniques.

Are you sure that all your problems are only in SELECT COUNT(*)?
Please check all the slow queries.

Regards
Pavel Stehule
Re: [HACKERS] Slow count(*)
On Wed, 02.01.2008 at 9:29:24 -0600, Abraham, Danny wrote:
> Hi,
>
> We are looking for a patch that will help us count using the indexes.
>
> Our product is about 20 times slower on Postgres compared to MS SQL
> Server. Any ideas?

Please show us your SQL and the execution plan (EXPLAIN or, better,
EXPLAIN ANALYSE), and read our FAQ.

Andreas
--
Andreas Kretschmer
Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (more: see header)
GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net
Re: [HACKERS] Slow count(*)
On Wed, Jan 02, 2008 at 09:29:24AM -0600, Abraham, Danny wrote:
> We are looking for a patch that will help us count using the indexes.

Is this for

    SELECT count(*) FROM table;

or

    SELECT count(1) FROM table WHERE. . .

The latter _will_ use an index, if the index is correct, the statistics
are right, and the index selectivity is worth the cost of reading the
index. The former will not use an index at all, because the answer
depends on visibility, and you can't know that without reading the
table. If you're counting how many rows are in the table (for, for
instance, display purposes), you probably need to do something else.

> Our product is about 20 times slower on Postgres compared to MS SQL
> Server. Any ideas?

Not without the queries, the EXPLAIN ANALYZE plans, and some
information about the database.

A
Re: [HACKERS] Slow count(*)
On Wed, 2008-01-02 at 09:29 -0600, Abraham, Danny wrote:
> Our product is about 20 times slower on Postgres compared to MS SQL
> Server.

If you want to have a cross-platform product then you must consider how
to access multiple systems both accurately and quickly. There's not much
point catering for the different SQL dialects and then ignoring the
performance differences. All products are not the same; you will find
many advantages with Postgres.

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
[HACKERS] slow count() was: tsearch2 poor performance
On Fri, 1 Oct 2004, Kris Kiger wrote:
> Hey all, it's me again. If I do not do a count(product_id) on my
> tsearch2 queries, it's actually really fast, for example;

Hmm, I also really want to know what the difference is.

Postgresql 8.0beta3 on Linux 2.4.25:

tsearchd=# explain analyze select body from txt where fts_index @@ to_tsquery('oil') limit 1000;
                                QUERY PLAN
 Limit  (cost=0.00..4027.67 rows=1000 width=315) (actual time=0.053..14.662 rows=1000 loops=1)
   ->  Index Scan using fts_idx on txt  (cost=0.00..12083.02 rows=3000 width=315) (actual time=0.049..12.552 rows=1000 loops=1)
         Index Cond: (fts_index @@ '\'oil\''::tsquery)
 Total runtime: 15.848 ms
(4 rows)

tsearchd=# explain analyze select count(body) from txt where fts_index @@ to_tsquery('oil') limit 1000;

Didn't get a result after 10 minutes :(

Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
Re: [HACKERS] slow count() was: tsearch2 poor performance
>> Hey all, it's me again. If I do not do a count(product_id) on my
>> tsearch2 queries, it's actually really fast, for example;
>
> Hmm, I also really want to know what the difference is.
>
> Postgresql 8.0beta3 on Linux 2.4.25:
>
> tsearchd=# explain analyze select body from txt where fts_index @@ to_tsquery('oil') limit 1000;
>  Limit  (cost=0.00..4027.67 rows=1000 width=315) (actual time=0.053..14.662 rows=1000 loops=1)
>    ->  Index Scan using fts_idx on txt  (cost=0.00..12083.02 rows=3000 width=315) (actual time=0.049..12.552 rows=1000 loops=1)
>          Index Cond: (fts_index @@ '\'oil\''::tsquery)
>  Total runtime: 15.848 ms
> (4 rows)
>
> tsearchd=# explain analyze select count(body) from txt where fts_index @@ to_tsquery('oil') limit 1000;
>
> Didn't get a result after 10 minutes :(

I think you're missing what LIMIT does. In the first query, it LIMITs
the return from the index scan to 1000 entries. In the second query, it
LIMITs the return from the aggregate to 1000 entries. The index scan
will include all matches and send them to count(), which returns 1 row
only, to which LIMIT is then applied.

You could probably reach the same result with a subselect:

select count(*) FROM (select body from txt where fts_index @@ to_tsquery('oil') limit 1000) AS s;

At least that's how I think LIMIT works. That would certainly explain
the major time difference.

//Magnus
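Magnus's distinction - LIMIT trimming the aggregate's one-row output versus LIMIT cutting off the scan inside a subselect - can be mimicked with Python iterators (a loose analogy, not how the executor actually works):

```python
from itertools import islice

def scan():
    """Stand-in for the index scan producing candidate rows."""
    return iter(range(1_000_000))

# count(*) ... LIMIT 1000: the aggregate consumes the ENTIRE scan first;
# LIMIT then trims its single-row result, saving nothing.
full_count = sum(1 for _ in scan())

# count(*) over a limited subselect: the scan is cut off at 1000 rows
# before the aggregate ever sees it.
limited_count = sum(1 for _ in islice(scan(), 1000))

print(full_count, limited_count)  # 1000000 1000
```

Both queries contain the same LIMIT clause; only its position relative to the aggregation decides how much work is done.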
Re: [HACKERS] slow count() was: tsearch2 poor performance
On Sun, 3 Oct 2004, Magnus Hagander wrote:
> I think you're missing what LIMIT does. In the first query, it LIMITs
> the return from the index scan to 1000 entries. In the second query,
> it LIMITs the return from the aggregate to 1000 entries. The index
> scan will include all matches and send them to count(), which returns
> 1 row only, to which LIMIT is then applied.

Magnus, thanks for the explanation. I suspected this; so LIMIT is still
a hack.

> You could probably reach the same result with a subselect:
>
> select count(*) FROM (select body from txt where fts_index @@ to_tsquery('oil') limit 1000)

Yes, the timings are now equal!

> At least that's how I think LIMIT works. That would certainly explain
> the major time difference.

Regards,
Oleg