Re: [HACKERS] slow count in window query

2009-07-29 Thread Hitoshi Harada
2009/7/18 Hitoshi Harada umi.tan...@gmail.com:
 If I understand EXPLAIN ANALYZE correctly and it is telling us the truth,
 WindowAgg without an ORDER BY clause gets unreasonably slow. Let me see.


I haven't yet pinned down the difference between OVER() with and without
an ORDER BY clause, but I ran a benchmark that produced an interesting
result.

$ bin/psql regression -c 'explain analyze select count(*) over() from x'
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=0.00..2741.00 rows=10 width=0) (actual time=3725.294..4559.828 rows=10 loops=1)
   ->  Seq Scan on x  (cost=0.00..1491.00 rows=10 width=0) (actual time=0.112..310.349 rows=10 loops=1)
 Total runtime: 4811.115 ms
(3 rows)

The query is quite slow because the profiling hook function calls
gettimeofday() each time. Here is the result, counting calls to
eval_windowaggregates() and its child functions. Elapsed time is in
seconds, with the total gettimeofday() overhead subtracted.

eval_windowaggregates:
Count 10
Elapse 0.588426

Address  |Name  |Count |Elapse(Total)
0x8204067|initialize_windowaggregate| 1| 0.000277
0x8204d4a|spool_tuples  |12| 0.620092
0x83dcd08|tuplestore_select_read_pointer|11| 0.011080
0x83dda2f|tuplestore_gettupleslot   |11| 0.049005
0x8204fdd|row_is_in_frame   |10| 0.014978
0x8204168|advance_windowaggregate   |10| 0.025675
0x81ead8a|ExecClearTuple|10| 0.022105
0x8204462|finalize_windowaggregate  | 1| 0.15
0x8204120|MemoryContextSwitchTo | 2| 0.00

spool_tuples() dominates eval_windowaggregates(). I think it is
unnecessary when the query contains only a simple aggregate like count(*)
OVER (), but currently we copy every row from the source table into the
tuplestore. Even when the data fits in memory, the copy operation costs
too much.

I am thinking about how to avoid this unnecessary copy overhead...
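As a back-of-the-envelope illustration of the point above (a hypothetical Python model, not PostgreSQL code): for a frameless count(*) OVER (), the spooled copy contributes nothing to the answer - a running counter produces the same per-row output.

```python
def windowed_count_spooling(rows):
    """Model of the current WindowAgg behavior for count(*) OVER ():
    every input row is copied into a buffer (the tuplestore) before
    the per-row results are emitted."""
    buffer = [row for row in rows]  # per-row copy: the dominant cost
    return [len(buffer)] * len(buffer)

def windowed_count_streaming(rows):
    """What a whole-partition count actually needs: one counter,
    no per-row copies."""
    total = sum(1 for _ in rows)
    return [total] * total
```

Both functions return the same list of per-row counts; only the buffering differs.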


Regards,

---
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] slow count in window query

2009-07-17 Thread Hitoshi Harada
2009/7/17 Pavel Stehule pavel.steh...@gmail.com:
 Hello

 look on:
 postgres=# explain select count(*) over () from x;
                         QUERY PLAN
 ----------------------------------------------------------
  WindowAgg  (cost=0.00..265.00 rows=1 width=0)
    ->  Seq Scan on x  (cost=0.00..140.00 rows=1 width=0)
 (2 rows)

 Time: 1,473 ms
 postgres=# explain select count(*) over (order by a) from x;
                               QUERY PLAN
 -------------------------------------------------------------------
  WindowAgg  (cost=0.00..556.25 rows=1 width=4)
    ->  Index Scan using gg on x  (cost=0.00..406.25 rows=1 width=4)
 (2 rows)

 but
 query1: 160ms
 query2: 72ms

Well, how about select a from x order by a?
I wonder if the index scan matters more than the WindowAgg.

-- 
Hitoshi Harada



Re: [HACKERS] slow count in window query

2009-07-17 Thread Kevin Grittner
Pavel Stehule pavel.steh...@gmail.com wrote:
 
 postgres=# explain select count(*) over () from x;
 
  WindowAgg  (cost=0.00..265.00 rows=1 width=0)
    ->  Seq Scan on x  (cost=0.00..140.00 rows=1 width=0)
 
 postgres=# explain select count(*) over (order by a) from x;
 
  WindowAgg  (cost=0.00..556.25 rows=1 width=4)
    ->  Index Scan using gg on x  (cost=0.00..406.25 rows=1 width=4)
 
 query1: 160ms
 query2: 72ms
 
EXPLAIN ANALYZE is more telling than just EXPLAIN.
 
Did you run both several times or flush caches carefully between the
runs to eliminate caching effects?
 
Is it possible that there are a lot of dead rows in the table (from
UPDATEs or DELETEs) and the table hasn't been vacuumed?  (Output from
VACUUM VERBOSE on the table would show that.)
 
-Kevin



Re: [HACKERS] slow count in window query

2009-07-17 Thread Pavel Stehule
2009/7/17 Kevin Grittner kevin.gritt...@wicourts.gov:
 Pavel Stehule pavel.steh...@gmail.com wrote:

 postgres=# explain select count(*) over () from x;

  WindowAgg  (cost=0.00..265.00 rows=1 width=0)
    ->  Seq Scan on x  (cost=0.00..140.00 rows=1 width=0)

 postgres=# explain select count(*) over (order by a) from x;

  WindowAgg  (cost=0.00..556.25 rows=1 width=4)
    ->  Index Scan using gg on x  (cost=0.00..406.25 rows=1 width=4)

 query1: 160ms
 query2: 72ms

 EXPLAIN ANALYZE is more telling than just EXPLAIN.

Query1

                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=931.50..931.51 rows=1 width=4) (actual time=274.423..274.425 rows=1 loops=1)
   ->  Subquery Scan p  (cost=0.00..931.25 rows=100 width=4) (actual time=220.220..274.388 rows=2 loops=1)
         Filter: ((p.r = ((p.rc + 1) / 2)) OR (p.r = ((p.rc + 2) / 2)))
         ->  WindowAgg  (cost=0.00..681.25 rows=1 width=4) (actual time=120.622..247.618 rows=1 loops=1)
               ->  WindowAgg  (cost=0.00..556.25 rows=1 width=4) (actual time=0.088..89.848 rows=1 loops=1)
                     ->  Index Scan using gg on x  (cost=0.00..406.25 rows=1 width=4) (actual time=0.066..33.962 rows=1 loops=1)
 Total runtime: 274.934 ms
(7 rows)

query2:

postgres=# explain analyze select avg(a) from (select a, row_number()
over (order by a asc) as hi, row_number() over (order by a desc) as lo
from x) s where hi in (lo-1,lo+1);

                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1595.89..1595.90 rows=1 width=4) (actual time=215.101..215.103 rows=1 loops=1)
   ->  Subquery Scan s  (cost=1220.63..1595.63 rows=100 width=4) (actual time=175.159..215.073 rows=1 loops=1)
         Filter: ((s.hi = (s.lo - 1)) OR (s.hi = (s.lo + 1)))
         ->  WindowAgg  (cost=1220.63..1395.63 rows=1 width=4) (actual time=136.985..191.231 rows=1 loops=1)
               ->  Sort  (cost=1220.63..1245.63 rows=1 width=4) (actual time=136.970..151.905 rows=1 loops=1)
                     Sort Key: x.a
                     Sort Method:  quicksort  Memory: 686kB
                     ->  WindowAgg  (cost=0.00..556.25 rows=1 width=4) (actual time=0.078..106.927 rows=1 loops=1)
                           ->  Index Scan using gg on x  (cost=0.00..406.25 rows=1 width=4) (actual time=0.058..33.594 rows=1
 Total runtime: 215.845 ms
(10 rows)




 Did you run both several times or flush caches carefully between the
 runs to eliminate caching effects?

Yes - in both variants the data was read from cache.


 Is it possible that there are a lot of dead rows in the table (from
 UPDATEs or DELETEs) and the table hasn't been vacuumed?  (Output from
 VACUUM VERBOSE on the table would show that.)


The table was filled with random numbers and analyzed - you can easily
check it; look at the beginning of the thread. The table wasn't updated.

Pavel



 -Kevin




Re: [HACKERS] slow count in window query

2009-07-17 Thread Pavel Stehule
2009/7/17 Hitoshi Harada umi.tan...@gmail.com:
 2009/7/17 Pavel Stehule pavel.steh...@gmail.com:
 Hello

 look on:
 postgres=# explain select count(*) over () from x;
                         QUERY PLAN
 ----------------------------------------------------------
  WindowAgg  (cost=0.00..265.00 rows=1 width=0)
    ->  Seq Scan on x  (cost=0.00..140.00 rows=1 width=0)
 (2 rows)

 Time: 1,473 ms
 postgres=# explain select count(*) over (order by a) from x;
                               QUERY PLAN
 -------------------------------------------------------------------
  WindowAgg  (cost=0.00..556.25 rows=1 width=4)
    ->  Index Scan using gg on x  (cost=0.00..406.25 rows=1 width=4)
 (2 rows)

 but
 query1: 160ms
 query2: 72ms

 Well, how about select a from x order by a?
 I wonder if the index scan matters more than the WindowAgg.


select a from x - 42ms
select a from x order by a - 50ms

All data was read from cache.



 --
 Hitoshi Harada




Re: [HACKERS] slow count in window query

2009-07-17 Thread Kevin Grittner
Pavel Stehule pavel.steh...@gmail.com wrote:
 
 The table was filled with random numbers and analyzed - you can easily
 check it; look at the beginning of the thread. The table wasn't updated.
 
Confirmed.  The ORDER BY consistently speeds up the query.  Odd.
 
Sort speed varied based on random sequence generated, but typical
plan and timings:
 
test=# explain analyze select count(*) over () from x;
 WindowAgg  (cost=0.00..229.00 rows=1 width=0) (actual time=32.435..97.448 rows=1 loops=1)
   ->  Seq Scan on x  (cost=0.00..104.00 rows=1 width=0) (actual time=0.007..14.818 rows=1 loops=1)
 Total runtime: 112.526 ms

test=# explain analyze select count(*) over (order by a) from x;
 WindowAgg  (cost=768.39..943.39 rows=1 width=4) (actual time=34.982..87.803 rows=1 loops=1)
   ->  Sort  (cost=768.39..793.39 rows=1 width=4) (actual time=34.962..49.533 rows=1 loops=1)
         Sort Key: a
         Sort Method:  quicksort  Memory: 491kB
         ->  Seq Scan on x  (cost=0.00..104.00 rows=1 width=4) (actual time=0.006..14.682 rows=1 loops=1)
 Total runtime: 102.023 ms
 
-Kevin



Re: [HACKERS] slow count in window query

2009-07-17 Thread Hitoshi Harada
2009/7/18 Kevin Grittner kevin.gritt...@wicourts.gov:
 Pavel Stehule pavel.steh...@gmail.com wrote:

 The table was filled with random numbers and analyzed - you can easily
 check it; look at the beginning of the thread. The table wasn't updated.

 Confirmed.  The ORDER BY consistently speeds up the query.  Odd

 Sort speed varied based on random sequence generated, but typical
 plan and timings:

Kevin's result is quite odd. I had confirmed that the IndexScan looked
faster in Pavel's result, but yours uses a Sort node.

I found that those results only show up with a relatively small data set.
I increased the source table up to 10 rows and the OVER (ORDER BY a)
case got slower.

What really surprised me is that in every case without an ORDER BY clause
in the window, the WindowAgg node starts quite a bit later than the lower
node finishes.

 test=# explain analyze select count(*) over () from x;
  WindowAgg  (cost=0.00..229.00 rows=1 width=0) (actual time=32.435..97.448 rows=1 loops=1)
    ->  Seq Scan on x  (cost=0.00..104.00 rows=1 width=0) (actual time=0.007..14.818 rows=1 loops=1)
  Total runtime: 112.526 ms

I had expected the WindowAgg actual time to be 14.xxx ... 97.448, but it
is actually 32.435 ... 97.448. The ORDER BY case returns its first result
as soon as the underlying Sort (or IndexScan) returns its first row
(actually the second), because the window frame contains only one row. But
even when the frame contains all the rows (i.e. the OVER() case), the node
should be able to return its first row not much later than the underlying
node returns its last.

If I understand EXPLAIN ANALYZE correctly and it is telling us the truth,
WindowAgg without an ORDER BY clause gets unreasonably slow. Let me see.

Regards,


-- 
Hitoshi Harada



Re: [HACKERS] slow count in window query

2009-07-16 Thread Greg Stark
On Wed, Jul 15, 2009 at 11:18 AM, Pavel Stehulepavel.steh...@gmail.com wrote:
 postgres=# select avg(a) from (select a, row_number() over (order by
 a) as r, count(*) over () as rc from x ) p where r in
 ((rc+1)/2,(rc+2)/2) ;

How does this compare to the plain non-windowing SQL implementation:

select a from x order by a offset (select trunc(count(*)/2) from x) limit 1

(except that that only works if count(*) is odd).

Interestingly, finding the median is actually O(n) using Quickselect.
Maybe we should provide a C implementation of quickselect as a window
function. I'm not sure how to wedge in the concept that the sort is
unnecessary even though the ORDER BY is specified, though.

I'm also not sure how to handle this if the set has to be spooled to
disk. Quicksort and Quickselect do a lot of scans through the data and
wouldn't perform well on disk.
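As a reference point, a minimal Quickselect can be sketched in Python (this is only an illustration of the expected-O(n) selection idea, not the proposed C window-function implementation):

```python
import random

def quickselect(xs, k):
    """Return the k-th smallest element (0-based) of xs in expected O(n) time."""
    pivot = random.choice(xs)
    lt = [x for x in xs if x < pivot]
    eq = [x for x in xs if x == pivot]
    if k < len(lt):
        return quickselect(lt, k)          # answer lies left of the pivot
    if k < len(lt) + len(eq):
        return pivot                       # answer is the pivot itself
    return quickselect([x for x in xs if x > pivot], k - len(lt) - len(eq))

def median(xs):
    """Median without a full sort; averages the middle pair for even n."""
    n = len(xs)
    if n % 2:
        return quickselect(xs, n // 2)
    return (quickselect(xs, n // 2 - 1) + quickselect(xs, n // 2)) / 2.0
```

Unlike a full quicksort, each recursion descends into only one partition, which is where the expected O(n) bound comes from - and also why it assumes random access to the whole set, the on-disk problem noted above.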

-- 
greg
http://mit.edu/~gsstark/resume.pdf



Re: [HACKERS] slow count in window query

2009-07-16 Thread Pavel Stehule
 I'm also not sure how to handle this if the set has to be spooled to
 disk. Quicksort and Quickselect do a lot of scans through the data and
 wouldn't perform well on disk.

I think the problem is in the aggregate function used as a window
function - or some missing optimization.

When I replaced count(*) over () with a subselect (SELECT count(*) FROM
...) I got the expected speed.

Pavel


 --
 greg
 http://mit.edu/~gsstark/resume.pdf




Re: [HACKERS] slow count in window query

2009-07-16 Thread Hitoshi Harada
2009/7/16 Greg Stark gsst...@mit.edu:
 On Wed, Jul 15, 2009 at 11:18 AM, Pavel Stehulepavel.steh...@gmail.com 
 wrote:
 postgres=# select avg(a) from (select a, row_number() over (order by
 a) as r, count(*) over () as rc from x ) p where r in
 ((rc+1)/2,(rc+2)/2) ;

 How does this compare to the plain non-windowing SQL implementation:

 select a from x order by a offset (select trunc(count(*)/2) from x) limit 1

 (except that that only works if count(*) is odd).

 Interestingly finding the median is actually O(n) using Quickselect.
 Maybe we should provide a C implementation of quickselect as a window
 function. I'm not sure how to wedge in the concept that the sort is
 unnecessary even though the ORDER BY is specified though.

median() should be an aggregate, not a window function, shouldn't it?


 I'm also not sure how to handle this if the set has to be spooled to
 disk. Quicksort and Quickselect do a lot of scans through the data and
 wouldn't perform well on disk.

The WindowAgg spools rows into the tuplestore, which holds the data in
memory as long as it fits. Do you have any idea how it could store
millions and millions of rows without the tuplestore?

Regards,


-- 
Hitoshi Harada



Re: [HACKERS] slow count in window query

2009-07-16 Thread Hitoshi Harada
2009/7/16 Pavel Stehule pavel.steh...@gmail.com:
 I'm also not sure how to handle this if the set has to be spooled to
 disk. Quicksort and Quickselect do a lot of scans through the data and
 wouldn't perform well on disk.

 I think the problem is in the aggregate function used as a window
 function - or some missing optimization.

 When I replaced count(*) over () with a subselect (SELECT count(*) FROM
 ...) I got the expected speed.


WindowAgg always spools its input into the buffer, although (in your
case) it throws rows away one by one, so compared with a pure aggregate it
has overhead. I think this is a reasonable approach for large-data
situations and for other types of windows. But yes, we must improve the
current model.

1) There should be some kind of lightweight approach for such
small-data/simple-window situations.

2) tuplestore_puttupleslot() seems heavy to me (copy, check, etc.) even
when the data fits in memory after trimming rows. We want more flexible
temporary storage on the fly.
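To illustrate what "throws rows away one by one" buys for a moving frame, here is a small Python sketch (illustrative only - the names and structure are mine, not nodeWindowAgg.c's): a ROWS-frame moving sum maintained incrementally, adding a value as it enters the frame and subtracting it as it leaves, instead of re-scanning a spooled copy for every output row.

```python
def moving_sum(values, preceding, following):
    """Sum over a ROWS BETWEEN <preceding> PRECEDING AND <following> FOLLOWING
    frame, maintained incrementally instead of re-scanning a spooled copy."""
    n = len(values)
    out = []
    s = sum(values[0:min(n, following + 1)])  # frame for the first row
    for i in range(n):
        out.append(s)
        entering = i + 1 + following          # row entering the next frame
        if entering < n:
            s += values[entering]
        leaving = i - preceding               # row leaving the next frame
        if leaving >= 0:
            s -= values[leaving]
    return out
```

Only the rows between `leaving` and `entering` are ever needed at once, which is the property a trimmed, lightweight buffer could exploit.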


Regards,

-- 
Hitoshi Harada



Re: [HACKERS] slow count in window query

2009-07-16 Thread Pavel Stehule
2009/7/16 Hitoshi Harada umi.tan...@gmail.com:
 2009/7/16 Greg Stark gsst...@mit.edu:
 On Wed, Jul 15, 2009 at 11:18 AM, Pavel Stehulepavel.steh...@gmail.com 
 wrote:
 postgres=# select avg(a) from (select a, row_number() over (order by
 a) as r, count(*) over () as rc from x ) p where r in
 ((rc+1)/2,(rc+2)/2) ;

 How does this compare to the plain non-windowing SQL implementation:

 select a from x order by a offset (select trunc(count(*)/2) from x) limit 1

 (except that that only works if count(*) is odd).

 Interestingly finding the median is actually O(n) using Quickselect.
 Maybe we should provide a C implementation of quickselect as a window
 function. I'm not sure how to wedge in the concept that the sort is
 unnecessary even though the ORDER BY is specified though.

 median() should be aggregate, not window function, shouldn't it?

Yes - the core of my topic is the significant slowness of queries that
use window functions when an aggregate function is involved. This case
could be optimized simply.

This case isn't important for me. I was simply playing with window
functions and found Celko's query - and I was surprised, because this
query was faster than the other one - I expected something else.


 I'm also not sure how to handle this if the set has to be spooled to
 disk. Quicksort and Quickselect do a lot of scans through the data and
 wouldn't perform well on disk.

 The WindowAgg spools rows into the tuplestore, which holds the data in
 memory as long as it fits. Do you have any idea how it could store
 millions and millions of rows without the tuplestore?

 Regards,


 --
 Hitoshi Harada




Re: [HACKERS] slow count in window query

2009-07-16 Thread Pavel Stehule
Hello

look on:
postgres=# explain select count(*) over () from x;
                        QUERY PLAN
----------------------------------------------------------
 WindowAgg  (cost=0.00..265.00 rows=1 width=0)
   ->  Seq Scan on x  (cost=0.00..140.00 rows=1 width=0)
(2 rows)

Time: 1,473 ms
postgres=# explain select count(*) over (order by a) from x;
                              QUERY PLAN
-------------------------------------------------------------------
 WindowAgg  (cost=0.00..556.25 rows=1 width=4)
   ->  Index Scan using gg on x  (cost=0.00..406.25 rows=1 width=4)
(2 rows)

but
query1: 160ms
query2: 72ms

regards
Pavel Stehule



[HACKERS] slow count in window query

2009-07-15 Thread Pavel Stehule
Hello,

I did some testing - median via window functions - and found some
probably badly optimized code. I tried two methods - Celko's and Itzik
Ben-Gan's. Ben-Gan's method should be faster - there is one sort fewer -
but in practice it is two times slower.

create table x(a integer);
insert into x select (random()*1)::int from generate_series(1,1);

Celko method:
postgres=# explain select avg(a)
             from (select a, row_number() over (order by a asc) as hi,
                             row_number() over (order by a desc) as lo
                     from x) s
            where hi in (lo-1,lo+1);
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Aggregate  (cost=2144.02..2144.03 rows=1 width=4)
   ->  Subquery Scan s  (cost=1643.77..2143.77 rows=100 width=4)
         Filter: ((s.hi = (s.lo - 1)) OR (s.hi = (s.lo + 1)))
         ->  WindowAgg  (cost=1643.77..1943.77 rows=1 width=4)
               ->  WindowAgg  (cost=1643.77..1818.77 rows=1 width=4)
                     ->  Sort  (cost=1643.77..1668.77 rows=1 width=4)
                           Sort Key: x.a
                           ->  WindowAgg  (cost=804.39..979.39 rows=1 width=4)
                                 ->  Sort  (cost=804.39..829.39 rows=1 width=4)
                                       Sort Key: x.a
                                       ->  Seq Scan on x  (cost=0.00..140.00 rows=1 width=4)
(11 rows)

Ben-Gan:

postgres=# explain select avg(a)
             from (select a, row_number() over (order by a) as r,
                             count(*) over () as rc
                     from x ) p
            where r in ((rc+1)/2,(rc+2)/2) ;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Aggregate  (cost=1354.64..1354.65 rows=1 width=4)
   ->  Subquery Scan p  (cost=804.39..1354.39 rows=100 width=4)
         Filter: ((p.r = ((p.rc + 1) / 2)) OR (p.r = ((p.rc + 2) / 2)))
         ->  WindowAgg  (cost=804.39..1104.39 rows=1 width=4)
               ->  WindowAgg  (cost=804.39..979.39 rows=1 width=4)
                     ->  Sort  (cost=804.39..829.39 rows=1 width=4)
                           Sort Key: x.a
                           ->  Seq Scan on x  (cost=0.00..140.00 rows=1 width=4)
(8 rows)

but
postgres=# select avg(a) from (select a, row_number() over (order by
a) as r, count(*) over () as rc from x ) p where r in
((rc+1)/2,(rc+2)/2) ;
  avg
---
 5027.
(1 row)

Time: 179,310 ms

postgres=# select avg(a) from (select a, row_number() over (order by a
asc) as hi, row_number() over (order by a desc) as lo from x) s where
hi in (lo-1,lo+1);
  avg
---
 5027.
(1 row)

Time: 78,791 ms

When I checked the difference, I found that the problem is the count()
function.

count(*) over () is very slow - maybe because it is a standard aggregate?

Regards
Pavel Stehule



Re: [HACKERS] Slow count(*)

2008-01-03 Thread Kevin Grittner
 On Wed, Jan 2, 2008 at  9:29 AM, in message
[EMAIL PROTECTED], Abraham,
Danny [EMAIL PROTECTED] wrote: 

 We are looking for a patch that will help us  count using the indexes.
 
As others have mentioned, that's not currently possible for a count
of all rows in a table, because there can be many versions of a row
under PostgreSQL's MVCC techniques, and the row must currently be
visited to determine whether it is visible in the context of your
database transaction.
 
 Our product is about 20 times slower on Postgres compared to MS SQL
 Server.
 
 Any ideas?
 
Again, it is best to show a particular example of a problem, because
you might be making a bad assumption about the cause of your slowness.
If you don't understand MVCC and the need for maintenance, you might
have table bloat which could be the issue.  Also, always give the
exact version of PostgreSQL, the OS, and a description of the
hardware.
 
If you really are doing proper maintenance, and you don't need exact
counts, you might be able to use the approximation stored in the
system tables:
 
cc=> \timing
Timing is on.
cc=> select count(*) from Party;
 count
--------
 135093
(1 row)

Time: 48.626 ms
cc=> select reltuples from pg_class where relname = 'Party';
 reltuples
-----------
    135091
(1 row)

Time: 9.799 ms
 
-Kevin
 



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Slow count(*)

2008-01-03 Thread Brian Hurt

Kevin Grittner wrote:



If you really are doing proper maintenance, and you don't need exact
counts, you might be able to use the approximation stored in the
system tables:
 



Also, if you're using count(*) as an existence test (common in MySQL
code), it's better to use EXISTS instead.  Using a table in my system, I
see:


proddb=> explain analyze select count(*) from instrument_listings
where update_date is null and delist_date is null;
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=18385.45..18385.46 rows=1 width=0) (actual time=897.799..897.801 rows=1 loops=1)
   ->  Seq Scan on instrument_listings  (cost=0.00..17973.43 rows=164807 width=0) (actual time=0.018..634.197 rows=146122 loops=1)
         Filter: ((update_date IS NULL) AND (delist_date IS NULL))
 Total runtime: 897.846 ms
(4 rows)

Time: 898.478 ms
proddb=> explain analyze select true where exists(select 1 from
instrument_listings where update_date is null and delist_date is null
limit 1);
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.11..0.12 rows=1 width=0) (actual time=0.026..0.028 rows=1 loops=1)
   One-Time Filter: $0
   InitPlan
     ->  Limit  (cost=0.00..0.11 rows=1 width=0) (actual time=0.022..0.022 rows=1 loops=1)
           ->  Seq Scan on instrument_listings  (cost=0.00..17973.43 rows=164807 width=0) (actual time=0.017..0.017 rows=1 loops=1)
                 Filter: ((update_date IS NULL) AND (delist_date IS NULL))
 Total runtime: 0.063 ms
(7 rows)

Time: 0.768 ms
proddb=>



The exists version is over 1000x faster (and the only reason it's not 
more studly is that I'm working on the table as we speak, so it's all in 
memory).


As a general rule in postgres, don't do count(*) unless you really mean it.

Brian




[HACKERS] Slow count(*)

2008-01-02 Thread Abraham, Danny
Hi,

We are looking for a patch that will help us count using the indexes.

Our product is about 20 times slower on Postgres compared to MS SQL
Server.

Any ideas?

Danny Abraham
BMC Software
CTMD Business Unit
972-52-4286-513
[EMAIL PROTECTED]




Re: [HACKERS] Slow count(*)

2008-01-02 Thread Pavel Stehule
On 02/01/2008, Abraham, Danny [EMAIL PROTECTED] wrote:
 Hi,

 We are looking for a patch that will help us  count using the indexes.

 Our product is about 20 times slower on Postgres compared to MS SQL
 Server.

 Any ideas?

There isn't any such patch, and there will not be one.

Use materialized views or similar techniques.

Are you sure that all your problems are only with SELECT COUNT(*)?

Please check all of your slow queries.

Regards
Pavel Stehule

 Danny Abraham
 BMC Software
 CTMD Business Unit
 972-52-4286-513
 [EMAIL PROTECTED]




Re: [HACKERS] Slow count(*)

2008-01-02 Thread A. Kretschmer
On Wed, 02 Jan 2008 at 09:29:24 -0600, Abraham, Danny wrote:
 Hi,
 
 We are looking for a patch that will help us  count using the indexes.
 
 Our product is about 20 times slower on Postgres compared to MS SQL
 Server.
 
 Any ideas?

Please show us your SQL and the execution plan (EXPLAIN or, better,
EXPLAIN ANALYSE) and read our FAQ.


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: - Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net



Re: [HACKERS] Slow count(*)

2008-01-02 Thread Andrew Sullivan
On Wed, Jan 02, 2008 at 09:29:24AM -0600, Abraham, Danny wrote:
 We are looking for a patch that will help us  count using the indexes.

Is this for 

SELECT count(*) FROM table;

or 

SELECT count(1) FROM table WHERE. . .

The latter _will_ use an index, if the index is correct, the statistics
are right, and the index selectivity is worth the cost of reading the
index.  The former will not use an index at all, because the answer
depends on visibility, and you can't know that without reading the table.
If you're counting how many rows are in the table (for display purposes,
for instance), you probably need to do something else.

 Our product is about 20 times slower on Postgres compared to MS SQL
 Server.
 
 Any ideas?

Not without the queries, the EXPLAIN ANALYZE plans, and some information
about the database.

A



Re: [HACKERS] Slow count(*)

2008-01-02 Thread Simon Riggs
On Wed, 2008-01-02 at 09:29 -0600, Abraham, Danny wrote:

 Our product is about 20 times slower on Postgres compared to MS SQL
 Server.

If you want to have a cross-platform product then you must consider how
to access multiple systems both accurately and quickly. Not much point
catering for the different SQL dialects and then ignoring the
performance differences. All products are not the same; you will find
many advantages with Postgres.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com




[HACKERS] slow count() was: tsearch2 poor performance

2004-10-03 Thread Oleg Bartunov
On Fri, 1 Oct 2004, Kris Kiger wrote:

 Hey all, its me again.  If I do not do a count(product_id) on my
 tsearch2 queries, its actually really fast, for example;


Hmm, I also really want to know - what's the difference?
Postgresql 8.0beta3 on Linux 2.4.25

tsearchd=# explain analyze select body from txt where fts_index @@ to_tsquery('oil')
limit 1000;
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..4027.67 rows=1000 width=315) (actual time=0.053..14.662 rows=1000 loops=1)
   ->  Index Scan using fts_idx on txt  (cost=0.00..12083.02 rows=3000 width=315) (actual time=0.049..12.552 rows=1000 loops=1)
         Index Cond: (fts_index @@ '\'oil\''::tsquery)
 Total runtime: 15.848 ms
(4 rows)

tsearchd=# explain analyze select count(body) from txt where fts_index @@ 
to_tsquery('oil') limit 1000;

Didn't get result after 10 minutes :(




Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Re: [HACKERS] slow count() was: tsearch2 poor performance

2004-10-03 Thread Magnus Hagander
 Hey all, its me again.  If I do not do a count(product_id) on my
 tsearch2 queries, its actually really fast, for example;


Hmm, I also really want to know  what's the difference ?
Postgresql 8.0beta3 on Linux 2.4.25

tsearchd=# explain analyze select body from txt where
fts_index @@ to_tsquery('oil') limit 1000;
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..4027.67 rows=1000 width=315) (actual time=0.053..14.662 rows=1000 loops=1)
   ->  Index Scan using fts_idx on txt  (cost=0.00..12083.02 rows=3000 width=315) (actual time=0.049..12.552 rows=1000 loops=1)
         Index Cond: (fts_index @@ '\'oil\''::tsquery)
 Total runtime: 15.848 ms
(4 rows)

tsearchd=# explain analyze select count(body) from txt where 
fts_index @@ to_tsquery('oil') limit 1000;

Didn't get result after 10 minutes :(


I think you're missing what LIMIT does.

In the first query, it LIMITs the return from the index scan to 1000
entries.
In the second query, it LIMITs the return from the aggregate to 1000
entries. The index scan will include all matches and send them to count(),
which returns only one row, to which the LIMIT is then applied.

You could probably reach the same result with a subselect:
select count(*) FROM (select body from txt where fts_index @@
to_tsquery('oil') limit 1000) as t;


At least that's how I think LIMIT works. That would certainly explain
the major time difference.
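The two execution orders described above can be sketched in Python (a toy model of the plan shapes, not PostgreSQL internals; the function names are mine):

```python
from itertools import islice

def scan_with_limit(rows, pred, n):
    """SELECT body FROM txt WHERE <pred> LIMIT n:
    the scan stops as soon as n matching rows have been produced."""
    return list(islice((r for r in rows if pred(r)), n))

def count_with_limit(rows, pred, n):
    """SELECT count(body) FROM txt WHERE <pred> LIMIT n:
    count() must consume every matching row first; the LIMIT then
    applies to the single-row aggregate result, so it saves nothing."""
    aggregate_result = [sum(1 for r in rows if pred(r))]
    return aggregate_result[:n]
```

The first function touches at most n matches; the second always walks the whole input, which matches the timing gap Oleg observed.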

//Magnus



Re: [HACKERS] slow count() was: tsearch2 poor performance

2004-10-03 Thread Oleg Bartunov

Magnus


On Sun, 3 Oct 2004, Magnus Hagander wrote:

  Hey all, its me again.  If I do not do a count(product_id) on my
  tsearch2 queries, its actually really fast, for example;
 
 
 Hmm, I also really want to know  what's the difference ?
 Postgresql 8.0beta3 on Linux 2.4.25
 
 tsearchd=# explain analyze select body from txt where
 fts_index @@ to_tsquery('oil') limit 1000;
QUERY PLAN
 ---
 -
  Limit  (cost=0.00..4027.67 rows=1000 width=315) (actual
 time=0.053..14.662 rows=1000 loops=1)
    ->  Index Scan using fts_idx on txt  (cost=0.00..12083.02
  rows=3000 width=315) (actual time=0.049..12.552 rows=1000 loops=1)
  Index Cond: (fts_index @@ '\'oil\''::tsquery)
  Total runtime: 15.848 ms
 (4 rows)
 
 tsearchd=# explain analyze select count(body) from txt where
 fts_index @@ to_tsquery('oil') limit 1000;
 
 Didn't get result after 10 minutes :(
 

 I think you're missing what LIMIT does.

 In the first query, it LIMITs the return from the index scan to 1000
 entries.
 In the second query, it LIMITs the return from the aggregate to 1000
 entries. The index scan will include all matches and send them to count(),
 which returns only one row, to which the LIMIT is then applied.


Thanks for the explanation. I suspected this - so LIMIT is still a hack.


 You could probably reach the same result with a subselect:
 select count(*) FROM (select body from txt where fts_index @@
 to_tsquery('oil') limit 1000)


Yes, the timings are now equal!


 At least that's how I think LIMIT works. That would certainly explain
 the major time difference.

 //Magnus


Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
