Re: [HACKERS] COUNT and Performance ...

2003-02-17 Thread Bruce Momjian

I didn't think pgstattuple had proper visibility checks.


---

Hans-Jürgen Schönig wrote:
 This patch adds a note to the documentation describing why the
 performance of min() and max() is slow when applied to the entire table,
 and suggesting the simple workaround most experienced Pg users
 eventually learn about (SELECT xyz ... ORDER BY xyz LIMIT 1).
 
 Any suggestions on improving the wording of this section would be
 welcome.
 
 Cheers,
 
 
 --
 
 ORDER BY and LIMIT work pretty fast (no seq scan).
 In special cases there can be another way to avoid seq scans:
 
 
 action=# select tuple_count from pgstattuple('t_text');
  tuple_count
 -------------
        14203
 (1 row)
 
 action=# BEGIN;
 BEGIN
 action=# insert into t_text (suchid) VALUES ('10');
 INSERT 578606 1
 action=# select tuple_count from pgstattuple('t_text');
  tuple_count
 -------------
        14204
 (1 row)
 
 action=# ROLLBACK;
 ROLLBACK
 action=# select tuple_count from pgstattuple('t_text');
  tuple_count
 -------------
        14203
 (1 row)
 
 
 If people want to count ALL rows of a table, the contrib stuff is pretty
 useful. It seems to be transaction safe.
 
 The performance boost is great (PostgreSQL 7.3, RedHat, 166 MHz):
 
 
 root@actionscouts:~# time psql action -c "select tuple_count from pgstattuple('t_text');"
  tuple_count
 -------------
        14203
 (1 row)
 
 
 real    0m0.266s
 user    0m0.030s
 sys     0m0.020s
 root@actionscouts:~# time psql action -c "select count(*) from t_text"
  count
 -------
  14203
 (1 row)
 
 
 real    0m0.701s
 user    0m0.040s
 sys     0m0.010s
 
 
 I think that this could be a good workaround for huge counts (maybe 
 millions of records) with no where clause and no joins.
 
 Hans
 
 http://kernel.cybertec.at
 
 
 

-- 
  Bruce Momjian                     |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                 |  (610) 359-1001
  +  If your life is a hard drive,  |  13 Roberts Road
  +  Christ can be your backup.     |  Newtown Square, Pennsylvania 19073




Re: [HACKERS] COUNT and Performance ...

2003-02-17 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
 I didn't think pgstattuple had proper visibility checks.

It doesn't, see followup discussion.

regards, tom lane




Re: [HACKERS] COUNT and Performance ...

2003-02-06 Thread Hans-Jürgen Schönig


But pgstattuple does do a sequential scan of the table.  You avoid a lot
of the executor's tuple-pushing and plan-node-traversing machinery that
way, but the I/O requirement is going to be exactly the same.
 


I have run it several times so that I can be sure that everything is in
the cache.
I thought it did some sort of stat on the tables. Too bad :(.


If people want to count ALL rows of a table, the contrib stuff is pretty
useful. It seems to be transaction safe.
   


Not entirely.  pgstattuple uses HeapTupleSatisfiesNow(), which means you
get a count of tuples that are committed good in terms of the effects of
transactions committed up to the instant each tuple is examined.  This
is in general different from what count(*) would tell you, because it
ignores snapshotting.  It'd be quite unrepeatable too, in the face of
active concurrent changes --- it's very possible for pgstattuple to
count a single row twice or not at all, if it's being concurrently
updated and the other transaction commits between the times pgstattuple
sees the old and new versions of the row.
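
To illustrate the snapshot difference described here, a hypothetical
two-session interleaving (the table, column, and counts reuse Hans'
t_text example; the sessions and exact numbers are made up):

-- Session A
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM t_text;                    -- say, 14203 (from A's snapshot)
SELECT tuple_count FROM pgstattuple('t_text');  -- 14203 as well, for now

-- Session B, in a separate connection (autocommit, so it commits at once)
INSERT INTO t_text (suchid) VALUES ('10');

-- Session A, still inside the same transaction
SELECT count(*) FROM t_text;                    -- still 14203: count(*) respects the snapshot
SELECT tuple_count FROM pgstattuple('t_text');  -- 14204: pgstattuple counts whatever is
                                                -- committed "now", ignoring A's snapshot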
 

Interesting. I have tried it with concurrent sessions and transactions -
the results seemed to be right (I could not see the records inserted by
open transactions). Too bad :(. It would have been a nice workaround.


The performance boost is great (PostgreSQL 7.3, RedHat, 166 MHz):



I think your test case is small enough that the whole table is resident
in memory, so this measurement only accounts for CPU time per tuple and
not any I/O.  Given the small size of pgstattuple's per-tuple loop, the
speed differential is not too surprising --- but it won't scale up to
larger tables.

Sometime it would be interesting to profile count(*) on large tables
and see exactly where the CPU time goes.  It might be possible to shave
off some of the executor overhead ...

			regards, tom lane
 


I have tried it with the largest table on my testing system.
Reducing the overhead is great :).

   Thanks a lot,

   Hans

--
*Cybertec Geschwinde u Schoenig*
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/1/913 68 09; +43/664/233 90 75
www.postgresql.at, cluster.postgresql.at, www.cybertec.at, kernel.cybertec.at





Re: [HACKERS] COUNT and Performance ...

2003-02-06 Thread Arjen van der Meijden
For a more accurate view of the time used, use the \timing command in psql.
That leaves out the overhead of forking and loading psql, connecting to
the database and such things.
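
For example (a hypothetical psql session; the row count is Hans' figure,
and the elapsed time is elided rather than invented):

action=# \timing
Timing is on.
action=# SELECT tuple_count FROM pgstattuple('t_text');
 tuple_count
-------------
       14203
(1 row)

Time: ... ms

The Time line is measured inside psql, so it excludes process startup and
connection setup, unlike the shell's time.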

I think it would be even nicer if PostgreSQL automatically chose to
replace count(*)-with-no-WHERE with something similar.

Regards,

Arjen

Hans-Jürgen Schönig wrote:
This patch adds a note to the documentation describing why the
performance of min() and max() is slow when applied to the entire table,
and suggesting the simple workaround most experienced Pg users
eventually learn about (SELECT xyz ... ORDER BY xyz LIMIT 1).

Any suggestions on improving the wording of this section would be
welcome.

Cheers,


--

ORDER BY and LIMIT work pretty fast (no seq scan).
In special cases there can be another way to avoid seq scans:


action=# select tuple_count from pgstattuple('t_text');
 tuple_count
-------------
       14203
(1 row)

action=# BEGIN;
BEGIN
action=# insert into t_text (suchid) VALUES ('10');
INSERT 578606 1
action=# select tuple_count from pgstattuple('t_text');
 tuple_count
-------------
       14204
(1 row)

action=# ROLLBACK;
ROLLBACK
action=# select tuple_count from pgstattuple('t_text');
 tuple_count
-------------
       14203
(1 row)


If people want to count ALL rows of a table, the contrib stuff is pretty
useful. It seems to be transaction safe.

The performance boost is great (PostgreSQL 7.3, RedHat, 166 MHz):


root@actionscouts:~# time psql action -c "select tuple_count from pgstattuple('t_text');"
 tuple_count
-------------
       14203
(1 row)


real    0m0.266s
user    0m0.030s
sys     0m0.020s
root@actionscouts:~# time psql action -c "select count(*) from t_text"
 count
-------
 14203
(1 row)


real    0m0.701s
user    0m0.040s
sys     0m0.010s


I think that this could be a good workaround for huge counts (maybe 
millions of records) with no where clause and no joins.

Hans

http://kernel.cybertec.at







[HACKERS] COUNT and Performance ...

2003-02-02 Thread Hans-Jürgen Schönig
This patch adds a note to the documentation describing why the
performance of min() and max() is slow when applied to the entire table,
and suggesting the simple workaround most experienced Pg users
eventually learn about (SELECT xyz ... ORDER BY xyz LIMIT 1).
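
A minimal sketch of that workaround, assuming a table t with an index on
column xyz (the names are placeholders, not taken from the patch itself):

-- Slow on a big table: the aggregate scans every row
SELECT min(xyz) FROM t;

-- Fast equivalent: ORDER BY ... LIMIT 1 can use the index on xyz
SELECT xyz FROM t ORDER BY xyz LIMIT 1;

-- And for max():
SELECT xyz FROM t ORDER BY xyz DESC LIMIT 1;
-- (if xyz can contain NULLs, add WHERE xyz IS NOT NULL so the result
--  matches max(), which ignores NULLs)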

Any suggestions on improving the wording of this section would be
welcome.

Cheers,


--

ORDER BY and LIMIT work pretty fast (no seq scan).
In special cases there can be another way to avoid seq scans:


action=# select tuple_count from pgstattuple('t_text');
 tuple_count
-------------
       14203
(1 row)

action=# BEGIN;
BEGIN
action=# insert into t_text (suchid) VALUES ('10');
INSERT 578606 1
action=# select tuple_count from pgstattuple('t_text');
 tuple_count
-------------
       14204
(1 row)

action=# ROLLBACK;
ROLLBACK
action=# select tuple_count from pgstattuple('t_text');
 tuple_count
-------------
       14203
(1 row)


If people want to count ALL rows of a table, the contrib stuff is pretty
useful. It seems to be transaction safe.

The performance boost is great (PostgreSQL 7.3, RedHat, 166 MHz):


root@actionscouts:~# time psql action -c "select tuple_count from pgstattuple('t_text');"
 tuple_count
-------------
       14203
(1 row)


real    0m0.266s
user    0m0.030s
sys     0m0.020s
root@actionscouts:~# time psql action -c "select count(*) from t_text"
 count
-------
 14203
(1 row)


real    0m0.701s
user    0m0.040s
sys     0m0.010s


I think that this could be a good workaround for huge counts (maybe 
millions of records) with no where clause and no joins.

   Hans

http://kernel.cybertec.at




Re: [HACKERS] COUNT and Performance ...

2003-02-02 Thread Neil Conway
On Sun, 2003-02-02 at 03:55, Hans-Jürgen Schönig wrote:
 If people want to count ALL rows of a table, the contrib stuff is pretty
 useful. It seems to be transaction safe.

Interesting -- I didn't know about the contrib stuff. I'll update the
docs patch.

Cheers,

Neil

-- 
Neil Conway <[EMAIL PROTECTED]> || PGP Key ID: DB3C29FC







Re: [HACKERS] COUNT and Performance ...

2003-02-02 Thread Neil Conway
On Sun, 2003-02-02 at 13:04, Tom Lane wrote:
 I think your test case is small enough that the whole table is resident
 in memory, so this measurement only accounts for CPU time per tuple and
 not any I/O.  Given the small size of pgstattuple's per-tuple loop, the
 speed differential is not too surprising --- but it won't scale up to
 larger tables.

Good observation.

When the entire table is in cache, pgstattuple is about 4 times faster than
count(*) on my machine. When the table is too large to fit into cache,
the performance difference drops to 8% in favour of pgstattuple:

nconway=# select count(*) from big_table;
  count
---------
 8388612
(1 row)

Time: 26769.99 ms
nconway=# SELECT tuple_count FROM pgstattuple('big_table');
 tuple_count
-------------
     8388612
(1 row)

Time: 24658.87 ms

Cheers,

Neil
-- 
Neil Conway <[EMAIL PROTECTED]> || PGP Key ID: DB3C29FC



