Re: [HACKERS] COUNT and Performance ...
I didn't think pgstattuple had proper visibility checks.

---------------------------------------------------------------------------

Hans-Jürgen Schönig wrote:

>> This patch adds a note to the documentation describing why the
>> performance of min() and max() is slow when applied to the entire
>> table, and suggesting the simple workaround most experienced Pg users
>> eventually learn about (SELECT xyz ... ORDER BY xyz LIMIT 1). Any
>> suggestions on improving the wording of this section would be welcome.
>>
>> Cheers,
>
> ORDER BY and LIMIT work pretty fast (no seq scan). In special cases
> there can be another way to avoid seq scans:
>
>     action=# select tuple_count from pgstattuple('t_text');
>      tuple_count
>     -------------
>            14203
>     (1 row)
>
>     action=# BEGIN;
>     BEGIN
>     action=# insert into t_text (suchid) VALUES ('10');
>     INSERT 578606 1
>     action=# select tuple_count from pgstattuple('t_text');
>      tuple_count
>     -------------
>            14204
>     (1 row)
>
>     action=# ROLLBACK;
>     ROLLBACK
>     action=# select tuple_count from pgstattuple('t_text');
>      tuple_count
>     -------------
>            14203
>     (1 row)
>
> If people want to count ALL rows of a table, the contrib stuff is
> pretty useful. It seems to be transaction safe.
>
> The performance boost is great (PostgreSQL 7.3, RedHat, 166 MHz):
>
>     root@actionscouts:~# time psql action -c "select tuple_count from pgstattuple('t_text');"
>      tuple_count
>     -------------
>            14203
>     (1 row)
>
>     real    0m0.266s
>     user    0m0.030s
>     sys     0m0.020s
>
>     root@actionscouts:~# time psql action -c "select count(*) from t_text;"
>      count
>     -------
>      14203
>     (1 row)
>
>     real    0m0.701s
>     user    0m0.040s
>     sys     0m0.010s
>
> I think that this could be a good workaround for huge counts (maybe
> millions of records) with no where clause and no joins.
>
> Hans
>
> http://kernel.cybertec.at

--
Bruce Momjian                   | http://candle.pha.pa.us
[EMAIL PROTECTED]               | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup.    | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org
Re: [HACKERS] COUNT and Performance ...
Bruce Momjian <[EMAIL PROTECTED]> writes:
> I didn't think pgstattuple had proper visibility checks.

It doesn't, see followup discussion.

			regards, tom lane
Re: [HACKERS] COUNT and Performance ...
Tom Lane wrote:

> But pgstattuple does do a sequential scan of the table. You avoid a
> lot of the executor's tuple-pushing and plan-node-traversing machinery
> that way, but the I/O requirement is going to be exactly the same.

I have tried it repeatedly so that I could be sure that everything was in the cache. I thought it did some sort of stat on tables. Too bad :(.

>> If people want to count ALL rows of a table, the contrib stuff is
>> pretty useful. It seems to be transaction safe.
>
> Not entirely. pgstattuple uses HeapTupleSatisfiesNow(), which means
> you get a count of tuples that are committed good in terms of the
> effects of transactions committed up to the instant each tuple is
> examined. This is in general different from what count(*) would tell
> you, because it ignores snapshotting. It'd be quite unrepeatable too,
> in the face of active concurrent changes --- it's very possible for
> pgstattuple to count a single row twice or not at all, if it's being
> concurrently updated and the other transaction commits between the
> times pgstattuple sees the old and new versions of the row.

Interesting. I have tried it with concurrent sessions and transactions --- the results seemed to be right (I could not see the records inserted by open transactions). Too bad :(. It would have been a nice workaround.

>> The performance boost is great (PostgreSQL 7.3, RedHat, 166 MHz)
>
> I think your test case is small enough that the whole table is
> resident in memory, so this measurement only accounts for CPU time per
> tuple and not any I/O. Given the small size of pgstattuple's per-tuple
> loop, the speed differential is not too surprising --- but it won't
> scale up to larger tables.
>
> Sometime it would be interesting to profile count(*) on large tables
> and see exactly where the CPU time goes. It might be possible to
> shave off some of the executor overhead ...
>
> 			regards, tom lane

I have tried it with the largest table on my testing system. Reducing the overhead would be great :).
Thanks a lot,

Hans

--
*Cybertec Geschwinde u Schoenig*
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/1/913 68 09; +43/664/233 90 75
www.postgresql.at, cluster.postgresql.at, www.cybertec.at, kernel.cybertec.at
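Tom's race between pgstattuple's HeapTupleSatisfiesNow() scan and a concurrently committing UPDATE can be sketched with a toy model. This is hypothetical illustration code, not PostgreSQL internals: each logical row is stored as old and new tuple versions, and the "commit" flips both versions' visibility flags between the moments the scan examines them, so one row is counted twice.

```python
# Toy model (NOT PostgreSQL code): a table is a list of tuple versions,
# each carrying "inserting xact committed" / "deleting xact committed"
# flags.  A SatisfiesNow-style scan checks each version's flags at the
# instant it is examined, with no snapshot.

def satisfies_now(v):
    return v["inserted_committed"] and not v["deleted_committed"]

def scan(table, concurrent_commit_after=None, committer=None):
    """Count versions that look committed when examined; optionally run
    `committer` (a concurrent transaction committing) after examining
    the version at index `concurrent_commit_after`."""
    count = 0
    for i, v in enumerate(table):
        if satisfies_now(v):
            count += 1
        if i == concurrent_commit_after and committer is not None:
            committer()
    return count

# One logical row: an old version, plus a new version written by an
# in-flight UPDATE that has not committed yet.
old = {"inserted_committed": True,  "deleted_committed": False}
new = {"inserted_committed": False, "deleted_committed": False}

def commit_update():
    old["deleted_committed"] = True   # old version is now dead
    new["inserted_committed"] = True  # new version is now live

# The updater commits after the scan counted the old version but before
# it reaches the new one: the single row is counted twice.
print(scan([old, new], concurrent_commit_after=0, committer=commit_update))  # 2
```

With the opposite interleaving (commit before the scan reaches either version, deletion seen first) the same row can be missed entirely, which is why the count is unrepeatable under concurrent changes.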
Re: [HACKERS] COUNT and Performance ...
For a more accurate view of the time used, use the \timing switch in psql. That leaves out the overhead of forking and loading psql, connecting to the database, and such things.

I think it would be even nicer if PostgreSQL automatically chose to replace a count(*) with no WHERE clause with something similar.

Regards,

Arjen

Hans-Jürgen Schönig wrote:
> [...]
> I think that this could be a good workaround for huge counts (maybe
> millions of records) with no where clause and no joins.

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org
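Short of the planner rewriting count(*) itself, the usual application-level answer is a trigger-maintained counter table, so the count becomes a one-row lookup. Here is a minimal sketch of that idea using SQLite for brevity (the table and trigger names are invented for the example; a PostgreSQL version would use the same scheme with plpgsql triggers):

```python
import sqlite3

# Keep an exact row count in a one-row side table, maintained by
# insert/delete triggers, so "how many rows?" never needs a seq scan.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_text (suchid TEXT);
    CREATE TABLE t_text_count (n INTEGER NOT NULL);
    INSERT INTO t_text_count VALUES (0);

    CREATE TRIGGER t_text_ins AFTER INSERT ON t_text
    BEGIN UPDATE t_text_count SET n = n + 1; END;

    CREATE TRIGGER t_text_del AFTER DELETE ON t_text
    BEGIN UPDATE t_text_count SET n = n - 1; END;
""")

conn.executemany("INSERT INTO t_text VALUES (?)", [("10",), ("11",), ("12",)])
conn.execute("DELETE FROM t_text WHERE suchid = '11'")

# The counter tracks the real row count without scanning the table.
print(conn.execute("SELECT n FROM t_text_count").fetchone()[0])  # 2
```

The trade-off: every writer must update the single counter row, which under MVCC serializes concurrent writing transactions on that row, so this suits read-mostly tables where exact fast counts matter.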
[HACKERS] COUNT and Performance ...
> This patch adds a note to the documentation describing why the
> performance of min() and max() is slow when applied to the entire
> table, and suggesting the simple workaround most experienced Pg users
> eventually learn about (SELECT xyz ... ORDER BY xyz LIMIT 1). Any
> suggestions on improving the wording of this section would be welcome.
>
> Cheers,

ORDER BY and LIMIT work pretty fast (no seq scan). In special cases there can be another way to avoid seq scans:

    action=# select tuple_count from pgstattuple('t_text');
     tuple_count
    -------------
           14203
    (1 row)

    action=# BEGIN;
    BEGIN
    action=# insert into t_text (suchid) VALUES ('10');
    INSERT 578606 1
    action=# select tuple_count from pgstattuple('t_text');
     tuple_count
    -------------
           14204
    (1 row)

    action=# ROLLBACK;
    ROLLBACK
    action=# select tuple_count from pgstattuple('t_text');
     tuple_count
    -------------
           14203
    (1 row)

If people want to count ALL rows of a table, the contrib stuff is pretty useful. It seems to be transaction safe.

The performance boost is great (PostgreSQL 7.3, RedHat, 166 MHz):

    root@actionscouts:~# time psql action -c "select tuple_count from pgstattuple('t_text');"
     tuple_count
    -------------
           14203
    (1 row)

    real    0m0.266s
    user    0m0.030s
    sys     0m0.020s

    root@actionscouts:~# time psql action -c "select count(*) from t_text;"
     count
    -------
     14203
    (1 row)

    real    0m0.701s
    user    0m0.040s
    sys     0m0.010s

I think that this could be a good workaround for huge counts (maybe millions of records) with no where clause and no joins.

Hans

http://kernel.cybertec.at
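The workaround the documentation patch describes rests on both queries returning the same value: the smallest xyz is also the first row when sorted by xyz, and the ORDER BY ... LIMIT 1 form lets the planner read one entry from an index instead of aggregating every row. A quick sanity check of the equivalence, using SQLite with invented table and column names (later PostgreSQL releases eventually learned to make this transformation for min()/max() themselves):

```python
import sqlite3

# min(xyz) and "ORDER BY xyz LIMIT 1" must agree: the minimum is by
# definition the first value in ascending sort order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (xyz INTEGER)")
conn.execute("CREATE INDEX t_xyz_idx ON t (xyz)")  # what makes LIMIT 1 cheap
conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in (42, 7, 99, 13)])

agg = conn.execute("SELECT min(xyz) FROM t").fetchone()[0]
idx = conn.execute("SELECT xyz FROM t ORDER BY xyz LIMIT 1").fetchone()[0]
print(agg, idx)  # 7 7
```

For max() the same rewrite uses ORDER BY xyz DESC LIMIT 1, reading the other end of the index.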
Re: [HACKERS] COUNT and Performance ...
On Sun, 2003-02-02 at 03:55, Hans-Jürgen Schönig wrote:
> If people want to count ALL rows of a table, the contrib stuff is
> pretty useful. It seems to be transaction safe.

Interesting -- I didn't know about the contrib stuff. I'll update the docs patch.

Cheers,

Neil

--
Neil Conway <[EMAIL PROTECTED]> || PGP Key ID: DB3C29FC
Re: [HACKERS] COUNT and Performance ...
On Sun, 2003-02-02 at 13:04, Tom Lane wrote:
> I think your test case is small enough that the whole table is
> resident in memory, so this measurement only accounts for CPU time per
> tuple and not any I/O. Given the small size of pgstattuple's per-tuple
> loop, the speed differential is not too surprising --- but it won't
> scale up to larger tables.

Good observation. When the entire table is in cache, pgstattuple is about 4 times faster than count(*) on my machine. When the table is too large to fit into cache, the performance difference drops to 8% in favour of pgstattuple:

    nconway=# select count(*) from big_table;
      count
    ---------
     8388612
    (1 row)

    Time: 26769.99 ms

    nconway=# SELECT tuple_count FROM pgstattuple('big_table');
     tuple_count
    -------------
         8388612
    (1 row)

    Time: 24658.87 ms

Cheers,

Neil

--
Neil Conway <[EMAIL PROTECTED]> || PGP Key ID: DB3C29FC
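The quoted "8%" figure checks out against the raw timings (variable names below are mine, not from the thread):

```python
# Quick arithmetic on the uncached-run timings above.
count_ms = 26769.99   # select count(*) from big_table
pgstat_ms = 24658.87  # SELECT tuple_count FROM pgstattuple('big_table')

# pgstattuple's advantage as a fraction of count(*)'s runtime: once the
# scan is I/O-bound, the smaller per-tuple CPU cost barely matters.
advantage = (count_ms - pgstat_ms) / count_ms
print(f"{advantage:.1%}")  # 7.9%
```

This is consistent with Tom's point: with the table in cache the per-tuple CPU loop dominates (4x), while on disk both approaches pay the same sequential-read cost.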