Re: RES: [GENERAL] SELECT COUNT(*) execution time on large tables (v9.0.4-1)
Please have a look at pg_stat_user_tables; there is a field called n_live_tup. But I think in some cases this might not hold the actual row count.

Regards

From: Carlos Sotto Maior (SIM)
To: David Johnston; pgsql-general@postgresql.org
Sent: Mon, May 30, 2011 8:35:39 PM
Subject: RES: [GENERAL] SELECT COUNT(*) execution time on large tables (v9.0.4-1)

> David,
>
> Thanks for your reply. I will probably use the strategy of a trigger-driven
> counter, with a temporal strategy keeping current-month totals separate from
> the total up to last month, since the current month changes rapidly.
>
> I also apologize for not being investigative enough. I did look at the wiki,
> but maybe I did not pursue my question there in the proper manner. I will
> surely get better acquainted with the wiki to keep this channel clear of
> repeated questions.
>
> Once again, thank you very much.
>
> Carlos Sotto Maior
> +55 11 8244-7899
> cso...@sistemassim.com.br
>
> Sistemas Sim Serviços e Tecnologia Ltda.
> +55 11 5041-3086
> Rua Tenente Gomes Ribeiro, 78
> Vila Clementino (Próximo ao Metro Santa Cruz)
> São Paulo - SP 04038-040
>
> [quoted reply from David Johnston trimmed; his full message appears below in this thread]

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
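The n_live_tup estimate mentioned at the top of this thread can be read with a simple query (table name taken from the original post). Note that the statistics collector's numbers lag behind reality, so this is only an approximation:

```sql
-- Approximate live-row count from the statistics collector.
-- This can be stale or inexact; it is not a substitute for COUNT(*).
SELECT n_live_tup
FROM pg_stat_user_tables
WHERE relname = 'ut_mailing_client';
```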
RES: [GENERAL] SELECT COUNT(*) execution time on large tables (v9.0.4-1)
David,

Thanks for your reply. I will probably use the strategy of a trigger-driven counter, with a temporal strategy keeping current-month totals separate from the total up to last month, since the current month changes rapidly.

I also apologize for not being investigative enough. I did look at the wiki, but maybe I did not pursue my question there in the proper manner. I will surely get better acquainted with the wiki to keep this channel clear of repeated questions.

Once again, thank you very much.

Carlos Sotto Maior
+55 11 8244-7899
cso...@sistemassim.com.br

Sistemas Sim Serviços e Tecnologia Ltda.
+55 11 5041-3086
Rua Tenente Gomes Ribeiro, 78
Vila Clementino (Próximo ao Metro Santa Cruz)
São Paulo - SP 04038-040

-----Original message-----
From: David Johnston [mailto:pol...@yahoo.com]
Sent: Friday, May 27, 2011 17:49
To: 'Carlos Sotto Maior (SIM)'; pgsql-general@postgresql.org
Subject: RE: [GENERAL] SELECT COUNT(*) execution time on large tables (v9.0.4-1)

> [quoted reply from David Johnston trimmed; his full message appears below in this thread]
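A minimal sketch of the temporal strategy Carlos describes: cache per-month totals for closed months and run a live COUNT(*) only over the current, still-changing month. The monthly_counts table and the created_at column are illustrative names, not part of the schema shown in the original post:

```sql
-- Hypothetical cache of per-month totals (one row per closed month).
CREATE TABLE monthly_counts
(
  month date PRIMARY KEY,   -- first day of the month
  cnt   bigint NOT NULL
);

-- Total = cached closed months + live count of the current month only.
-- Assumes a timestamp column (here called created_at) exists on the table.
SELECT (SELECT coalesce(sum(cnt), 0)
          FROM monthly_counts
         WHERE month < date_trunc('month', now()))
     + (SELECT count(*)
          FROM ut_mailing_client
         WHERE created_at >= date_trunc('month', now())) AS total;
```

The cached rows would be refreshed once at each month rollover, so the expensive scan is confined to the small, recent slice of data.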
Re: [GENERAL] SELECT COUNT(*) execution time on large tables (v9.0.4-1)
On 05/27/2011 12:41 PM, Carlos Sotto Maior (SIM) wrote:
> I have browsed catalog tables, digging for a real time Row.count but
> so far did not find any.

See http://wiki.postgresql.org/wiki/Slow_Counting which shows you where the one system count estimate is, as well as linking to alternate approaches. If you need an exact count and can't afford to run a full query to get one, some sort of trigger-based approach is likely where you'll need to go.

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
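The system count estimate the Slow_Counting wiki page points at is the planner's reltuples figure in pg_class, refreshed by VACUUM, ANALYZE, and autovacuum:

```sql
-- Planner's row estimate for the table; cheap to read but only as
-- fresh as the last VACUUM/ANALYZE.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'ut_mailing_client';
```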
Re: [GENERAL] SELECT COUNT(*) execution time on large tables (v9.0.4-1)
Counting live data is inherently imprecise. There are supposedly some system tables that can give you rough numbers. You would be better off figuring out an alternative method to get the data you desire and stop continually recounting all 5.7M records. A trigger-driven counter, for insert and delete, is probably the most obvious method. Also, say for temporal data, cache the prior monthly counts and only perform an actual count over the current (changing) month(s). At your table size the brute-force approach is obviously not going to work, so an alternative method needs to be devised, one that eliminates re-counting previously counted records.

The specific design is going to be highly dependent on your specific requirements, which is why no generalized solution exists. If you provide the why behind the question, and not just the question, people may be inclined to provide relevant suggestions. Issuing a "count(*)" is not a need; it is an implementation. The need is what you end up doing with that number.

Lastly, the time you spent combing the system catalogs would have been better spent perusing the FAQ linked from the PostgreSQL homepage. Your question, in almost the same words, is in the FAQ, with a link to the wiki page that repeats all your observations, explains why the behavior is that way, and suggests (links to) possible alternatives. You may wish to go there now to get more background and ideas.

David J.

> Hi,
>
> My application has a frequent need to issue a select count(*) on tables.
> Some have a large row count. (The examples below are from a 5.7 M row
> table; some are larger.)
>
> Issuing either SELECT COUNT(*) or SELECT COUNT(<column>)
> yields a sequential scan on the table.
>
> I have browsed catalog tables, digging for a real time Row.count but so far
> did not find any.
>
> QUESTION: Is there a better (faster) way to obtain the row count from a
> table?
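A minimal sketch of the trigger-driven counter suggested above, along the lines of the approach described on the Slow_Counting wiki page; the row_counts table, function, and trigger names are illustrative:

```sql
-- One row per tracked table, kept in sync by an AFTER INSERT/DELETE trigger.
CREATE TABLE row_counts
(
  table_name text PRIMARY KEY,
  cnt        bigint NOT NULL
);

CREATE OR REPLACE FUNCTION count_trigger() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    UPDATE row_counts SET cnt = cnt + 1 WHERE table_name = TG_TABLE_NAME;
  ELSIF TG_OP = 'DELETE' THEN
    UPDATE row_counts SET cnt = cnt - 1 WHERE table_name = TG_TABLE_NAME;
  END IF;
  RETURN NULL;   -- return value is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;

-- Seed the counter once with a real count, then let the trigger maintain it.
INSERT INTO row_counts
  SELECT 'ut_mailing_client', count(*) FROM ut_mailing_client;

CREATE TRIGGER ut_mailing_client_count
  AFTER INSERT OR DELETE ON ut_mailing_client
  FOR EACH ROW EXECUTE PROCEDURE count_trigger();
```

Reading the count then becomes a single-row lookup. The trade-off is that every insert and delete serializes on the counter row, which can become a contention point under heavy concurrent writes.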
[GENERAL] SELECT COUNT(*) execution time on large tables (v9.0.4-1)
Hi,

My application has a frequent need to issue a select count(*) on tables. Some have a large row count. (The examples below are from a 5.7 M row table; some are larger.)

Issuing either SELECT COUNT(*) or SELECT COUNT(<column>) yields a sequential scan on the table.

I have browsed catalog tables, digging for a real time Row.count but so far did not find any.

QUESTION: Is there a better (faster) way to obtain the row count from a table?

The table has been VACUUM ANALYZEd prior to the EXPLAIN ANALYZE runs. (Numbers are from a test server.)

explain analyze select count(*) as qtd from ut_mailing_client;
"Aggregate  (cost=1231424.23..1231424.24 rows=1 width=0) (actual time=7205.009..7205.010 rows=1 loops=1)"
"  ->  Seq Scan on ut_mailing_client  (cost=100.00..1217054.58 rows=5747858 width=0) (actual time=0.034..3866.765 rows=5747311 loops=1)"
"Total runtime: 7205.071 ms"

explain analyze select count(utm_id) as qtd from ut_mailing_client;
"Aggregate  (cost=1231424.23..1231424.24 rows=1 width=4) (actual time=7984.382..7984.382 rows=1 loops=1)"
"  ->  Seq Scan on ut_mailing_client  (cost=100.00..1217054.58 rows=5747858 width=4) (actual time=0.028..4114.177 rows=5747311 loops=1)"
"Total runtime: 7984.443 ms"

explain analyze select count(beneficio) as qtd from ut_mailing_client;
"Aggregate  (cost=1231424.23..1231424.24 rows=1 width=11) (actual time=7591.530..7591.530 rows=1 loops=1)"
"  ->  Seq Scan on ut_mailing_client  (cost=100.00..1217054.58 rows=5747858 width=11) (actual time=0.032..3845.412 rows=5747311 loops=1)"
"Total runtime: 7591.595 ms"

--- TABLE STRUCTURE ---

CREATE TABLE ut_mailing_client
(
  utm_id serial NOT NULL,
  beneficio character varying(10) NOT NULL,
  . . .
  CONSTRAINT ut_mailing_client_pkey PRIMARY KEY (beneficio),
  CONSTRAINT ut_mailing_client_utm_id_key UNIQUE (utm_id)
)
WITH (
  OIDS=FALSE
);

--- VACUUM ANALYZE ---

INFO:  vacuuming "public.ut_mailing_client"
INFO:  index "ut_mailing_client_pkey" now contains 5747311 row versions in 18926 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 6.50s/3.24u sec elapsed 39.03 sec.
INFO:  index "ut_mailing_client_utm_id_key" now contains 5747311 row versions in 12615 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 4.28s/2.19u sec elapsed 26.05 sec.
INFO:  index "ut_mailing_client_utm_fk_lote_utm_dt_used_idx" now contains 5747311 row versions in 18926 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 6.39s/3.27u sec elapsed 38.90 sec.
INFO:  "ut_mailing_client": found 0 removable, 1179 nonremovable row versions in 31 out of 159576 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 17.17s/8.71u sec elapsed 104.02 sec.
INFO:  vacuuming "pg_toast.pg_toast_69799"
INFO:  index "pg_toast_69799_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_69799": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing "public.ut_mailing_client"
INFO:  "ut_mailing_client": scanned 3 of 159576 pages, containing 1080857 live rows and 0 dead rows; 3 rows in sample, 5749295 estimated total rows
Total query runtime: 111560 ms.

Carlos Sotto Maior
+55 11 8244-7899
cso...@sistemassim.com.br

Sistemas Sim Serviços e Tecnologia Ltda.
+55 11 5041-3086
Rua Tenente Gomes Ribeiro, 78
Vila Clementino (Próximo ao Metro Santa Cruz)
São Paulo - SP 04038-040