Re: [GENERAL] Why Select Count(*) from table - took over 20 minutes?
On 2010-10-26, John R Pierce pie...@hogranch.com wrote: count(*) has to read the whole table to get the accurate count. The reason for this is that different clients can see different versions of that table, for instance, if client A is already in a transaction, and client B then does an INSERT, the two clients will see different values for the count. They may or may not. the default transaction isolation level read commited allows a session to see most changes that were committed externally after the start of the transaction. Tlso the index may include deleted rows. which is another reason count(*) does a table scan. -- ɹǝpun uʍop ɯoɹɟ sƃuıʇǝǝɹ⅁ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why Select Count(*) from table - took over 20 minutes?
How/where do I query this? My script does not need a 100% accurate count - just a recently valid count - so I can verify the web crawlers are still crawling :-) On Oct 27, 2010, at 7:15 AM, Merlin Moncure wrote: pg_class.reltuples -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why Select Count(*) from table - took over 20 minutes?
On Thu, Oct 28, 2010 at 11:36 AM, Ozz Nixon ozzni...@gmail.com wrote: How/where do I query this? My script does not need a 100% accurate count - just a recently valid count - so I can verify the web crawlers are still crawling :-) you can do this: select reltuples from pg_class where relname = 'your_table' and relkind = 'r'; that will give you accurate count as of the last analyze, which is going to be driven by table usage and/or manual analyze. Probably much better in your particular case is to do this: select * from pg_stat_all_tables where relname = 'your_table'; and look at the n_tup_ins, del, etc. and make sure they are changing (those numbers are reset when server resets, fyi). merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why Select Count(*) from table - took over 20 minutes?
On Tue, Oct 26, 2010 at 4:30 PM, Diego Schulz dsch...@gmail.com wrote: On Tue, Oct 26, 2010 at 2:18 PM, Ozz Nixon ozzni...@gmail.com wrote: I am the only user on this system right now, and one table select count(*) took over 20 minutes: wikitags exists and has 58,988,656 records. Structure (in pascal) is: quer.SQL.Add('create table '+DBTags+' ('); quer.SQL.Add(' pagename '+SQL_TITLE+'(100) not null,'); quer.SQL.Add(' tagword '+SQL_TITLE+'(15) not null,'); quer.SQL.Add(' soundex2 '+SQL_TITLE+'(4) not null,'); quer.SQL.Add(' metaphone '+SQL_TITLE+'(15) not null,'); quer.SQL.Add(' metaphone2 '+SQL_TITLE+'(22) not null,'); quer.SQL.Add(' carverphone '+SQL_TITLE+'(22) not null,'); quer.SQL.Add(' instances '+SQL_INT32+' not null,'); if SQL_NAME_PRIMARY_KEYS then quer.SQL.Add(' constraint '+DBTags+'_PK'); quer.SQL.Add(' primary key(pagename, tagword, instances)'); quer.SQL.Add(')'); where SQL_TITLE = 'varchar', SQL_IN32 = 'int' I have hung off indexes for each column, to resolve my previous performance issue from 3+ weeks ago. However, COUNT() is still dog slow - this table is a write once, read many... *never* update, nor delete. Any suggestions? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general You could try something like what's suggested in this blog post: http://jakub.fedyczak.net/post/26 I didn't actually tried it, but I think it should work ok. Before you try that, you should ask yourself if you really need a 100% accurate count. A reasonable approximation is maintained via the stats system (pg_class.reltuples) that will often do and is free. merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Why Select Count(*) from table - took over 20 minutes?
I am the only user on this system right now, and one table select count(*) took over 20 minutes: wikitags exists and has 58,988,656 records. Structure (in pascal) is: quer.SQL.Add('create table '+DBTags+' ('); quer.SQL.Add(' pagename '+SQL_TITLE+'(100) not null,'); quer.SQL.Add(' tagword '+SQL_TITLE+'(15) not null,'); quer.SQL.Add(' soundex2 '+SQL_TITLE+'(4) not null,'); quer.SQL.Add(' metaphone '+SQL_TITLE+'(15) not null,'); quer.SQL.Add(' metaphone2 '+SQL_TITLE+'(22) not null,'); quer.SQL.Add(' carverphone '+SQL_TITLE+'(22) not null,'); quer.SQL.Add(' instances '+SQL_INT32+' not null,'); if SQL_NAME_PRIMARY_KEYS then quer.SQL.Add(' constraint '+DBTags+'_PK'); quer.SQL.Add(' primary key(pagename, tagword, instances)'); quer.SQL.Add(')'); where SQL_TITLE = 'varchar', SQL_IN32 = 'int' I have hung off indexes for each column, to resolve my previous performance issue from 3+ weeks ago. However, COUNT() is still dog slow - this table is a write once, read many... *never* update, nor delete. Any suggestions? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why Select Count(*) from table - took over 20 minutes?
In response to Ozz Nixon ozzni...@gmail.com: I am the only user on this system right now, and one table select count(*) took over 20 minutes: wikitags exists and has 58,988,656 records. Structure (in pascal) is: quer.SQL.Add('create table '+DBTags+' ('); quer.SQL.Add(' pagename '+SQL_TITLE+'(100) not null,'); quer.SQL.Add(' tagword '+SQL_TITLE+'(15) not null,'); quer.SQL.Add(' soundex2 '+SQL_TITLE+'(4) not null,'); quer.SQL.Add(' metaphone '+SQL_TITLE+'(15) not null,'); quer.SQL.Add(' metaphone2 '+SQL_TITLE+'(22) not null,'); quer.SQL.Add(' carverphone '+SQL_TITLE+'(22) not null,'); quer.SQL.Add(' instances '+SQL_INT32+' not null,'); if SQL_NAME_PRIMARY_KEYS then quer.SQL.Add(' constraint '+DBTags+'_PK'); quer.SQL.Add(' primary key(pagename, tagword, instances)'); quer.SQL.Add(')'); where SQL_TITLE = 'varchar', SQL_IN32 = 'int' I have hung off indexes for each column, to resolve my previous performance issue from 3+ weeks ago. However, COUNT() is still dog slow - this table is a write once, read many... *never* update, nor delete. Any suggestions? Generate the count one time and store it somewhere for quick retrieval. In an MVCC database, count(*) is designed to be accurate, which requires a scan of the entire table (which appears to take about 20 mins on your hardware). MVCC just isn't optimized for a table that never changes. However, it's easy to cache that value, since it never changes the cache never needs to be updated. -- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why Select Count(*) from table - took over 20 minutes?
On 10/26/10 10:18 AM, Ozz Nixon wrote: I am the only user on this system right now, and one table select count(*) took over 20 minutes: wikitags exists and has 58,988,656 records. Structure (in pascal) is: quer.SQL.Add('create table '+DBTags+' ('); quer.SQL.Add(' pagename '+SQL_TITLE+'(100) not null,'); quer.SQL.Add(' tagword '+SQL_TITLE+'(15) not null,'); quer.SQL.Add(' soundex2 '+SQL_TITLE+'(4) not null,'); quer.SQL.Add(' metaphone '+SQL_TITLE+'(15) not null,'); quer.SQL.Add(' metaphone2 '+SQL_TITLE+'(22) not null,'); quer.SQL.Add(' carverphone '+SQL_TITLE+'(22) not null,'); quer.SQL.Add(' instances '+SQL_INT32+' not null,'); if SQL_NAME_PRIMARY_KEYS then quer.SQL.Add(' constraint '+DBTags+'_PK'); quer.SQL.Add(' primary key(pagename, tagword, instances)'); quer.SQL.Add(')'); where SQL_TITLE = 'varchar', SQL_IN32 = 'int' I have hung off indexes for each column, to resolve my previous performance issue from 3+ weeks ago. However, COUNT() is still dog slow - this table is a write once, read many... *never* update, nor delete. count(*) has to read the whole table to get the accurate count. The reason for this is that different clients can see different versions of that table, for instance, if client A is already in a transaction, and client B then does an INSERT, the two clients will see different values for the count. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why Select Count(*) from table - took over 20 minutes?
On October 26, 2010 10:18:41 am Ozz Nixon wrote: I have hung off indexes for each column, to resolve my previous performance issue from 3+ weeks ago. However, COUNT() is still dog slow - this table is a write once, read many... *never* update, nor delete. Any suggestions? If you need to do count(*) on 60 million row tables, you will probably need faster hardware. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why Select Count(*) from table - took over 20 minutes?
On Tue, Oct 26, 2010 at 2:18 PM, Ozz Nixon ozzni...@gmail.com wrote: I am the only user on this system right now, and one table select count(*) took over 20 minutes: wikitags exists and has 58,988,656 records. Structure (in pascal) is: quer.SQL.Add('create table '+DBTags+' ('); quer.SQL.Add(' pagename '+SQL_TITLE+'(100) not null,'); quer.SQL.Add(' tagword '+SQL_TITLE+'(15) not null,'); quer.SQL.Add(' soundex2 '+SQL_TITLE+'(4) not null,'); quer.SQL.Add(' metaphone '+SQL_TITLE+'(15) not null,'); quer.SQL.Add(' metaphone2 '+SQL_TITLE+'(22) not null,'); quer.SQL.Add(' carverphone '+SQL_TITLE+'(22) not null,'); quer.SQL.Add(' instances '+SQL_INT32+' not null,'); if SQL_NAME_PRIMARY_KEYS then quer.SQL.Add(' constraint '+DBTags+'_PK'); quer.SQL.Add(' primary key(pagename, tagword, instances)'); quer.SQL.Add(')'); where SQL_TITLE = 'varchar', SQL_IN32 = 'int' I have hung off indexes for each column, to resolve my previous performance issue from 3+ weeks ago. However, COUNT() is still dog slow - this table is a write once, read many... *never* update, nor delete. Any suggestions? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general You could try something like what's suggested in this blog post: http://jakub.fedyczak.net/post/26 I didn't actually tried it, but I think it should work ok. cheers, diego -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general