Re: [GENERAL] Why Select Count(*) from table - took over 20 minutes?

2010-11-03 Thread Jasen Betts
On 2010-10-26, John R Pierce pie...@hogranch.com wrote:

 count(*) has to read the whole table to get the accurate count.   The 
 reason for this is that different clients can see different versions of 
 that table, for instance, if client A is already in a transaction, and 
 client B then does an INSERT, the two clients will see different values 
 for the count.

They may or may not. the default transaction isolation level read commited
allows a session to see most changes that were committed externally after the
start of the transaction.

Tlso the index may include deleted rows.
which is another reason count(*) does a table scan.

-- 
ɹǝpun uʍop ɯoɹɟ sƃuıʇǝǝɹ⅁


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why Select Count(*) from table - took over 20 minutes?

2010-10-28 Thread Ozz Nixon
How/where do I query this?

My script does not need a 100% accurate count - just a recently valid count - 
so I can verify the web crawlers are still crawling :-)

On Oct 27, 2010, at 7:15 AM, Merlin Moncure wrote:

 pg_class.reltuples


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why Select Count(*) from table - took over 20 minutes?

2010-10-28 Thread Merlin Moncure
On Thu, Oct 28, 2010 at 11:36 AM, Ozz Nixon ozzni...@gmail.com wrote:
 How/where do I query this?

 My script does not need a 100% accurate count - just a recently valid count - 
 so I can verify the web crawlers are still crawling :-)

you can do this:
select reltuples from pg_class where relname = 'your_table' and relkind = 'r';

that will give you accurate count as of the last analyze, which is
going to be driven by table usage and/or manual analyze.   Probably
much better in your particular case is to do this:

select * from pg_stat_all_tables where relname = 'your_table';

and look at the n_tup_ins, del, etc.  and make sure they are changing
(those numbers are reset when server resets, fyi).

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why Select Count(*) from table - took over 20 minutes?

2010-10-27 Thread Merlin Moncure
On Tue, Oct 26, 2010 at 4:30 PM, Diego Schulz dsch...@gmail.com wrote:
 On Tue, Oct 26, 2010 at 2:18 PM, Ozz Nixon ozzni...@gmail.com wrote:
 I am the only user on this system right now, and one table select count(*) 
 took over 20 minutes:

 wikitags exists and has 58,988,656 records.

 Structure (in pascal) is:

   quer.SQL.Add('create table '+DBTags+' (');
   quer.SQL.Add('   pagename '+SQL_TITLE+'(100) not null,');
   quer.SQL.Add('   tagword '+SQL_TITLE+'(15) not null,');
   quer.SQL.Add('   soundex2 '+SQL_TITLE+'(4) not null,');
   quer.SQL.Add('   metaphone '+SQL_TITLE+'(15) not null,');
   quer.SQL.Add('   metaphone2 '+SQL_TITLE+'(22) not null,');
   quer.SQL.Add('   carverphone '+SQL_TITLE+'(22) not null,');
   quer.SQL.Add('   instances '+SQL_INT32+' not null,');
   if SQL_NAME_PRIMARY_KEYS then quer.SQL.Add('   constraint '+DBTags+'_PK');
   quer.SQL.Add('   primary key(pagename, tagword, instances)');
   quer.SQL.Add(')');

 where SQL_TITLE = 'varchar', SQL_IN32 = 'int'

 I have hung off indexes for each column, to resolve my previous 
 performance issue from 3+ weeks ago. However, COUNT() is still dog slow - 
 this table is a write once, read many... *never* update, nor delete.

 Any suggestions?
 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general


 You could try something like what's suggested in this blog post:
 http://jakub.fedyczak.net/post/26

 I didn't actually tried it, but I think it should work ok.

Before you try that, you should ask yourself if you really need a 100%
accurate count.  A reasonable approximation is maintained via the
stats system (pg_class.reltuples) that will often do and is free.

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Why Select Count(*) from table - took over 20 minutes?

2010-10-26 Thread Ozz Nixon
I am the only user on this system right now, and one table select count(*) took 
over 20 minutes:

wikitags exists and has 58,988,656 records.

Structure (in pascal) is:

   quer.SQL.Add('create table '+DBTags+' (');
   quer.SQL.Add('   pagename '+SQL_TITLE+'(100) not null,');
   quer.SQL.Add('   tagword '+SQL_TITLE+'(15) not null,');
   quer.SQL.Add('   soundex2 '+SQL_TITLE+'(4) not null,');
   quer.SQL.Add('   metaphone '+SQL_TITLE+'(15) not null,');
   quer.SQL.Add('   metaphone2 '+SQL_TITLE+'(22) not null,');
   quer.SQL.Add('   carverphone '+SQL_TITLE+'(22) not null,');
   quer.SQL.Add('   instances '+SQL_INT32+' not null,');
   if SQL_NAME_PRIMARY_KEYS then quer.SQL.Add('   constraint '+DBTags+'_PK');
   quer.SQL.Add('   primary key(pagename, tagword, instances)');
   quer.SQL.Add(')');

where SQL_TITLE = 'varchar', SQL_IN32 = 'int'

I have hung off indexes for each column, to resolve my previous performance 
issue from 3+ weeks ago. However, COUNT() is still dog slow - this table is a 
write once, read many... *never* update, nor delete.

Any suggestions?
-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why Select Count(*) from table - took over 20 minutes?

2010-10-26 Thread Bill Moran
In response to Ozz Nixon ozzni...@gmail.com:

 I am the only user on this system right now, and one table select count(*) 
 took over 20 minutes:
 
 wikitags exists and has 58,988,656 records.
 
 Structure (in pascal) is:
 
quer.SQL.Add('create table '+DBTags+' (');
quer.SQL.Add('   pagename '+SQL_TITLE+'(100) not null,');
quer.SQL.Add('   tagword '+SQL_TITLE+'(15) not null,');
quer.SQL.Add('   soundex2 '+SQL_TITLE+'(4) not null,');
quer.SQL.Add('   metaphone '+SQL_TITLE+'(15) not null,');
quer.SQL.Add('   metaphone2 '+SQL_TITLE+'(22) not null,');
quer.SQL.Add('   carverphone '+SQL_TITLE+'(22) not null,');
quer.SQL.Add('   instances '+SQL_INT32+' not null,');
if SQL_NAME_PRIMARY_KEYS then quer.SQL.Add('   constraint '+DBTags+'_PK');
quer.SQL.Add('   primary key(pagename, tagword, instances)');
quer.SQL.Add(')');
 
 where SQL_TITLE = 'varchar', SQL_IN32 = 'int'
 
 I have hung off indexes for each column, to resolve my previous performance 
 issue from 3+ weeks ago. However, COUNT() is still dog slow - this table is a 
 write once, read many... *never* update, nor delete.
 
 Any suggestions?

Generate the count one time and store it somewhere for quick retrieval.

In an MVCC database, count(*) is designed to be accurate, which requires
a scan of the entire table (which appears to take about 20 mins on your
hardware).

MVCC just isn't optimized for a table that never changes.  However, it's
easy to cache that value, since it never changes the cache never needs
to be updated.

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why Select Count(*) from table - took over 20 minutes?

2010-10-26 Thread John R Pierce

On 10/26/10 10:18 AM, Ozz Nixon wrote:

I am the only user on this system right now, and one table select count(*) took 
over 20 minutes:

wikitags exists and has 58,988,656 records.

Structure (in pascal) is:

quer.SQL.Add('create table '+DBTags+' (');
quer.SQL.Add('   pagename '+SQL_TITLE+'(100) not null,');
quer.SQL.Add('   tagword '+SQL_TITLE+'(15) not null,');
quer.SQL.Add('   soundex2 '+SQL_TITLE+'(4) not null,');
quer.SQL.Add('   metaphone '+SQL_TITLE+'(15) not null,');
quer.SQL.Add('   metaphone2 '+SQL_TITLE+'(22) not null,');
quer.SQL.Add('   carverphone '+SQL_TITLE+'(22) not null,');
quer.SQL.Add('   instances '+SQL_INT32+' not null,');
if SQL_NAME_PRIMARY_KEYS then quer.SQL.Add('   constraint '+DBTags+'_PK');
quer.SQL.Add('   primary key(pagename, tagword, instances)');
quer.SQL.Add(')');

where SQL_TITLE = 'varchar', SQL_IN32 = 'int'

I have hung off indexes for each column, to resolve my previous performance 
issue from 3+ weeks ago. However, COUNT() is still dog slow - this table is a write once, 
read many... *never* update, nor delete.


count(*) has to read the whole table to get the accurate count.   The 
reason for this is that different clients can see different versions of 
that table, for instance, if client A is already in a transaction, and 
client B then does an INSERT, the two clients will see different values 
for the count.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why Select Count(*) from table - took over 20 minutes?

2010-10-26 Thread Alan Hodgson
On October 26, 2010 10:18:41 am Ozz Nixon wrote:
 I have hung off indexes for each column, to resolve my previous
 performance issue from 3+ weeks ago. However, COUNT() is still dog slow
 - this table is a write once, read many... *never* update, nor delete.
 
 Any suggestions?

If you need to do count(*) on 60 million row tables, you will probably need 
faster hardware.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why Select Count(*) from table - took over 20 minutes?

2010-10-26 Thread Diego Schulz
On Tue, Oct 26, 2010 at 2:18 PM, Ozz Nixon ozzni...@gmail.com wrote:
 I am the only user on this system right now, and one table select count(*) 
 took over 20 minutes:

 wikitags exists and has 58,988,656 records.

 Structure (in pascal) is:

   quer.SQL.Add('create table '+DBTags+' (');
   quer.SQL.Add('   pagename '+SQL_TITLE+'(100) not null,');
   quer.SQL.Add('   tagword '+SQL_TITLE+'(15) not null,');
   quer.SQL.Add('   soundex2 '+SQL_TITLE+'(4) not null,');
   quer.SQL.Add('   metaphone '+SQL_TITLE+'(15) not null,');
   quer.SQL.Add('   metaphone2 '+SQL_TITLE+'(22) not null,');
   quer.SQL.Add('   carverphone '+SQL_TITLE+'(22) not null,');
   quer.SQL.Add('   instances '+SQL_INT32+' not null,');
   if SQL_NAME_PRIMARY_KEYS then quer.SQL.Add('   constraint '+DBTags+'_PK');
   quer.SQL.Add('   primary key(pagename, tagword, instances)');
   quer.SQL.Add(')');

 where SQL_TITLE = 'varchar', SQL_IN32 = 'int'

 I have hung off indexes for each column, to resolve my previous performance 
 issue from 3+ weeks ago. However, COUNT() is still dog slow - this table is a 
 write once, read many... *never* update, nor delete.

 Any suggestions?
 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general


You could try something like what's suggested in this blog post:
http://jakub.fedyczak.net/post/26

I didn't actually tried it, but I think it should work ok.

cheers,

diego

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general