[PERFORM] Slow Select count(*) ... query on table with 60 Mio. rows

2010-01-14 Thread tom
Hi,

=== Problem ===

i have a db-table data_measurand with about 6000 (60 Millions)
rows and the following query takes about 20-30 seconds (with psql):

mydb=# select count(*) from data_measurand;
  count   
--
 60846187
(1 row)


=== Question ===

- What can i do to improve the performance for the data_measurand table?
 
=== Background ===

I created a application with django 1.1 ( http://djangoproject.com ) to
collect, analyze and visualize measurement data.

=== My System ===

= Postgres Version =
postgres=# select version();
version
-
 PostgreSQL 8.3.9 on i486-pc-linux-gnu, compiled by GCC gcc-4.3.real
(Debian 4.3.2-1.1) 4.3.2
(1 row)

I installed postgres with apt-get from debian lenny without any
modifications.

= Debian Lenny Kernel Version =
lenny:~# uname  -a
Linux or.ammonit.com 2.6.26-2-686-bigmem #1 SMP Wed Nov 4 21:12:12 UTC
2009 i686 GNU/Linux

= Hardware = 
model name  : AMD Athlon(tm) 64 X2 Dual Core Processor 6000+
cpu MHz : 1000.000
cache size  : 512 KB
MemTotal: 8281516 kB (8 GB)

I use a software raid and LVM for Logical Volume Management. Filesystem
is ext3



=== My Table Definitions ===

mydb=# \d data_measurand;
 Table public.data_measurand
 Column  |  Type  |
Modifiers  
-++-
 id  | integer| not null default
nextval('data_measurand_id_seq'::regclass)
 entry_id| integer| not null
 sensor_id   | integer| not null
 avg_value   | numeric(10,4)  | 
 avg_count_value | integer| 
 min_value   | numeric(10,4)  | 
 max_value   | numeric(10,4)  | 
 sigma_value | numeric(10,4)  | 
 unit| character varying(20)  | not null
 status  | integer| not null
 comment | character varying(255) | not null
Indexes:
data_measurand_pkey PRIMARY KEY, btree (id)
data_measurand_entry_id_68e2e3fe UNIQUE, btree (entry_id,
sensor_id)
data_measurand_avg_count_value btree (avg_count_value)
data_measurand_avg_value btree (avg_value)
data_measurand_comment btree (comment)
data_measurand_entry_id btree (entry_id)
data_measurand_max_value btree (max_value)
data_measurand_min_value btree (min_value)
data_measurand_sensor_id btree (sensor_id)
data_measurand_sigma_value btree (sigma_value)
data_measurand_status btree (status)
data_measurand_unit btree (unit)
Foreign-key constraints:
entry_id_refs_id_50fa9bdf FOREIGN KEY (entry_id) REFERENCES
data_entry(id) DEFERRABLE INITIALLY DEFERRED
sensor_id_refs_id_5ed84c7c FOREIGN KEY (sensor_id) REFERENCES
sensor_sensor(id) DEFERRABLE INITIALLY DEFERRED



mydb=# \d data_entry;
   Table public.data_entry
  Column  |   Type   |
Modifiers
--+--+-
 id   | integer  | not null default
nextval('data_entry_id_seq'::regclass)
 project_id   | integer  | not null
 logger_id| integer  | not null
 original_file_id | integer  | not null
 datetime | timestamp with time zone | not null
Indexes:
data_entry_pkey PRIMARY KEY, btree (id)
data_entry_logger_id_197f5d41 UNIQUE, btree (logger_id, datetime)
data_entry_datetime btree (datetime)
data_entry_logger_id btree (logger_id)
data_entry_original_file_id btree (original_file_id)
data_entry_project_id btree (project_id)
Foreign-key constraints:
logger_id_refs_id_5f73cf46 FOREIGN KEY (logger_id) REFERENCES
logger_logger(id) DEFERRABLE INITIALLY DEFERRED
original_file_id_refs_id_44e8d3b1 FOREIGN KEY (original_file_id)
REFERENCES data_originalfile(id) DEFERRABLE INITIALLY DEFERRED
project_id_refs_id_719fb302 FOREIGN KEY (project_id) REFERENCES
project_project(id) DEFERRABLE INITIALLY DEFERRED



mydb=# \d project_project;
Table public.project_project
Column |  Type  |
Modifiers   
---++--
 id| integer| not null default
nextval('project_project_id_seq'::regclass)
 auth_group_id | integer| not null
 name  | character varying(200) | not null
 timezone  | character varying(200) | 
 longitude | double precision   | 
 latitude  | double precision   | 
 altitude  | double precision   | 
 comment   | text   | 
Indexes:

Re: [PERFORM] Slow Select count(*) ... query on table with 60 Mio. rows

2010-01-14 Thread Matthew Wakeling

On Thu, 14 Jan 2010, tom wrote:

i have a db-table data_measurand with about 6000 (60 Millions)
rows and the following query takes about 20-30 seconds (with psql):

mydb=# select count(*) from data_measurand;
 count
--
60846187
(1 row)


Sounds pretty reasonable to me. Looking at your table, the rows are maybe 
200 bytes wide? That's 12GB of data for Postgres to munch through. 30 
seconds is really rather quick for that (400MB/s). What sort of RAID array 
is managing to give you that much?



I use a software raid and LVM for Logical Volume Management. Filesystem
is ext3


Ditch lvm.


This is an FAQ. Counting the rows in a table is an expensive operation in 
Postgres. It can't be answered directly from an index. If you want, you 
can keep track of the number of rows yourself with triggers, but beware 
that this will slow down write access to the table.


Matthew

--
Nog: Look! They've made me into an ensign!
O'Brien: I didn't know things were going so badly.
Nog: Frightening, isn't it?

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow Select count(*) ... query on table with 60 Mio. rows

2010-01-14 Thread A. Kretschmer
In response to tom :
 Hi,
 
 === Problem ===
 
 i have a db-table data_measurand with about 6000 (60 Millions)
 rows and the following query takes about 20-30 seconds (with psql):
 
 mydb=# select count(*) from data_measurand;
   count   
 --
  60846187
 (1 row)
 
 
 === Question ===
 
 - What can i do to improve the performance for the data_measurand table?

Short answer: nothing.

Long answer: PG has to check the visibility for each record, so it
forces a seq.scan.

But you can get an estimation, ask pg_class (a system table), the column
reltuples there contains an estimated row rount.
http://www.postgresql.org/docs/current/static/catalog-pg-class.html

If you really needs the correct row-count you should create a TRIGGER
and count with this trigger all INSERTs and DELETEs.


Regards, Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: - Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow Select count(*) ... query on table with 60 Mio. rows

2010-01-14 Thread Kevin Grittner
Matthew Wakeling matt...@flymine.org wrote:
 
 This is an FAQ.
 
I just added it to the wiki FAQ page:
 
http://wiki.postgresql.org/wiki/FAQ#Why_is_.22SELECT_count.28.2A.29_FROM_bigtable.3B.22_slow.3F
 
-Kevin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow Select count(*) ... query on table with 60 Mio. rows

2010-01-14 Thread Ivan Voras

Kevin Grittner wrote:

Matthew Wakeling matt...@flymine.org wrote:
 

This is an FAQ.
 
I just added it to the wiki FAQ page:
 
http://wiki.postgresql.org/wiki/FAQ#Why_is_.22SELECT_count.28.2A.29_FROM_bigtable.3B.22_slow.3F


Maybe you could add a short note why an estimation like from the 
pg_class table is usually enough.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] Slow Select count(*) ... query on table with 60 Mio. rows

2010-01-14 Thread Kevin Grittner
Ivan Voras ivo...@freebsd.org wrote:
 
 Maybe you could add a short note why an estimation like from the 
 pg_class table is usually enough.
 
OK.  Will do.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow Select count(*) ... query on table with 60 Mio. rows

2010-01-14 Thread Greg Smith

Kevin Grittner wrote:

Matthew Wakeling matt...@flymine.org wrote:
 
  

This is an FAQ.

 
I just added it to the wiki FAQ page:
 
http://wiki.postgresql.org/wiki/FAQ#Why_is_.22SELECT_count.28.2A.29_FROM_bigtable.3B.22_slow.3F
  
The content was already there, just not linked into the main FAQ yet: 
http://wiki.postgresql.org/wiki/Slow_Counting


--
Greg Smith2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com



Re: [PERFORM] Slow Select count(*) ... query on table with 60 Mio. rows

2010-01-14 Thread Kevin Grittner
Greg Smith g...@2ndquadrant.com wrote:
 
 The content was already there, just not linked into the main FAQ
 yet: 
 http://wiki.postgresql.org/wiki/Slow_Counting
 
For a question asked this frequently, it should probably be in the
FAQ.  I'll add a link from there to the more thorough write-up.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow Select count(*) ... query on table with 60 Mio. rows

2010-01-14 Thread Greg Smith

Kevin Grittner wrote:

Greg Smith g...@2ndquadrant.com wrote:
 
  

The content was already there, just not linked into the main FAQ
yet: 
http://wiki.postgresql.org/wiki/Slow_Counting

 
For a question asked this frequently, it should probably be in the

FAQ.  I'll add a link from there to the more thorough write-up.
 


There's a whole list of FAQs that are documented on the wiki but not in 
the main FAQ yet leftover from before the main FAQ was hosted there. You 
can see them all at 
http://wiki.postgresql.org/wiki/Frequently_Asked_Questions


I just haven't had time to merge those all usefully into the main FAQ.

--
Greg Smith2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com



Re: [PERFORM] Slow Select count(*) ... query on table with 60 Mio. rows

2010-01-14 Thread Kevin Grittner
Greg Smith g...@2ndquadrant.com wrote:
 
 There's a whole list of FAQs that are documented on the wiki but
 not in the main FAQ yet leftover from before the main FAQ was
 hosted there. You can see them all at 
 http://wiki.postgresql.org/wiki/Frequently_Asked_Questions
 
 I just haven't had time to merge those all usefully into the main
 FAQ.
 
Well, unless you object to the way I did it, there's one down. 
Should I remove it from the list of Other FAQs on the page you
cite?
 
(Of course, it goes without saying that you're welcome to improve
upon anything I put in there.)
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow Select count(*) ... query on table with 60 Mio. rows

2010-01-14 Thread Greg Smith

Kevin Grittner wrote:

Greg Smith g...@2ndquadrant.com wrote:
 
  

There's a whole list of FAQs that are documented on the wiki but
not in the main FAQ yet leftover from before the main FAQ was
hosted there. You can see them all at 
http://wiki.postgresql.org/wiki/Frequently_Asked_Questions


I just haven't had time to merge those all usefully into the main
FAQ.

 
Well, unless you object to the way I did it, there's one down. 
Should I remove it from the list of Other FAQs on the page you

cite?
 


Sure; everyone should feel free to assimilate into the main FAQ and wipe 
out anything on that smaller list. Those are mainly topics where the 
discussion of workarounds and approaches can be much longer than 
standard FAQ length, so I suspect many of the answers are going to be a 
very brief summary with a link to longer discussion. If you come across 
a really small one, we might even wipe out the original page once it's 
merged in.


--
Greg Smith2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com