Gurus,
A table in one of my databases has just crossed the 30 million row
mark and has begun to feel very sluggish for just about anything I do
with it. I keep the entire database vacuumed regularly. And, as
long as I'm not doing a sequential scan, things seem reasonably quick
most of the time. I'm now thinking that my problem is IO because
anything that involves heavy ( like a seq scan ) IO seems to slow to
a crawl. Even if I am using indexed fields to grab a few thousand
rows, then going to sequential scans it gets very very slow.
I have also had the occurrence where queries will not finish for days
( I eventually have to kill them ). I was hoping to provide an
explain analyze for them, but if they never finish... even the
explain never finishes when I try that.
For example, as I'm writing this, I am running an UPDATE statement
that will affect a small part of the table, and is querying on an
indexed boolean field.
I have been waiting for over an hour and a half as I write this and
it still hasn't finished. I'm thinking "I bet Tom, Simon or Josh
wouldn't put up with this kind of wait time..", so I thought I would
see if anyone here had some pointers. Maybe I have a really stupid
setting in my conf file that is causing this. I really can't believe
I am at the limits of this hardware, however.
The query:
update eventactivity set ftindex = false where ftindex = true;
( added the where clause because I don't want to alter where ftindex
is null )
The table:
Column | Type | Modifiers
-------------+-----------------------------+-----------
entrydate | timestamp without time zone |
incidentid | character varying(40) |
statustype | character varying(20) |
unitid | character varying(20) |
recordtext | character varying(255) |
recordtext2 | character varying(255) |
insertdate | timestamp without time zone |
ftindex | boolean |
Indexes: eventactivity1 btree (incidentid),
eventactivity_entrydate_idx btree (entrydate),
eventactivity_ftindex_idx btree (ftindex),
eventactivity_oid_idx btree (oid)
The hardware:
4 x 2.2GHz Opterons
12 GB of RAM
4x10k 73GB Ultra320 SCSI drives in RAID 0+1
1GB hardware cache memory on the RAID controller
The OS:
Fedora, kernel 2.6.6-1.435.2.3smp ( redhat stock kernel )
filesystem is mounted as ext2
#####
vmstat output ( as I am waiting for this to finish ):
procs -----------memory---------- ---swap-- -----io---- --system--
----cpu----
r b swpd free buff cache si so bi bo in cs us
sy id wa
0 1 5436 2823908 26140 9183704 0 1 2211 540 694 336
9 2 76 13
#####
iostat output ( as I am waiting for this to finish ):
avg-cpu: %user %nice %sys %iowait %idle
9.19 0.00 2.19 13.08 75.53
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
cciss/c0d0 329.26 17686.03 4317.57 161788630 39496378
#####
This is a dedicated postgresql server, so maybe some of these
settings are more liberal than they should be?
relevant ( I hope ) postgresql.conf options are:
shared_buffers = 50000
effective_cache_size = 1348000
random_page_cost = 3
work_mem = 512000
max_fsm_pages = 80000
log_min_duration_statement = 60000
fsync = true ( not sure if I'm daring enough to run without this )
wal_buffers = 1000
checkpoint_segments = 64
checkpoint_timeout = 3000
#---- FOR PG_AUTOVACUUM --#
stats_command_string = true
stats_row_level = true
Thanks in advance,
Dan
---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster