Hi We are experiencing consistent slowness on the database for
one application. This is more a reporting type of application, heavy on the
bytea data type usage (gets rendered into PDFs in the app server). A lot of
queries, mostly selects and a few random updates, get accumulated on the server
– with increasing volume of users on the application. Below is a snapshot
of top, with about 80 selects and 3 or 4 updates. Things get better eventually if
I cancel (SIGINT) some of the oldest queries. I also see a few instances of
shared locks not being granted during this time…I don’t even see high
iowait or memory starvation during these times, as indicated by top. -bash-2.05b$ psql -c "select * from pg_locks;" dbname
| grep f | | 77922136 | 16761 |
ShareLock | f We (development) are looking into the query optimization
(explain analyze, indexes, etc), and my understanding is that the queries when
run for explain analyze execute fast, but during busy times, they become quite
slow, taking from a few seconds to a few minutes to execute. I do see in the
log that almost all queries do have either ORDER BY, or GROUP BY, or DISTINCT.
Does it hurt to up the sort_mem to 3MB or 4MB? Should I up the
effective_cache_size to 5 or 6GB? The app is does not need a lot of connections
on the database, I can reduce it down from 600. Based on the description above and the configuration below
does any thing appear bad in config? Is there anything I can try in the
configuration to improve performance? The database size is about 4GB. This is PG 7.4.7, RHAS3.0 (u5), Local 4 spindle RAID10
(15KRPM), and logs on a separate set of drives, RAID10. 6650 server, 4 x XEON,
12GB RAM. Vacuum is done every night, full vacuum done once a week. I had increased the shared_buffers and sort_memory recently,
which didn’t help. Thanks, 10:44:51 up 14 days, 13:38, 2 users, load average: 0.98,
1.14, 1.12 264 processes: 257 sleeping, 7 running, 0 zombie, 0 stopped CPU states: cpu user nice system irq softirq
iowait idle total 14.4% 0.0% 7.4% 0.0% 0.0%
0.0% 77.9% cpu00 15.7% 0.0% 5.7% 0.0% 0.1%
0.0% 78.2% cpu01 15.1% 0.0% 7.5% 0.0% 0.0%
0.1% 77.0% cpu02 10.5% 0.0% 5.9% 0.0% 0.0%
0.0% 83.4% cpu03 9.9% 0.0% 5.9% 0.0% 0.0%
0.0% 84.0% cpu04 7.9% 0.0% 3.7% 0.0% 0.0%
0.0% 88.2% cpu05 19.3% 0.0% 12.3% 0.0% 0.0%
0.0% 68.3% cpu06 20.5% 0.0% 9.5% 0.0% 0.0%
0.1% 69.7% cpu07 16.1% 0.0% 8.5% 0.0% 0.1%
0.3% 74.7% Mem: 12081736k av, 7881972k used, 4199764k free, 0k
shrd, 82372k buff 4823496k actv, 2066260k in_d, 2036k
in_c Swap: 4096532k av, 0k used, 4096532k
free 6888900k cached PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM
TIME CPU COMMAND 16773 postgres 15 0 245M 245M 240M S 0.0 2.0
1:16 7 postmaster 16880 postgres 15 0 245M 245M 240M S 0.1 2.0
0:49 6 postmaster 16765 postgres 15 0 245M 245M 240M S 0.0 2.0
1:16 0 postmaster 16825 postgres 15 0 245M 245M 240M S 0.0 2.0
1:02 5 postmaster 16774 postgres 15 0 245M 245M 240M S 0.1 2.0
1:16 0 postmaster 16748 postgres 15 0 245M 245M 240M S 0.0 2.0
1:19 5 postmaster 16881 postgres 15 0 245M 245M 240M S 0.1 2.0
0:50 7 postmaster 16762 postgres 15 0 245M 245M 240M S 0.0 2.0
1:14 4 postmaster … … max_connections = 600 shared_buffers = 30000 #=234MB, up from 21760=170MB min 16,
at least max_connections*2, 8KB each sort_mem = 2048 # min 64, size in KB vacuum_mem = 32768 # up from 16384 min 1024,
size in KB # - Free Space Map - #max_fsm_pages = 20000 # min max_fsm_relations*16,
6 bytes each #max_fsm_relations = 1000 # min 100, ~50 bytes each #fsync = true # turns forced
synchronization on or off #wal_sync_method = fsync # the default varies across
platforms: # fsync, fdatasync,
open_sync, or open_datasync #wal_buffers = 8 # min 4, 8KB each # - Checkpoints - checkpoint_segments = 125 # in logfile segments, min
1, 16MB each checkpoint_timeout = 600 # range 30-3600, in seconds #checkpoint_warning = 30 # 0 is off, in seconds #commit_delay = 0 # range 0-100000, in
microseconds #commit_siblings = 5 # range 1-1000 # - Planner Method Enabling - #enable_hashagg = true #enable_hashjoin = true #enable_indexscan = true #enable_mergejoin = true #enable_nestloop = true #enable_seqscan = true #enable_sort = true #enable_tidscan = true # - Planner Cost Constants - effective_cache_size = 262144 # =2GB typically 8KB each #random_page_cost = 4 # units are one sequential
page fetch cost #cpu_tuple_cost = 0.01 # (same) #cpu_index_tuple_cost = 0.001 # (same) #cpu_operator_cost = 0.0025 # (same) # - Genetic Query Optimizer - #geqo = true #geqo_threshold = 11 #geqo_effort = 1 #geqo_generations = 0 #geqo_pool_size = 0 # default based on tables in
statement, # range 128-1024 #geqo_selection_bias = 2.0 # range 1.5-2.0 # - Other Planner Options - #default_statistics_target = 10 # range 1-1000 #from_collapse_limit = 8 #join_collapse_limit = 8 # 1 disables collapsing of
explicit JOINs |
- [PERFORM] slow database, queries accumulating Anjan Dave