Martin Foster wrote:
I run a Perl/CGI driven website that makes extensive use of PostgreSQL (7.4.3) for everything from user information to formatting and display of specific sections of the site. The server itself, is a dual processor AMD Opteron 1.4Ghz w/ 2GB Ram and 2 x 120GB hard drives mirrored for redundancy running under FreeBSD 5.2.1 (AMD64).
Recently loads on the site have increased during peak hours to the point of showing considerable loss in performance. This can be observed when connections move from the 120 concurrent connections to PostgreSQL to roughly 175 or more. Essentially, the machine seems to struggle to keep up with continual requests and slows down respectively as resources are tied down.
Code changes have been made to the scripts to essentially back off in high load working environments which have worked to an extent. However, as loads continue to increase the database itself is not taking well to the increased traffic taking place.
Having taken a look at 'Tuning PostgreSQL for Performance' (http://www.varlena.com/GeneralBits/Tidbits/perf.html) using it as best I could in order to set my settings. However, even with statistics disabled and ever setting tweaked things still consider to deteriorate.
Is there anything anyone can recommend in order to give the system a necessary speed boost? It would seem to me that a modest dataset of roughly a Gig combined with that type of hardware should be able to handle substantially more load then what it is. Can anyone provide me with clues as where to pursue? Would disabling 'fsync' provide more performance if I choose that information may be lost in case of a crash?
If anyone needs access to logs, settings et cetera. Please ask, I simply wish to test the waters first on what is needed. Thanks!
Tell us about your tipical queries, show us your configuration file. The access are only in read only mode or do you have concurrent writers and readers ? During peak hours your processors are tied to 100% ? What say the vmstat and the iostat ?
May be you are not using indexes some where, or may be yes but the planner is not using it... In two words we needs other informations in order to help you.
Regards Gaetano Mendola
I included all the files in attachments, which will hopefully cut down on any replied to Emails. As for things like connection pooling, the web server makes use of Apache::DBI to pool the connections for the Perl scripts being driven on that server. For the sake of being thorough, a quick 'apachectl status' was thrown in when the database was under a good load.
Since it would rather slow things down to wait for the servers to really get bogged down with load averages of 20.00 and more, I opted to choose a period of time where we are a bit busier then normal. You will be able to see how the system behaves under a light load and subsequently reaching 125 or so concurrent connections.
The queries themselves are simple, normally drawing information from one table with few conditions or in the most complex cases using joins on two table or sub queries. These behave very well and always have, the problem is that these queries take place in rather large amounts due to the dumb nature of the scripts themselves.
Over a year ago when I was still using MySQL for the project, the statistics generated would report well over 65 queries per second under loads ranging from 130 to 160 at peak but averaged over the weeks of operation. Looking at the Apache status, one can see that it averages only roughly 2.5 requests per second giving you a slight indication as to what is taking place.
A quick run of 'systat -ifstat' shows the following graph:
/0 /1 /2 /3 /4 /5 /6 /7 /8 /9 /10 Load Average >>>>>>>>>>>
Interface Traffic Peak Total
lo0 in 0.000 KB/s 0.000 KB/s 37.690 GB
out 0.000 KB/s 0.000 KB/s 37.690 GB em0 in 34.638 KB/s 41.986 KB/s 28.998 GB
out 70.777 KB/s 70.777 KB/s 39.553 GBEm0 is a full duplexed 100Mbs connection to an internal switch that supports the servers directly. Load on the loopback was cut down considerably once I stopped using pg_autovaccum since its performance benefits under low load were buried under the hindrance it caused when traffic was high.
I am sure that there are some places that could benefit from some optimization. Especially in the case of indexes, however as a whole the problem seems to be related more to the massive onslaught of queries then it does anything else.
Also note that some of these scripts run for longer durations even if they are web based. Some run as long as 30 minutes, making queries to the database from periods of wait from five seconds to twenty-five seconds. Under high duress the timeouts should back out, based on the time needed for the query to respond, normally averaging 0.008 seconds.
Does this help at all, or is more detail needed on the matter?
Martin Foster
[EMAIL PROTECTED]# ----------------------------- # PostgreSQL configuration file # ----------------------------- # # This file consists of lines of the form: # # name = value # # (The '=' is optional.) White space may be used. Comments are introduced # with '#' anywhere on a line. The complete list of option names and # allowed values can be found in the PostgreSQL documentation. The # commented-out settings shown in this file represent the default values. # # Any option can also be given as a command line switch to the # postmaster, e.g. 'postmaster -c log_connections=on'. Some options # can be changed at run-time with the 'SET' SQL command. # # This file is read on postmaster startup and when the postmaster # receives a SIGHUP. If you edit the file on a running system, you have # to SIGHUP the postmaster for the changes to take effect, or use # "pg_ctl reload".
#---------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#---------------------------------------------------------------------------
# - Connection Settings -
tcpip_socket = true
ssl = false
max_connections = 512
superuser_reserved_connections = 2
rendezvous_name='io'
port = 5432
#unix_socket_directory = '/var/postgres'
#unix_socket_group = 'postgres'
#unix_socket_permissions = 0777 # octal
# - Security & Authentication -
authentication_timeout = 20 # 1-600, in seconds
password_encryption = true
db_user_namespace = false
#---------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#---------------------------------------------------------------------------
# - Memory -
shared_buffers = 8192 # min 16, at least max_connections*2, 8KB each
sort_mem = 2048 # min 64, size in KB
vacuum_mem = 65536 # min 1024, size in KB
# - Free Space Map -
#max_fsm_pages = 20000 # min max_fsm_relations*16, 6 bytes each
#max_fsm_relations = 1000 # min 100, ~50 bytes each
# - Kernel Resource Usage -
#max_files_per_process = 1000 # min 25
#preload_libraries = ''
#---------------------------------------------------------------------------
# WRITE AHEAD LOG
#---------------------------------------------------------------------------
# - Settings -
fsync = true # turns forced synchronization on or off
wal_sync_method = fsync # the default varies across platforms:
# fsync, fdatasync, open_sync, or open_datasync
wal_buffers = 128 # min 4, 8KB each
# - Checkpoints -
checkpoint_segments = 24 # in logfile segments, min 1, 16MB each
checkpoint_timeout = 300 # range 30-3600, in seconds
checkpoint_warning = 30 # 0 is off, in seconds
commit_delay = 15000 # range 0-100000, in microseconds
commit_siblings = 64 # range 1-1000
#---------------------------------------------------------------------------
# QUERY TUNING
#---------------------------------------------------------------------------
# - Planner Method Enabling -
enable_hashagg = true
enable_hashjoin = true
enable_indexscan = true
enable_mergejoin = true
enable_nestloop = true
enable_seqscan = true
enable_sort = true
enable_tidscan = true
# - Planner Cost Constants -
effective_cache_size = 16000 # typically 8KB each
random_page_cost = 2 # units are one sequential page fetch cost
cpu_tuple_cost = 0.01 # (same)
cpu_index_tuple_cost = 0.001 # (same)
cpu_operator_cost = 0.0025 # (same)
# - Genetic Query Optimizer -
geqo = false
geqo_threshold = 11
geqo_effort = 1
geqo_generations = 0
geqo_pool_size = 0 # default based on tables in statement,
# range 128-1024
geqo_selection_bias = 2.0 # range 1.5-2.0
# - Other Planner Options -
default_statistics_target = 100 # range 1-1000
from_collapse_limit = 8
join_collapse_limit = 8 # 1 disables collapsing of explicit JOINs
#---------------------------------------------------------------------------
# ERROR REPORTING AND LOGGING
#---------------------------------------------------------------------------
# - Syslog -
#syslog = 2 # range 0-2; 0=stdout; 1=both; 2=syslog
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'
# - When to Log -
client_min_messages = notice # Values, in order of decreasing detail:
# debug5, debug4, debug3, debug2, debug1,
# log, info, notice, warning, error
log_min_messages = info # Values, in order of decreasing detail:
# debug5, debug4, debug3, debug2, debug1,
# info, notice, warning, error, log, fatal,
# panic
log_error_verbosity = default # terse, default, or verbose messages
log_min_error_statement = panic # Values in order of increasing severity:
# debug5, debug4, debug3, debug2, debug1,
# info, notice, warning, error, panic(off)
log_min_duration_statement = -1 # Log all statements whose
# execution time exceeds the value, in
# milliseconds. Zero prints all queries.
# Minus-one disables.
silent_mode = false # DO NOT USE without Syslog!
# - What to Log -
#debug_print_parse = false
#debug_print_rewritten = false
#debug_print_plan = false
#debug_pretty_print = false
#log_connections = false
#log_duration = false
#log_pid = false
#log_statement = false
#log_timestamp = false
#log_hostname = false
#log_source_port = false
#---------------------------------------------------------------------------
# RUNTIME STATISTICS
#---------------------------------------------------------------------------
# - Statistics Monitoring -
log_parser_stats = false
log_planner_stats = false
log_executor_stats = false
log_statement_stats = false
# - Query/Index Statistics Collector -
stats_start_collector = false
stats_command_string = false
stats_block_level = false
stats_row_level = false
stats_reset_on_server_start = true
#---------------------------------------------------------------------------
# CLIENT CONNECTION DEFAULTS
#---------------------------------------------------------------------------
# - Statement Behavior -
search_path = '$user,public' # schema names
check_function_bodies = true
default_transaction_isolation = 'read committed'
default_transaction_read_only = false
statement_timeout = 25000 # 0 is disabled, in milliseconds
# - Locale and Formatting -
datestyle = 'iso, us'
#timezone = unknown # actually, defaults to TZ environment setting
#australian_timezones = false
#extra_float_digits = 0 # min -15, max 2
#client_encoding = sql_ascii # actually, defaults to database encoding
# These settings are initialized by initdb -- they may be changed
lc_messages = 'C' # locale for system error message strings
lc_monetary = 'C' # locale for monetary formatting
lc_numeric = 'C' # locale for number formatting
lc_time = 'C' # locale for time formatting
# - Other Defaults -
explain_pretty_print = true
#dynamic_library_path = '$libdir'
#max_expr_depth = 10000 # min 10
#---------------------------------------------------------------------------
# LOCK MANAGEMENT
#---------------------------------------------------------------------------
deadlock_timeout = 5000 # in milliseconds
#max_locks_per_transaction = 64 # min 10, ~260*max_connections bytes each
#---------------------------------------------------------------------------
# VERSION/PLATFORM COMPATIBILITY
#---------------------------------------------------------------------------
# - Previous Postgres Versions -
#add_missing_from = true
#regex_flavor = advanced # advanced, extended, or basic
#sql_inheritance = true
# - Other Platforms & Clients -
#transform_null_equals = false[EMAIL PROTECTED] ~$ apachectl status
Apache Server Status for www.ethereal-realms.org
Server Version: Apache/1.3.29 (Unix) mod_perl/1.27 mod_gzip/1.3.26.1a
mod_ssl/2.8.16 OpenSSL/0.9.7c
Server Built: May 24 2004 00:32:11
_________________________________________________________________
Current Time: Tuesday, 03-Aug-2004 21:34:37 MDT
Restart Time: Tuesday, 03-Aug-2004 03:54:21 MDT
Parent Server Generation: 0
Server uptime: 17 hours 40 minutes 16 seconds
Total accesses: 168852 - Total Traffic: 406.0 MB
CPU Usage: u283.96 s41.27 cu92.51 cs13.68 - .678% CPU load
2.65 requests/sec - 6.5 kB/second - 2521 B/request
127 requests currently being processed, 59 idle servers
W_WWWW_WW.WWKKWK__W._WW_.WK._KK._..K.WW__WK_WW_WWWK_.__WWW_W.WWW
__..__W._W__WWWWWW._WKWW_W_KKW_WWW___K_.WKWW__KWWWWW_WWWWKKKWW.W
__.WK_WW.W_W_WWWW_KWWWWW_KW.WW.W__W_W___W___WWWKW__W_W._.WWW.WWK
W___WKKW.K_W.W...KW....WW_......................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
Scoreboard Key:
"_" Waiting for Connection, "S" Starting up, "R" Reading Request,
"W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
"L" Logging, "G" Gracefully finishing, "." Open slot with no current
process
[EMAIL PROTECTED] ~$ vmstat
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr ad4 ad6 in sy cs us sy id
0 0 0 498532 122848 3306 0 0 0 740 0 0 0 788 0 1675 16 21 63
[EMAIL PROTECTED] ~$ pstat -s
Device 1K-blocks Used Avail Capacity
/dev/ar0s1b 4194304 24 4194280 0%
[EMAIL PROTECTED] ~$ iostat
tty ad4 ad6 ar0 cpu
tin tout KB/t tps MB/s KB/t tps MB/s KB/t tps MB/s us ni sy in id
0 17 0.50 0 0.00 0.50 0 0.00 20.28 23 0.45 16 0 20 1 63
[EMAIL PROTECTED] ~# ps -U postgres | wc -l
127
[EMAIL PROTECTED] ~$ ps -uax
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
postgres 32084 0.0 0.2 91616 3764 p0- R Mon12PM 4:08.99
/usr/local/bin/postmaster -D /var/postgres (postgres)
postgres 80333 0.0 2.1 94620 44372 ?? S 8:57PM 0:01.00 postmaster: ethereal
ethereal 192.168.1.6 idle in trans
postgres 80599 0.0 2.1 94652 44780 ?? S 8:59PM 0:00.97 postmaster: ethereal
ethereal 192.168.1.6 idle in trans
postgres 80616 0.0 2.4 94424 50396 ?? S 8:59PM 0:00.89 postmaster: ethereal
ethereal 192.168.1.6 idle in trans
postgres 80715 0.0 2.2 94444 46804 ?? S 9:00PM 0:00.68 postmaster: ethereal
ethereal 192.168.1.6 idle in trans
postgres 80788 0.0 2.1 94424 43944 ?? S 9:00PM 0:00.93 postmaster: ethereal
ethereal 192.168.1.6 idle in trans
postgres 80811 0.0 2.1 94424 43884 ?? S 9:00PM 0:00.94 postmaster: ethereal
ethereal 192.168.1.6 idle in trans
postgres 80902 0.0 2.1 94424 43380 ?? S 9:01PM 0:00.76 postmaster: ethereal
ethereal 192.168.1.6 idle in trans
postgres 80949 0.0 2.2 94424 45248 ?? S 9:01PM 0:00.67 postmaster: ethereal
ethereal 192.168.1.6 idle in trans
postgres 81020 0.0 2.1 94424 42924 ?? S 9:02PM 0:00.74 postmaster: ethereal
ethereal 192.168.1.6 idle in trans
[EMAIL PROTECTED] ~# date
Tue Aug 3 21:28:53 MDT 2004
[EMAIL PROTECTED] ~$ systat -vmstat
1 users Load 4.15 3.76 3.54 Aug 3 21:30
Mem:KB REAL VIRTUAL VN PAGER SWAP PAGER
Tot Share Tot Share Free in out in out
Act 293708 4120 482092 6736 132204 count
All 1910640 5224 2462084 17060 pages
Interrupts
Proc:r p d s w Csw Trp Sys Int Sof Flt 212 cow 728 total
5 139 215023802 3112 1078 58023548 397288 wire 100 0: clk
295304 act 6: fdc0
59.6%Sys 0.9%Intr 38.9%User 0.0%Nice 0.6%Idl 1151936 inact 128 8: rtc
| | | | | | | | | | 92748 cache 53 10: bge
==============================>>>>>>>>>>>>>>>>>> 39456 free 447 11: em0
daefr 14: ata
Namei Name-cache Dir-cache 1345 prcfr
Calls hits % hits % react
2761 2761 100 pdwake
1111 zfod pdpgs
Disks ad4 ad6 ar0 fd0 1059 ofod intrn
KB/t 0.00 0.00 16.00 0.00 95 %slo-z 204096 buf
tps 0 0 35 0 1587 tfree 591 dirtybuf
MB/s 0.00 0.00 0.54 0.00 120525 desiredvnodes
% busy 0 0 5 0 30131 numvnodes
22695 freevnodes
[EMAIL PROTECTED] ~$ vmstat
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr ad4 ad6 in sy cs us sy id
0 0 0 317216 128932 3121 0 0 0 736 0 0 0 789 0 1682 16 21 63
[EMAIL PROTECTED] ~$ pstat -s
Device 1K-blocks Used Avail Capacity
/dev/ar0s1b 4194304 24 4194280 0%
[EMAIL PROTECTED] ~$ iostat
tty ad4 ad6 ar0 cpu
tin tout KB/t tps MB/s KB/t tps MB/s KB/t tps MB/s us ni sy in id
0 17 0.50 0 0.00 0.50 0 0.00 20.32 23 0.45 16 0 20 1 63
[EMAIL PROTECTED] ~$ ps -U postgres | wc -l
72
[EMAIL PROTECTED] ~$ ps -uax
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
postgres 32084 0.0 0.2 91616 3764 p0- S Mon12PM 3:17.42
/usr/local/bin/postmaster -D /var/postgres (postgres)
postgres 51100 0.0 1.7 94412 35828 ?? S 3:36PM 0:01.02 postmaster: ethereal
ethereal 192.168.1.6 idle in trans
postgres 51134 0.0 1.8 94620 37060 ?? S 3:36PM 0:01.04 postmaster: ethereal
ethereal 192.168.1.6 idle in trans
postgres 51147 0.0 1.8 94620 36980 ?? S 3:37PM 0:00.66 postmaster: ethereal
ethereal 192.168.1.6 idle in trans
postgres 51240 0.0 1.8 94412 37156 ?? S 3:38PM 0:00.62 postmaster: ethereal
ethereal 192.168.1.6 idle in trans
postgres 51276 0.0 0.7 94412 15180 ?? S 3:38PM 0:00.41 postmaster: ethereal
ethereal 192.168.1.6 idle in trans
postgres 51277 0.0 1.9 94492 40776 ?? S 3:38PM 0:00.57 postmaster: ethereal
ethereal 192.168.1.6 idle in trans
postgres 51287 0.0 1.8 94620 37228 ?? S 3:38PM 0:00.63 postmaster: ethereal
ethereal 192.168.1.6 idle in trans
postgres 51368 0.0 1.7 94412 35348 ?? S 3:40PM 0:00.55 postmaster: ethereal
ethereal 192.168.1.6 idle in trans
postgres 51493 0.0 1.7 94412 35920 ?? S 3:42PM 0:00.49 postmaster: ethereal
ethereal 192.168.1.6 idle in trans
postgres 51495 0.0 1.7 94412 36192 ?? S 3:42PM 0:00.89 postmaster: ethereal
ethereal 192.168.1.6 idle in trans
[EMAIL PROTECTED] ~$ date
Tue Aug 3 16:09:06 MDT 2004
[EMAIL PROTECTED] ~$ systat -vmstat
1 users Load 0.53 0.54 0.47 Aug 3 16:14
Mem:KB REAL VIRTUAL VN PAGER SWAP PAGER
Tot Share Tot Share Free in out in out
Act 205172 4312 326472 6736 120128 count
All 1909516 4944 2215992 11896 pages
Interrupts
Proc:r p d s w Csw Trp Sys Int Sof Flt 197 cow 642 total
96 177814901 2186 947 20014746 386844 wire 100 0: clk
206544 act 6: fdc0
28.8%Sys 0.5%Intr 26.8%User 0.0%Nice 43.9%Idl 1263216 inact 128 8: rtc
| | | | | | | | | | 69644 cache 55 10: bge
==============+>>>>>>>>>>>>> 50484 free 359 11: em0
daefr 14: ata
Namei Name-cache Dir-cache 1235 prcfr
Calls hits % hits % react
1709 1709 100 pdwake
1000 zfod pdpgs
Disks ad4 ad6 ar0 fd0 938 ofod intrn
KB/t 0.00 0.00 16.00 0.00 93 %slo-z 204096 buf
tps 0 0 31 0 1482 tfree 56 dirtybuf
MB/s 0.00 0.00 0.49 0.00 120525 desiredvnodes
% busy 0 0 2 0 30131 numvnodes
7645 freevnodes
---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
