Re: [PERFORM] sudden spurt in swap utilization (was:cpu bound postgresql setup.)
Dear List, Today has been good since morning. Although it is a lean day for us but the indications are nice. I thank everyone who shared the concern. I think the most significant change has been to reduce shared_buffers from 10G to 4G , this has lead to reduced memory usage and some breathing space to the OS. Although i am yet to incorporate the suggestions from pgtune but i think the issue of max_connection needs to be addressed first. I am investigating application issues and about the mechanism that puts many backend to 'IDLE in transaction ' mode for significant times. I thank Tom for the script he sent. Once that resolves i shall check pooling as suggested by Kevin, then eventually max_connections can be reduced. I shall also check pgpool and pgbouncer if they are helpful in this regard. I observed that the number of simultaneous connection today (lean day) hovers between 1 to 10 , occasionally shooting to 15 but never more than 20 i would say. I am happy that i/o waits are negligible and cpu is idling also for a while. procs ---memory-- ---swap-- -io --system-- -cpu-- r b swpd free buff cache si sobibo in cs us sy id wa st 22 0 18468 954120 13460 2849177200 568 1558 13645 18355 62 10 27 2 0 16 0 18468 208100 13476 2846908400 580 671 14039 17055 67 13 19 1 0 10 2 18812 329032 13400 283569720 46 301 1768 13848 17884 68 10 20 1 0 16 2 18812 366596 13416 2836162000 325 535 13957 16649 72 11 16 1 0 50 1 18812 657048 13432 2836654800 416 937 13823 16667 62 9 28 1 0 6 1 18812 361040 13452 2837190800 323 522 14352 16789 74 12 14 0 0 33 0 18812 162760 12604 2821015200 664 1544 14701 16315 66 11 22 2 0 5 0 18812 212028 10764 2792180000 552 648 14567 17737 67 10 21 1 0 6 0 18796 279920 10548 2789038830 359 562 12635 15976 60 9 30 1 0 8 0 18796 438820 10564 2789444000 289 2144 12234 15770 57 8 34 1 0 5 0 18796 531800 10580 2790170000 514 394 12169 16005 59 8 32 1 0 17 0 18796 645868 10596 2789070400 423 948 13369 16554 67 10 23 1 0 9 1 18796 1076540 10612 27898604 00 598 403 12703 17363 71 10 18 1 0 8 0 18796 1666508 10628 27904748 00 430 1123 13314 17421 57 9 32 1 0 9 1 18776 1541444 10644 27913092 10 653 954 13194 16822 75 11 12 1 0 8 0 18776 1526728 10660 27921380 00 692 788 13073 16987 74 9 15 1 0 8 0 18776 1482304 10676 27933176 00 966 2029 13017 16651 76 12 11 1 0 21 0 18776 1683260 10700 27937492 00 298 663 13110 15796 67 10 23 1 0 18 0 18776 2087664 10716 27943512 00 406 622 12399 17072 62 9 28 1 0 With 300 connections, I think that either of these could lead you to experience intermittent bursts of extreme swapping. I'd drop it to somewhere in the 16MB to 32MB range until I had a connection pool configured such that it was actually keeping the number of active connections much lower. (*) wal_buffers = 8MB # pgtune wizard 2010-06-25 (64kb , via default) Sure, I'd boost this. checkpoint_segments = 16 # pgtune wizard 2010-06-25 (30 , specified) If you have the disk space for the 30 segments, I wouldn't reduce it. shared_buffers = 7680MB # pgtune wizard 2010-06-25 (4096 MB , specified) This one is perhaps the most sensitive to workload. Anywhere between 1GB and 8GB might be best for you. Greg Smith has some great advice on how to tune this for your workload. (*) max_connections = 80 # pgtune wizard 2010-06-25 (300 , ;-) specified) when i reduce max_connections i start getting errors, i will see again concurrent connections during business hours. That's probably a good number to get to, but you have to reduce the number of actual connections before you set the limit that low. lot of our connections are in IDLE in transaction state If any of these stay in that state for more than a minute or two, you need to address that if you want to get your connection count under control. If any of them persist for hours or days, you need to fix it to avoid bloat which can kill performance. -Kevin
Re: [PERFORM] sudden spurt in swap utilization (was:cpu bound postgresql setup.)
Dear List , A simple (perl) script was made to 'watch' the state transitions of back ends. On startup It captures a set of pids for watching and displays a visual representation of the states for next 30 intervals of 1 seconds each. The X axis is interval cnt, Y axis is pid and the origin is on top-left. The state value can be Active Query (*) , or IDLE indicated by '.' or 'IDLE in transaction' indicated by '?' . for my server below is a random output (during lean hours and on a lean day). PID 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 4334 ? ? ? ? * ? ? ? ? ? * ? ? ? ? ? ? ? ? 6904 ? ? . . . * ? . . . . . . ? ? . 6951 ? ? ? . . . . ? ? ? ? ? . . . ? ? ? . . . ? . . . . . ? ? . 7009 ? * ? ? . . . . . . . . . * * . * ? ? . . . * ? ? ? . . . ? 7077 ? . ? . . . * . ? . . . . ? . . . . . . . ? . . ? . . . ? ? 7088 ? . . ? . ? ? ? . . . . . . ? . . ? ? * . . . . . ? . ? . * 7091 ? . . * ? ? ? ? ? ? ? * ? . . ? * . * . . . . . . . . . . . 7093 ? ? . ? . . . . ? . ? * . . . . . . . . . ? ? ? . ? ? ? . . 7112 * * . . . ? ? ? . . . . . . . . ? ? . ? . ? . . ? . . . . . 7135 ? . . * . ? ? ? . ? ? . . . ? . . . . . . . ? . . . ? ? . . 7142 ? . ? . . . . . . * . . . ? . . . . . . . . . . . . . . 7166 ? . ? ? ? * * . ? * . ? . . . ? . ? ? . . . * . . . ? . . . 8202 ? ? . . . * . ? . . . . . . . * ? . . . ? ? . . . . ? ? ? . 8223 ? . . . . . . ? 8237 ? ? ? . ? ? ? ? . . . . . . . ? . . . . . ? . . * ? . . . . 8251 ? . ? . . . . . ? ? . . . * ? . . . ? . . . . . . . . . . . 8278 ? ? . . . . ? . . . . . . . ? . . . . . . ? ? . . * . . . . 8290 ? . . 8294 ? ? . . . . . . . . . . . . ? . . . ? ? . . . . . . . . * * 8303 ? * ? . ? ? ? . ? ? ? . . . . * . . . . . . . . . . . . . . 8306 ? ? . . . ? . . . ? . . . . . . * . . . 8309 * ? ? ? ? . . . ? . . . 8329 ? . * * . . . . . . . * . ? . * . ? . * . * ? . . . (*) Active Query , (.) Idle , (?) Idle in transaction,blank backend over. Looks like most of the graph space is filled with (.) or (?) and very less active queries (long running queries 1s). on a busy day and busi hour i shall check the and post again. The script is presented which depends only on perl , DBI and DBD::Pg. script pasted here: http://pastebin.com/mrjSZfLB Regds mallah. On Sat, Jun 26, 2010 at 3:23 PM, Rajesh Kumar Mallah mallah.raj...@gmail.com wrote: Dear List, Today has been good since morning. Although it is a lean day for us but the indications are nice. I thank everyone who shared the concern. I think the most significant change has been to reduce shared_buffers from 10G to 4G , this has lead to reduced memory usage and some breathing space to the OS. Although i am yet to incorporate the suggestions from pgtune but i think the issue of max_connection needs to be addressed first. I am investigating application issues and about the mechanism that puts many backend to 'IDLE in transaction ' mode for significant times. I thank Tom for the script he sent. Once that resolves i shall check pooling as suggested by Kevin, then eventually max_connections can be reduced. I shall also check pgpool and pgbouncer if they are helpful in this regard. I observed that the number of simultaneous connection today (lean day) hovers between 1 to 10 , occasionally shooting to 15 but never more than 20 i would say. I am happy that i/o waits are negligible and cpu is idling also for a while. procs ---memory-- ---swap-- -io --system-- -cpu-- r b swpd free buff cache si sobibo in cs us sy id wa st 22 0 18468 954120 13460 2849177200 568 1558 13645 18355 62 10 27 2 0 16 0 18468 208100 13476 2846908400 580 671 14039 17055 67 13 19 1 0 10 2 18812 329032 13400 283569720 46 301 1768 13848 17884 68 10 20 1 0 16 2 18812 366596 13416 2836162000 325 535 13957 16649 72 11 16 1 0 50 1 18812 657048 13432
Re: [PERFORM] sudden spurt in swap utilization (was:cpu bound postgresql setup.)
On Fri, 2010-06-25 at 15:25 +0530, Rajesh Kumar Mallah wrote: shared_buffers = 10GB # min 128kB work_mem = 512MB# min 64kB These are still pretty high IMHO. How many *concurrent* connections do you have? -- Devrim GÜNDÜZ PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer PostgreSQL RPM Repository: http://yum.pgrpms.org Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr http://www.gunduz.org Twitter: http://twitter.com/devrimgunduz signature.asc Description: This is a digitally signed message part
Re: [PERFORM] sudden spurt in swap utilization (was:cpu bound postgresql setup.)
Rajesh Kumar Mallah wrote: A scary phenomenon is being exhibited by the server , which is the server is slurping all the swap suddenly 8 1 4192912 906164 6100 2787364000 2277 858 13440 16235 63 8 19 10 0 I reduced work_mem from 4GB to 2GB to 512MB (now). I clearly remember that this abnormal consumption of swap was NOT there even when work_mem was 4GB. eg during happier times swap utilisation was: http://pastebin.com/bnE1pFZ9 the question is whats making postgres slurp the swap? i am posting my current postgresql.conf once again. # cat postgresql.conf | grep -v ^\s*# | grep -v ^\s*$ listen_addresses = '*' # what IP address(es) to listen on; port = 5432 # (change requires restart) max_connections = 300 # (change requires restart) Hello Rajesh, In constrast with e.g. shared_buffers and effective_cache_size, work_mem is amount of memory per 'thing' (e.g. order/group by) that wants some working memory, so even a single backend can use several pieces of work_mem memory. Looking at your postgresql.conf, other memory values seem a bit too high as well for a 32GB ram server. It is probably a good idea to use pgtune (on pgfoundry) to get some reasonable ball park settings for your hardware. regards, Yeb Havinga -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] sudden spurt in swap utilization (was:cpu bound postgresql setup.)
Dear List, Hmmm , lemme test efficacy of pg_tune. I would reduce shared buffers also. regarding concurrent queries: its now non business hours and SELECT procpid,current_query from pg_stat_activity where current_query not ilike '%idle%' ; is just 5-10, i am yet to measure it during business hours. Warm Regds Rajesh Kumar Mallah. On Fri, Jun 25, 2010 at 4:58 PM, Yeb Havinga yebhavi...@gmail.com wrote: Rajesh Kumar Mallah wrote: A scary phenomenon is being exhibited by the server , which is the server is slurping all the swap suddenly 8 1 4192912 906164 6100 2787364000 2277 858 13440 16235 63 8 19 10 0 I reduced work_mem from 4GB to 2GB to 512MB (now). I clearly remember that this abnormal consumption of swap was NOT there even when work_mem was 4GB. eg during happier times swap utilisation was: http://pastebin.com/bnE1pFZ9 the question is whats making postgres slurp the swap? i am posting my current postgresql.conf once again. # cat postgresql.conf | grep -v ^\s*# | grep -v ^\s*$ listen_addresses = '*' # what IP address(es) to listen on; port = 5432 # (change requires restart) max_connections = 300 # (change requires restart) Hello Rajesh, In constrast with e.g. shared_buffers and effective_cache_size, work_mem is amount of memory per 'thing' (e.g. order/group by) that wants some working memory, so even a single backend can use several pieces of work_mem memory. Looking at your postgresql.conf, other memory values seem a bit too high as well for a 32GB ram server. It is probably a good idea to use pgtune (on pgfoundry) to get some reasonable ball park settings for your hardware. regards, Yeb Havinga -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] sudden spurt in swap utilization (was:cpu bound postgresql setup.)
Rajesh Kumar Mallah mallah.raj...@gmail.com wrote: its now non business hours and SELECT procpid,current_query from pg_stat_activity where current_query not ilike '%idle%' ; is just 5-10, i am yet to measure it during business hours. Be careful about 'IDLE in transaction' status. Those are a problem if the transaction remains active for very long, because vacuum (autovacuum or otherwise) can't free space for dead rows which could still be visible to the 'IDLE in transaction' connection. It's normal to see this status briefly between statements in a transaction, but it's a problem if a connection just sits there in this status. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] sudden spurt in swap utilization (was:cpu bound postgresql setup.)
I changed shared_buffers from 10G to 4G , swap usage has almost become nil. # free total used free sharedbuffers cached Mem: 32871276 245758248295452 0 11064 22167324 -/+ buffers/cache:2397436 30473840 Swap: 41929123524192560 I also observed that there was a huge IO wait and load spike initially which gradually reduced to normal levels. Now things seems to be fine. but real test shall be during business hours. vmstat output: http://pastebin.com/ygu8gUhS the iowait now is very respectable 10% and CPU is idling most of the time. # vmstat 10 procs ---memory-- ---swap-- -io --system-- -cpu-- r b swpd free buff cache si sobibo in cs us sy id wa st 2 1352 8482444 11336 2229910010 450 20800 50 6 39 5 0 4 0352 8393840 11352 2230448400 480 163 9260 12717 32 4 62 3 0 5 1352 8474788 11360 2230898000 304 445 8295 12358 28 4 67 2 0 3 0352 8370672 11376 2231667600 648 158 8760 13214 38 4 55 3 0 11 0352 8193824 11392 2232357200 621 577 8800 13163 37 4 56 3 0 2 0352 8229012 11408 2232666400 169 405 9588 13696 34 4 61 1 0 6 1352 8319176 11424 2233314400 559 170 8830 12929 32 4 61 3 0 I shall also try pgtune in a while. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] sudden spurt in swap utilization (was:cpu bound postgresql setup.)
Dear List, pgtune suggests the following: (current value are in braces via reason) , (*) indicates significant difference from current value. default_statistics_target = 50 # pgtune wizard 2010-06-25 (current 100 via default) (*) maintenance_work_mem = 1GB # pgtune wizard 2010-06-25 (16MB via default) checkpoint_completion_target = 0.9 # pgtune wizard 2010-06-25 (0.5 via default) (*) effective_cache_size = 22GB # pgtune wizard 2010-06-25 (18GB , specified) work_mem = 192MB # pgtune wizard 2010-06-25 (256MB , specified) (*) wal_buffers = 8MB # pgtune wizard 2010-06-25 ( 64kb , via default) checkpoint_segments = 16 # pgtune wizard 2010-06-25 (30 , specified) shared_buffers = 7680MB # pgtune wizard 2010-06-25 ( 4096 MB , specified) (*) max_connections = 80 # pgtune wizard 2010-06-25 ( 300 , ;-) specified ) when i reduce max_connections i start getting errors, i will see again concurrent connections during business hours. lot of our connections are in IDLE in transaction state during business this peculiar behavior of mod_perl servers have been discussed in past i think. dont' remember if there was any resolution.
Re: [PERFORM] sudden spurt in swap utilization (was:cpu bound postgresql setup.)
On 25/06/10 16:59, Rajesh Kumar Mallah wrote: when i reduce max_connections i start getting errors, i will see again concurrent connections during business hours. lot of our connections are in IDLE in transaction state during business this peculiar behavior of mod_perl servers have been discussed in past i think. dont' remember if there was any resolution. If connections spend any significant amount of time in IDLE in transaction state, that might indicate you're not committing/rolling back after running queries - can you show an example of the code you're using? e.g. something like my $dbh = DBI-connect(...); my $sth = $dbh-prepare(q{select ... }); $sth-fetchall_arrayref; $sth-rollback; Tom -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] sudden spurt in swap utilization (was:cpu bound postgresql setup.)
Rajesh Kumar Mallah mallah.raj...@gmail.com wrote: pgtune suggests the following: (current value are in braces via reason) , (*) indicates significant difference from current value. Different people have come to different conclusions on some of these settings. I believe that's probably because differences in hardware and workloads actually make the best choice different in different environments, and it's not always clear how to characterize that to make the best choice. If yo get conflicting advice on particular settings, I would strongly recommend testing to establish what works best for your actual workload on your hardware and OS. That said, my experience suggests... default_statistics_target = 50 # pgtune wizard 2010-06-25 (current 100 via default) Higher values add a little bit to the planning time of complex queries, but reduce the risk of choosing a bad plan. I would recommend leaving this at 100 unless you notice problems with long plan times. (*) maintenance_work_mem = 1GB # pgtune wizard 2010-06-25 (16MB via default) Yeah, I'd boost this to 1GB. checkpoint_completion_target = 0.9 # pgtune wizard 2010-06-25 (0.5 via default) I'd change this one by itself, and probably after some of the other tuning is done, so you can get a good sense of before and after. I'm guessing that 0.9 would be better, but I would test it. (*) effective_cache_size = 22GB # pgtune wizard 2010-06-25 (18GB , specified) Unless you're running other processes on the box which consume a lot of RAM, 18GB is probably lower than ideal, although this setting isn't too critical -- it doesn't affect actual RAM allocation; it just gives the optimizer a hint about how much might get cached. A higher setting encourages index use; a lower setting encourages table scans. work_mem = 192MB # pgtune wizard 2010-06-25 (256MB , specified) With 300 connections, I think that either of these could lead you to experience intermittent bursts of extreme swapping. I'd drop it to somewhere in the 16MB to 32MB range until I had a connection pool configured such that it was actually keeping the number of active connections much lower. (*) wal_buffers = 8MB # pgtune wizard 2010-06-25 (64kb , via default) Sure, I'd boost this. checkpoint_segments = 16 # pgtune wizard 2010-06-25 (30 , specified) If you have the disk space for the 30 segments, I wouldn't reduce it. shared_buffers = 7680MB # pgtune wizard 2010-06-25 (4096 MB , specified) This one is perhaps the most sensitive to workload. Anywhere between 1GB and 8GB might be best for you. Greg Smith has some great advice on how to tune this for your workload. (*) max_connections = 80 # pgtune wizard 2010-06-25 (300 , ;-) specified) when i reduce max_connections i start getting errors, i will see again concurrent connections during business hours. That's probably a good number to get to, but you have to reduce the number of actual connections before you set the limit that low. lot of our connections are in IDLE in transaction state If any of these stay in that state for more than a minute or two, you need to address that if you want to get your connection count under control. If any of them persist for hours or days, you need to fix it to avoid bloat which can kill performance. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] sudden spurt in swap utilization (was:cpu bound postgresql setup.)
Rajesh Kumar Mallah wrote: default_statistics_target = 50 # pgtune wizard 2010-06-25 (current 100 via default) (*) effective_cache_size = 22GB # pgtune wizard 2010-06-25 (18GB , specified) checkpoint_segments = 16 # pgtune wizard 2010-06-25 (30 , specified) You probably want to keep your existing values for all of these. Your effective_cache_size setting may be a little low, but I wouldn't worry about changing that right now--you have bigger problems right now. (*) maintenance_work_mem = 1GB # pgtune wizard 2010-06-25 (16MB via default) (*) wal_buffers = 8MB # pgtune wizard 2010-06-25 ( 64kb , via default) checkpoint_completion_target = 0.9 # pgtune wizard 2010-06-25 (0.5 via default) shared_buffers = 7680MB # pgtune wizard 2010-06-25 ( 4096 MB , specified) These are all potentially better for your system, but you'll use more RAM if you make these changes. For example, if you're having swap trouble, you definitely don't want to increase maintenance_work_mem. I suspect that 8GB of shared_buffers is probably the most you want to use. Most systems stop gaining any more benefit from that somewhere between 8GB and 10GB, and instead performance gets worse; it's better to be on the low side of that drop. You can probably support 8GB just fine if you sort out the work_mem issues. (*) max_connections = 80 # pgtune wizard 2010-06-25 ( 300 , ;-) specified ) work_mem = 192MB # pgtune wizard 2010-06-25 (256MB , specified) pgtune makes a guess at how many connections you'll have based on specified workload. If you know you have more connections than that, you should specify that on the command line: pgtune -c 300 ... It will then re-compute the work_mem figure more accurately using that higher connection count. Right now, it's guessing 192MB based on 80 connections, which is on the high side of reasonable. 192MB with *300* connections is way oversized. My rough computation says that if you tell it the number of connections correctly, pgtune will suggest to you around 50MB for work_mem. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] sudden spurt in swap utilization (was:cpu bound postgresql setup.)
Dear Greg/Kevin/List , Many thanks for the comments regarding the params, I am however able to change an experiment on production in a certain time window , when that arrives i shall post my observations. Rajesh Kumar Mallah. Tradeindia.com - India's Largest B2B eMarketPlace.