Re: [GENERAL] Increase in max_connections
Kevin Grittner wrote:

> Please post the output of this:
>
>   numactl --hardware

Oh, it would also help in making specific suggestions if you could show the output of:

  mount | grep cpuset

... and a listing of "file names" in the mounted directory. There is some variation among distros in both where this directory is and what names are used.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
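For reference, this is roughly what Kevin is asking for on a RHEL 6-era box. The mount point and exact control-file names vary by distro, so the paths below are illustrative assumptions, not output from this thread:

  # Where, if anywhere, is cpuset mounted?
  mount | grep cpuset
  # e.g. "cpuset on /cgroup/cpuset type cgroup (rw,cpuset)"

  # The "file names" Kevin mentions are the control files in that directory
  ls /cgroup/cpuset
  # typically: cpuset.cpus  cpuset.mems  cpuset.memory_spread_page  tasks ...

  # If nothing is mounted yet, it can usually be mounted by hand:
  mkdir -p /cgroup/cpuset
  mount -t cgroup -o cpuset cpuset /cgroup/cpuset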
Re: [GENERAL] Increase in max_connections
"Anand Kumar, Karthik" wrote: > We finally made some headway on this - we noticed messages like > the below > in /var/log/messages whenever the issue happened: > > Mar 26 07:39:58 site-db01b kernel: postmaster: page allocation failure. > Anyone have any idea why memory was so fragmented, and what > causes memory to be defragged? Is it something postgres does? Are > there any kernel specific settings that control it? While I agree with other replies that you would benefit from transaction-based connection pooling, there may be another issue at play here. I can't be sure from evidence so far, but this might be related to something I've been looking at related to NUMA memory and how the OS buffers and PostgreSQL shared_buffers interact with it. Most of the available benefit can be realized without any change to the PostgreSQL code by using the cpuset features of the OS. If you want to investigate this, the first thing would be to get a look at the shape of things. Please post the output of this: numactl --hardware The above just reports on the hardware -- it doesn't change anything. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Increase in max_connections
Thanks Bruce. Really interesting, but I show zone reclaim is already turned off on our system.

root@site-db01b:~ # numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 12 13 14 15 16 17
node 0 size: 393181 MB
node 0 free: 467 MB
node 1 cpus: 6 7 8 9 10 11 18 19 20 21 22 23
node 1 size: 393215 MB
node 1 free: 319 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

root@site-db01b:~ # cat /proc/sys/vm/zone_reclaim_mode
0

Thanks,
Karthik

On 3/26/14 1:54 PM, "Bruce Momjian" wrote:

>On Wed, Mar 26, 2014 at 08:22:01PM +, Anand Kumar, Karthik wrote:
>> Looking a little deeper, I saw signs of memory being heavily fragmented:
>>
>> root@site-db01b:/var/log # cat /proc/buddyinfo
>> Node 0, zone      DMA      1      1      2      2      2      1      0      0      1      1      3
>> Node 0, zone    DMA32      8      7      8      7     10      8      7     11      9      5     92
>> Node 0, zone   Normal  13069      0      0      0      0      0      0      0      0      0      1
>> Node 1, zone   Normal 652315  36885   1168      0      1      1      0      1      1      1      0
>>
>> Node 0 has 13069 4k blocks, and zero 8k blocks available to use
>> Which is likely what caused the problem, I'd think.
>>
>> A little while later though, buddyinfo changed and suddenly there was a
>> lot more memory in 8k blocks.
>>
>> root@site-db01b:/proc # cat /proc/buddyinfo
>> Node 0, zone      DMA      1      1      2      2      2      1      0      0      1      1      3
>> Node 0, zone    DMA32      8      7      8      7     10      8      7     11      9      5     92
>> Node 0, zone   Normal   9645   5495   1115      0      0      0      0      0      0      0      1
>> Node 1, zone   Normal 409734  10953      1      0      1      1      0      1      1      1      0
>>
>> (Note the change in the Node 0 line, 8k blocks went up from 0 to 5495)
>>
>> Anyone have any idea why memory was so fragmented, and what causes memory
>> to be defragged? Is it something postgres does? Are there any kernel
>> specific settings that control it?
>
>If I had to take a guess, it is zone_reclaim; see:
>
>  http://frosty-postgres.blogspot.com/2012/08/postgresql-numa-and-zone-reclaim-mode.html
>
>The fix is this sysctl:
>
>  vm.zone_reclaim_mode = 0
>
>--
>  Bruce Momjian    http://momjian.us
>  EnterpriseDB     http://enterprisedb.com
>
>  + Everyone has their own god. +
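A note on reading that numactl output: a few hundred MB "free" per node is normal on a box whose RAM is mostly page cache, so the more interesting question is what each node's memory is actually holding. numastat can break that down per node; option support depends on the numactl package version, so treat these commands as a sketch:

  # Per-node meminfo-style breakdown (page cache vs. anonymous, etc.)
  numastat -m

  # Per-node allocations for the postmaster and its children
  numastat -p postmaster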
Re: [GENERAL] Increase in max_connections
On Wed, Mar 26, 2014 at 08:22:01PM +, Anand Kumar, Karthik wrote:
> Looking a little deeper, I saw signs of memory being heavily fragmented:
>
> root@site-db01b:/var/log # cat /proc/buddyinfo
> Node 0, zone      DMA      1      1      2      2      2      1      0      0      1      1      3
> Node 0, zone    DMA32      8      7      8      7     10      8      7     11      9      5     92
> Node 0, zone   Normal  13069      0      0      0      0      0      0      0      0      0      1
> Node 1, zone   Normal 652315  36885   1168      0      1      1      0      1      1      1      0
>
> Node 0 has 13069 4k blocks, and zero 8k blocks available to use
> Which is likely what caused the problem, I'd think.
>
> A little while later though, buddyinfo changed and suddenly there was a
> lot more memory in 8k blocks.
>
> root@site-db01b:/proc # cat /proc/buddyinfo
> Node 0, zone      DMA      1      1      2      2      2      1      0      0      1      1      3
> Node 0, zone    DMA32      8      7      8      7     10      8      7     11      9      5     92
> Node 0, zone   Normal   9645   5495   1115      0      0      0      0      0      0      0      1
> Node 1, zone   Normal 409734  10953      1      0      1      1      0      1      1      1      0
>
> (Note the change in the Node 0 line, 8k blocks went up from 0 to 5495)
>
> Anyone have any idea why memory was so fragmented, and what causes memory
> to be defragged? Is it something postgres does? Are there any kernel
> specific settings that control it?

If I had to take a guess, it is zone_reclaim; see:

  http://frosty-postgres.blogspot.com/2012/08/postgresql-numa-and-zone-reclaim-mode.html

The fix is this sysctl:

  vm.zone_reclaim_mode = 0

--
  Bruce Momjian    http://momjian.us
  EnterpriseDB     http://enterprisedb.com

  + Everyone has their own god. +
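Applying Bruce's suggested fix is a runtime one-liner plus a persistent setting; standard sysctl usage, shown here as a sketch:

  # Turn zone reclaim off immediately
  sysctl -w vm.zone_reclaim_mode=0

  # Verify
  cat /proc/sys/vm/zone_reclaim_mode

  # Make it survive reboots
  echo "vm.zone_reclaim_mode = 0" >> /etc/sysctl.conf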
Re: [GENERAL] Increase in max_connections
Hi all,

We finally made some headway on this - we noticed messages like the below in /var/log/messages whenever the issue happened:

Mar 26 07:39:58 site-db01b kernel: postmaster: page allocation failure. order:1, mode:0x20
Mar 26 07:39:58 site-db01b kernel: Pid: 39066, comm: postmaster Not tainted 2.6.32-279.el6.x86_64 #1
Mar 26 07:39:58 site-db01b kernel: Call Trace:
Mar 26 07:39:58 site-db01b kernel: [] ? __alloc_pages_nodemask+0x77f/0x940
Mar 26 07:39:58 site-db01b kernel: [] ? fallback_alloc+0x1ba/0x270
Mar 26 07:39:58 site-db01b kernel: [] ? kmem_getpages+0x62/0x170
Mar 26 07:39:58 site-db01b kernel: [] ? cache_grow+0x2cf/0x320
Mar 26 07:39:58 site-db01b kernel: [] ? cache_alloc_node+0x99/0x160
Mar 26 07:39:58 site-db01b kernel: [] ? kmem_cache_alloc+0x11b/0x190
Mar 26 07:39:58 site-db01b kernel: [] ? sk_clone+0x22/0x2e0
Mar 26 07:39:58 site-db01b kernel: [] ? sk_prot_alloc+0x48/0x1c0
Mar 26 07:39:58 site-db01b kernel: [] ? tcp_create_openreq_child+0x23/0x450
Mar 26 07:39:58 site-db01b kernel: [] ? inet_csk_clone+0x16/0xd0
Mar 26 07:39:58 site-db01b kernel: [] ? tcp_v4_rcv+0x4fe/0x8d0
Mar 26 07:39:58 site-db01b kernel: [] ? ip_local_deliver_finish+0x0/0x2d0
Mar 26 07:39:58 site-db01b kernel: [] ? ip_local_deliver_finish+0xdd/0x2d0
Mar 26 07:39:58 site-db01b kernel: [] ? tcp_v4_syn_recv_sock+0x4d/0x310
Mar 26 07:39:58 site-db01b kernel: [] ? tcp_check_req+0x226/0x460
Mar 26 07:39:58 site-db01b kernel: [] ? tcp_v4_do_rcv+0x35b/0x430
Mar 26 07:39:58 site-db01b kernel: [] ? tcp_rcv_established+0x38d/0x800
Mar 26 07:39:58 site-db01b kernel: [] ? ip_rcv+0x275/0x350
Mar 26 07:39:58 site-db01b kernel: [] ? ip_rcv_finish+0x12d/0x440
Mar 26 07:39:58 site-db01b kernel: [] ? ip_local_deliver+0x98/0xa0
Mar 26 07:39:58 site-db01b kernel: [] ? __netif_receive_skb+0x49b/0x6f0
Mar 26 07:39:58 site-db01b kernel: [] ? tg3_poll_work+0x654/0xe30 [tg3]
Mar 26 07:39:58 site-db01b kernel: [] ? tg3_poll_msix+0x4c/0x150 [tg3]
Mar 26 07:39:58 site-db01b kernel: [] ? netif_receive_skb+0x58/0x60
Mar 26 07:39:58 site-db01b kernel: [] ? __do_softirq+0xc1/0x1e0
Mar 26 07:39:58 site-db01b kernel: [] ? napi_skb_finish+0x50/0x70
Mar 26 07:39:58 site-db01b kernel: [] ? net_rx_action+0x103/0x2f0
Mar 26 07:39:58 site-db01b kernel: [] ? napi_gro_receive+0x39/0x50
Mar 26 07:39:58 site-db01b kernel: [] ? handle_IRQ_event+0x60/0x170
Mar 26 07:39:58 site-db01b kernel: [] ? irq_exit+0x85/0x90
Mar 26 07:39:58 site-db01b kernel: [] ? do_softirq+0x65/0xa0
Mar 26 07:39:58 site-db01b kernel: [] ? __do_softirq+0x11f/0x1e0
Mar 26 07:39:58 site-db01b kernel: [] ? call_softirq+0x1c/0x30

Doing some digging on that, we disabled TSO/LRO and GSO/GRO at the tcp layer - and that seems to have helped.

$ sudo ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: off
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off

However, I'm looking for more information on what's happening: the stack trace above seems to indicate that it was unable to allocate 2*4k pages (8k) for the network stack. It's likely that was needed for GSO/GRO.
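For reference, offloads like these are toggled with ethtool's -K (capital K) option; the interface name and exact feature list below are assumptions, and not every driver exposes every feature:

  # Disable segmentation/receive offloads
  ethtool -K eth0 tso off gso off gro off lro off

  # Confirm the result
  ethtool -k eth0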
However, wondering why the kernel is unable to allocate just 8k - we have a 768G RAM server, with over 700G in buffers/cache:

root@site-db01b:/proc # free -m
             total       used       free     shared    buffers     cached
Mem:        775382     773354       2028          0       1403     738735
-/+ buffers/cache:      33215     742166
Swap:            0          0          0

Looking a little deeper, I saw signs of memory being heavily fragmented:

root@site-db01b:/var/log # cat /proc/buddyinfo
Node 0, zone      DMA      1      1      2      2      2      1      0      0      1      1      3
Node 0, zone    DMA32      8      7      8      7     10      8      7     11      9      5     92
Node 0, zone   Normal  13069      0      0      0      0      0      0      0      0      0      1
Node 1, zone   Normal 652315  36885   1168      0      1      1      0      1      1      1      0

Node 0 has 13069 4k blocks, and zero 8k blocks available to use, which is likely what caused the problem, I'd think.

A little while later though, buddyinfo changed and suddenly there was a lot more memory in 8k blocks.

root@site-db01b:/proc # cat /proc/buddyinfo
Node 0, zone      DMA      1      1      2      2      2      1      0      0      1      1      3
Node 0, zone    DMA32      8      7      8      7     10      8      7     11      9      5     92
Node 0, zone   Normal   9645   5495   1115      0      0      0      0      0      0      0      1
Node 1, zone   Normal 409734  10953      1      0      1      1      0      1      1      1      0

(Note the change in the Node 0 line, 8k blocks went up from 0 to 5495)

Anyone have any idea why memory was so fragmented, and what causes memory to be defragged? Is it something postgres does? Are there any kernel specific settings that control it?

Thanks,
Karthik

On 3/14/14 3:37 PM, "Anand Kumar, Karthik" wrote:

>For anyone that's still following - we tried upgrading to postgres 9.3.3 -
>that hasn't helped.
>
>Running an strace on the pid that was consuming the h
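Part of the likely answer to that question: on kernels with compaction support, the kernel defragments free memory itself when higher-order allocations start failing, and it exposes knobs for both compaction and the size of the reserve kept for atomic allocations. Neither of the following comes from the thread; they are standard starting points for "page allocation failure, order:1" reports, sketched here:

  # Keep a larger pool of free pages so atomic allocations are less likely
  # to find only fragmented memory (value is illustrative; size with care)
  sysctl -w vm.min_free_kbytes=1048576

  # On kernels with compaction support (RHEL 6 backported it into 2.6.32),
  # ask the kernel to compact free memory right now
  echo 1 > /proc/sys/vm/compact_memory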
Re: [GENERAL] Increase in max_connections
For anyone that's still following - we tried upgrading to postgres 9.3.3 - that hasn't helped.

Running an strace on the pid that was consuming the highest CPU at the time of the outage shows:

semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91881569, {{12, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(89325587, {{14, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(90964037, {{4, 1, 0}}, 1) = 0
semop(90308657, {{5, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(88866821, {{12, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(90439733, {{13, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(90341426, {{2, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(90308657, {{5, 1, 0}}, 1) = 0
semop(91881569, {{12, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(88866821, {{12, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91881569, {{12, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(90865730, {{5, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(90865730, {{5, 1, 0}}, 1) = 0

I've seen other people talk of this problem with a lot of semop calls, but haven't yet found a clear solution. Anyone have any ideas?

I've also downloaded the perf tool based on http://rhaas.blogspot.com/2012/06/perf-good-bad-ugly.html - will see what that has to show.

Thanks,
Karthik

On 3/11/14 1:06 PM, "John R Pierce" wrote:

>On 3/11/2014 10:20 AM, Anand Kumar, Karthik wrote:
>> We typically see about 500-700 active queries at a time
>
>if these are primarily small/fast queries, like OLTP operations, and you
>DON'T have 200-400 CPU cores on this server, you will likely find that if
>you use a queueing mechanism to only execute about 2X your CPU core
>count concurrently, you will get MORE total transactions/second than
>trying to do 500-700 at once.
>
>if your apps are using persistent connections, then the session pooling
>model won't do any good, you should use transaction pooling. you want
>the actual active query count to be tunable, probably down around 2X the
>cpu core count, depending on various things. some folks say, CPU
>cores/threads plus disk spindles is the optimal number.
>
>--
>john r pierce                          37N 122W
>somewhere on the middle of the left coast
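Two ways to dig further into a semop-heavy trace like the one above (illustrative commands, not from the thread): map the semaphore set IDs from strace back to PostgreSQL's System V semaphores, and use perf, per the article Karthik mentions, to see where the CPU time actually goes:

  # The semids in the strace output (e.g. 91521110) should appear here,
  # owned by the postgres user
  ipcs -s

  # Sample on-CPU call stacks system-wide for 30 seconds, then report
  perf record -a -g sleep 30
  perf report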
Re: [GENERAL] Increase in max_connections
On 3/11/2014 10:20 AM, Anand Kumar, Karthik wrote:
> We typically see about 500-700 active queries at a time

if these are primarily small/fast queries, like OLTP operations, and you DON'T have 200-400 CPU cores on this server, you will likely find that if you use a queueing mechanism to only execute about 2X your CPU core count concurrently, you will get MORE total transactions/second than trying to do 500-700 at once.

if your apps are using persistent connections, then the session pooling model won't do any good, you should use transaction pooling. you want the actual active query count to be tunable, probably down around 2X the cpu core count, depending on various things. some folks say, CPU cores/threads plus disk spindles is the optimal number.

--
john r pierce                          37N 122W
somewhere on the middle of the left coast
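As a concrete sketch of the queueing John describes: a PgBouncer configuration in transaction-pooling mode, sized at roughly 2x the 24 cores this server turns out to have. Every name, path, and number below is an illustrative assumption, not a drop-in config:

  # Write a minimal pgbouncer.ini (sketch only)
  cat > /etc/pgbouncer/pgbouncer.ini <<'EOF'
  [databases]
  site = host=127.0.0.1 port=5432 dbname=site

  [pgbouncer]
  listen_addr = *
  listen_port = 6432
  auth_type = md5
  auth_file = /etc/pgbouncer/userlist.txt
  pool_mode = transaction     ; hand out a server connection per transaction
  default_pool_size = 48      ; ~2x CPU cores actually run at once
  max_client_conn = 2000      ; everyone else waits in a queue, no error
  EOF

With something like this in place, the applications keep their persistent client connections to PgBouncer, while only default_pool_size server connections do work at any moment.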
Re: [GENERAL] Increase in max_connections
On Tue, Mar 11, 2014 at 10:20 AM, Anand Kumar, Karthik <karthik.anandku...@classmates.com> wrote:

> Thanks Jeff. We have scripts in place now to capture the incoming rate
> of requests. Waiting on the crash to happen to see if it spikes up :)
>
> Re: min_log_duration - we *do* see a good number of requests in the log
> that hit our cap (of 100ms). Just that nothing stands out when we have the
> issue. Whatever queries we do see slow down seem to be after we start the
> CPU spike, and so an effect and not a cause.

I think what you have is a vicious cycle: too many active connections leads to contention, which leads to slow response, which leads to piling up connections, which leads to more contention. So the cause and the effect are the same thing as each other; you can't cleanly divide them.

> We typically see about 500-700 active queries at a time - and that seems
> to match how high connection limit goes.

This is during normal times, or during the trouble?

> We tried pg_bouncer, however, at session level pooling, it slowed down our
> applications (they maintain persistent connections once established, so any
> connection overhead slows them down),

I don't understand that. If the connections are persistent, why would they increase during the slow down?

Cheers,

Jeff
Re: [GENERAL] Increase in max_connections
Thanks Jeff. We have scripts in place now to capture the incoming rate of requests. Waiting on the crash to happen to see if it spikes up :)

Re: min_log_duration – we *do* see a good number of requests in the log that hit our cap (of 100ms). Just that nothing stands out when we have the issue. Whatever queries we do see slow down seem to be after we start the CPU spike, and so an effect and not a cause.

We typically see about 500-700 active queries at a time – and that seems to match how high the connection count goes.

We tried pg_bouncer; however, at session level pooling, it slowed down our applications (they maintain persistent connections once established, so any connection overhead slows them down), and with transaction level pooling, it simply did not work.

Thanks,
Karthik

From: Jeff Janes <jeff.ja...@gmail.com>
Date: Tuesday, March 11, 2014 9:23 AM
To: "Anand Kumar, Karthik" <karthik.anandku...@classmates.com>
Cc: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Subject: Re: [GENERAL] Increase in max_connections

[quoted message snipped; Jeff's reply appears in full further down this thread]
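Since Karthik mentions scripting the incoming request rate: on 9.3 this kind of sampling can be done against pg_stat_activity. A sketch, with the database name taken from the psql prompt in the original post and the interval and output handling assumed:

  # Every 5 seconds, log connection counts broken down by state
  while sleep 5; do
    psql -X -d site -At -c \
      "SELECT now(), state, count(*) FROM pg_stat_activity GROUP BY 1, 2;"
  done >> /tmp/pg_activity_samples.log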
Re: [GENERAL] Increase in max_connections
No errors in the logs, except when we hit max_connections.

No shared memory problems – no associated spike in I/O or system CPU indicating shared memory is either unused or overused. Sufficient memory in cache/buffers, zero swapping or anything else indicative of a memory problem.

The box is pretty beefy – 24 core, 768G RAM :) - so yes, an effective cache of 568GB is normal; we arrived at it with months of tuning over time.

cpu_tuple_cost of 0.03 – yes, a lot of our settings are tweaked from the defaults based on performance. I don't have the output now, but the 0.03 was based on recommendations from postgres user groups, and via testing with setting it up and running EXPLAIN ANALYZE on queries.

None of the settings changed when this problem began.

Thanks,
Karthik

From: Venkata Balaji Nagothi <vbn...@gmail.com>
Date: Monday, March 10, 2014 7:35 PM
To: "Anand Kumar, Karthik" <karthik.anandku...@classmates.com>
Cc: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Subject: Re: [GENERAL] Increase in max_connections

[quoted message snipped; Venkata's reply appears in full further down this thread]
Re: [GENERAL] Increase in max_connections
On Mon, Mar 10, 2014 at 6:04 PM, Anand Kumar, Karthik <karthik.anandku...@classmates.com> wrote:

> Hi all,
>
> We're running postgres 9.3.2, server configuration below.
>
> Seemingly randomly, we will see the number of active queries in postgres
> go up until we hit max_connections. The DB will recover after a few minutes.
>
> We had the issue a couple of times in Feb 2014. We then upgraded the
> postgres server from 9.1 to 9.3.2, and the occurrence has gone up
> significantly - to several times a day.
>
> The user CPU goes up as well to 100%, with no increase in I/O or system CPU.
> We have slow query logging, and there is no dramatic change in the slow
> queries either.

I think you might need to change your cutoff for what makes a slow query. It must be the case that either you are seeing an abnormal spike in query requests, or the queries are taking an abnormally long time (or both). If not many queries are hitting log_min_duration_statement, that just means you can lower it further without causing too much log bloat.

Usually when people hit max_connections under load, it means they need to move the limit into a connection pooler (where it can be more graceful, by putting them in a queue for a free slot, rather than returning an error) and *lower* max_connections.

> max_connections|1500|configuration file

That is very high.

Cheers,

Jeff
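On 9.3, acting on that suggestion means editing postgresql.conf and reloading; no restart is needed. A sketch, with the data directory path assumed:

  # Drop the slow-query threshold from 100ms to, say, 20ms
  sed -i 's/^log_min_duration_statement.*/log_min_duration_statement = 20ms/' \
      /var/lib/pgsql/data/postgresql.conf

  # Tell the postmaster to re-read its configuration
  psql -X -c "SELECT pg_reload_conf();"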
Re: [GENERAL] Increase in max_connections
From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Anand Kumar, Karthik
Sent: Monday, March 10, 2014 9:04 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Increase in max_connections

[original post quoted in full; it appears verbatim at the end of this thread]

You don't specify how many CPU cores you have, but I'm pretty sure there are not enough to support this:

  max_connections|1500|configuration file

Try a connection pooler, it should help. The simplest to install and configure would be PgBouncer, and it does the job very well.

Regards,
Igor Neyman
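For what it's worth, the core count Igor asks about is easy to capture alongside the other diagnostics (standard Linux commands):

  # Logical CPU count
  nproc

  # Fuller picture: sockets, cores per socket, threads per core, NUMA nodes
  lscpu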
Re: [GENERAL] Increase in max_connections
On Tue, Mar 11, 2014 at 12:04 PM, Anand Kumar, Karthik <karthik.anandku...@classmates.com> wrote:

> Hi all,
>
> We're running postgres 9.3.2, server configuration below.
>
> Seemingly randomly, we will see the number of active queries in postgres
> go up until we hit max_connections. The DB will recover after a few minutes.
>
> We had the issue a couple of times in Feb 2014. We then upgraded the
> postgres server from 9.1 to 9.3.2, and the occurrence has gone up
> significantly - to several times a day.
>
> The user CPU goes up as well to 100%, with no increase in I/O or system CPU.
> We have slow query logging, and there is no dramatic change in the slow
> queries either.
>
> There is a corresponding spike in shared locks, but that seems to be an
> effect not a cause - it corresponds to an increase in the number of running
> processes at the time.
>
> We had a similar issue in the past - that was solved by disabling
> transparent_huge_pages - but the difference there was that we'd see queries
> slow down dramatically. Currently, we don't. Also, transparent_huge_pages
> is still disabled.
>
> I do realize the issue could be caused by a spurt in incoming
> connections - we do not yet have conclusive evidence on whether that's
> happening (active queries climb up, however no conclusive proof on whether
> that's because of a slowdown, or because of an increase in traffic). Working
> on getting the information, will update with that information as soon as we
> have it.
>
> I thought I'd send a post out to the group before then, to see if anyone
> has run into anything similar.
>
> Thanks,
> Karthik
>
> [... pg_settings listing snipped; it appears in the original post at the
> end of this thread ...]

Please let us know your hardware configuration like RAM, CPU (cores), etc.

Do you see any messages indicating any processes getting terminated/killed forcibly in the PostgreSQL logs? Or do you see any shared memory related error messages?

cpu_tuple_cost = 0.03 - this is not the default; any reason for increasing it?

effective_cache_size = 568 GB - please help us know if this is optimal for your system.

Venkata Balaji N

Sr. Database Administrator
Fujitsu Australia
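A quick way to answer the "not default" point from the server itself: pg_settings carries the compiled-in default in boot_val alongside the active value. A query sketch, with the database name assumed from the prompt in the original post:

  psql -X -d site -c "
    SELECT name, setting, boot_val
    FROM pg_settings
    WHERE name IN ('cpu_tuple_cost', 'effective_cache_size');"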
[GENERAL] Increase in max_connections
Hi all,

We're running postgres 9.3.2, server configuration below.

Seemingly randomly, we will see the number of active queries in postgres go up until we hit max_connections. The DB will recover after a few minutes.

We had the issue a couple of times in Feb 2014. We then upgraded the postgres server from 9.1 to 9.3.2, and the occurrence has gone up significantly – to several times a day.

The user CPU goes up as well to 100%, with no increase in I/O or system CPU. We have slow query logging, and there is no dramatic change in the slow queries either.

There is a corresponding spike in shared locks, but that seems to be an effect, not a cause – it corresponds to an increase in the number of running processes at the time.

We had a similar issue in the past – that was solved by disabling transparent_huge_pages – but the difference there was that we'd see queries slow down dramatically. Currently, we don't. Also, transparent_huge_pages is still disabled.

I do realize the issue could be caused by a spurt in incoming connections – we do not yet have conclusive evidence on whether that's happening (active queries climb up, however no conclusive proof on whether that's because of a slowdown, or because of an increase in traffic). Working on getting the information, will update with that information as soon as we have it.

I thought I'd send a post out to the group before then, to see if anyone has run into anything similar.

Thanks,
Karthik

site=# SELECT version();
PostgreSQL 9.3.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit

site=# SELECT name, current_setting(name), source
site-# FROM pg_settings
site-# WHERE source NOT IN ('default', 'override');
application_name|psql|client
archive_command|/usr/bin/archiver.sh %f %p|configuration file
archive_mode|on|configuration file
autovacuum_freeze_max_age|25000|configuration file
autovacuum_max_workers|6|configuration file
bgwriter_lru_maxpages|1000|configuration file
bgwriter_lru_multiplier|4|configuration file
checkpoint_completion_target|0.8|configuration file
checkpoint_segments|250|configuration file
checkpoint_timeout|15min|configuration file
checkpoint_warning|6min|configuration file
client_encoding|UTF8|client
commit_siblings|25|configuration file
cpu_tuple_cost|0.03|configuration file
DateStyle|ISO, MDY|configuration file
default_statistics_target|300|configuration file
default_text_search_config|pg_catalog.english|configuration file
effective_cache_size|568GB|configuration file
fsync|on|configuration file
lc_messages|en_US.UTF-8|configuration file
lc_monetary|en_US.UTF-8|configuration file
lc_numeric|en_US.UTF-8|configuration file
lc_time|en_US.UTF-8|configuration file
listen_addresses|*|configuration file
log_autovacuum_min_duration|0|configuration file
log_checkpoints|on|configuration file
log_connections|on|configuration file
log_destination|syslog|configuration file
log_directory|pg_log|configuration file
log_filename|postgresql-%a.log|configuration file
log_line_prefix|user=%u,db=%d,ip=%h |configuration file
log_min_duration_statement|100ms|configuration file
log_min_messages|debug1|configuration file
log_rotation_age|1d|configuration file
log_rotation_size|0|configuration file
log_timezone|US/Pacific|configuration file
log_truncate_on_rotation|on|configuration file
logging_collector|off|configuration file
maintenance_work_mem|1GB|configuration file
max_connections|1500|configuration file
max_locks_per_transaction|1000|configuration file
max_stack_depth|2MB|environment variable
max_wal_senders|5|configuration file
port|5432|command line
random_page_cost|2|configuration file
shared_buffers|8GB|configuration file
synchronous_commit|off|configuration file
syslog_facility|local0|configuration file
syslog_ident|postgres|configuration file
TimeZone|US/Pacific|configuration file
vacuum_freeze_table_age|0|configuration file
wal_buffers|32MB|configuration file
wal_keep_segments|250|configuration file
wal_level|hot_standby|configuration file
wal_sync_method|fsync|configuration file
work_mem|130MB|configuration file