Re: [GENERAL] Increase in max_connections
Kevin Grittner wrote:

> Please post the output of this:
>
>   numactl --hardware

Oh, it would also help in making specific suggestions if you could show the output of:

  mount | grep cpuset

... and a listing of "file names" in the mounted directory. There is some variation among distros in both where this directory is and what names are used.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
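For reference, this is roughly what Kevin is asking for on a RHEL 6-era box. The mount point and exact control-file names vary by distro, so the paths below are illustrative assumptions, not output from this thread:

  # Where, if anywhere, is cpuset mounted?
  mount | grep cpuset
  # e.g. "cpuset on /cgroup/cpuset type cgroup (rw,cpuset)"

  # The "file names" Kevin mentions are the control files in that directory
  ls /cgroup/cpuset
  # typically: cpuset.cpus  cpuset.mems  cpuset.memory_spread_page  tasks ...

  # If nothing is mounted yet, it can usually be mounted by hand:
  mkdir -p /cgroup/cpuset
  mount -t cgroup -o cpuset cpuset /cgroup/cpuset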
Re: [GENERAL] Increase in max_connections
"Anand Kumar, Karthik" wrote: > We finally made some headway on this - we noticed messages like > the below > in /var/log/messages whenever the issue happened: > > Mar 26 07:39:58 site-db01b kernel: postmaster: page allocation failure. > Anyone have any idea why memory was so fragmented, and what > causes memory to be defragged? Is it something postgres does? Are > there any kernel specific settings that control it? While I agree with other replies that you would benefit from transaction-based connection pooling, there may be another issue at play here. I can't be sure from evidence so far, but this might be related to something I've been looking at related to NUMA memory and how the OS buffers and PostgreSQL shared_buffers interact with it. Most of the available benefit can be realized without any change to the PostgreSQL code by using the cpuset features of the OS. If you want to investigate this, the first thing would be to get a look at the shape of things. Please post the output of this: numactl --hardware The above just reports on the hardware -- it doesn't change anything. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Increase in max_connections
Thanks Bruce. Really interesting, but I show zone reclaim is already turned off on our system.

root@site-db01b:~ # numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 12 13 14 15 16 17
node 0 size: 393181 MB
node 0 free: 467 MB
node 1 cpus: 6 7 8 9 10 11 18 19 20 21 22 23
node 1 size: 393215 MB
node 1 free: 319 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

root@site-db01b:~ # cat /proc/sys/vm/zone_reclaim_mode
0

Thanks,
Karthik

On 3/26/14 1:54 PM, "Bruce Momjian" wrote:

>On Wed, Mar 26, 2014 at 08:22:01PM +, Anand Kumar, Karthik wrote:
>> Looking a little deeper, I saw signs of memory being heavily fragmented:
>>
>> root@site-db01b:/var/log # cat /proc/buddyinfo
>> Node 0, zone      DMA      1      1      2      2      2      1      0      0      1      1      3
>> Node 0, zone    DMA32      8      7      8      7     10      8      7     11      9      5     92
>> Node 0, zone   Normal  13069      0      0      0      0      0      0      0      0      0      1
>> Node 1, zone   Normal 652315  36885   1168      0      1      1      0      1      1      1      0
>>
>> Node 0 has 13069 4k blocks, and zero 8k blocks available to use
>> Which is likely what caused the problem, I'd think.
>>
>> A little while later though, buddyinfo changed and suddenly there was a
>> lot more memory in 8k blocks.
>>
>> root@site-db01b:/proc # cat /proc/buddyinfo
>> Node 0, zone      DMA      1      1      2      2      2      1      0      0      1      1      3
>> Node 0, zone    DMA32      8      7      8      7     10      8      7     11      9      5     92
>> Node 0, zone   Normal   9645   5495   1115      0      0      0      0      0      0      0      1
>> Node 1, zone   Normal 409734  10953      1      0      1      1      0      1      1      1      0
>>
>> (Note the change in the Node 0 line, 8k blocks went up from 0 to 5495)
>>
>> Anyone have any idea why memory was so fragmented, and what causes memory
>> to be defragged? Is it something postgres does? Are there any kernel
>> specific settings that control it?
>
>If I had to take a guess, it is zone_reclaim; see:
>
>  http://frosty-postgres.blogspot.com/2012/08/postgresql-numa-and-zone-reclaim-mode.html
>
>The fix is this sysctl:
>
>  vm.zone_reclaim_mode = 0
>
>--
>  Bruce Momjian    http://momjian.us
>  EnterpriseDB     http://enterprisedb.com
>
>  + Everyone has their own god. +
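A note on reading that numactl output: a few hundred MB "free" per node is normal on a box whose RAM is mostly page cache, so the more interesting question is what each node's memory is actually holding. numastat can break that down per node; option support depends on the numactl package version, so treat these commands as a sketch:

  # Per-node meminfo-style breakdown (page cache vs. anonymous, etc.)
  numastat -m

  # Per-node allocations for the postmaster and its children
  numastat -p postmaster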
Re: [GENERAL] Increase in max_connections
On Wed, Mar 26, 2014 at 08:22:01PM +, Anand Kumar, Karthik wrote:
> Looking a little deeper, I saw signs of memory being heavily fragmented:
>
> root@site-db01b:/var/log # cat /proc/buddyinfo
> Node 0, zone      DMA      1      1      2      2      2      1      0      0      1      1      3
> Node 0, zone    DMA32      8      7      8      7     10      8      7     11      9      5     92
> Node 0, zone   Normal  13069      0      0      0      0      0      0      0      0      0      1
> Node 1, zone   Normal 652315  36885   1168      0      1      1      0      1      1      1      0
>
> Node 0 has 13069 4k blocks, and zero 8k blocks available to use
> Which is likely what caused the problem, I'd think.
>
> A little while later though, buddyinfo changed and suddenly there was a
> lot more memory in 8k blocks.
>
> root@site-db01b:/proc # cat /proc/buddyinfo
> Node 0, zone      DMA      1      1      2      2      2      1      0      0      1      1      3
> Node 0, zone    DMA32      8      7      8      7     10      8      7     11      9      5     92
> Node 0, zone   Normal   9645   5495   1115      0      0      0      0      0      0      0      1
> Node 1, zone   Normal 409734  10953      1      0      1      1      0      1      1      1      0
>
> (Note the change in the Node 0 line, 8k blocks went up from 0 to 5495)
>
> Anyone have any idea why memory was so fragmented, and what causes memory
> to be defragged? Is it something postgres does? Are there any kernel
> specific settings that control it?

If I had to take a guess, it is zone_reclaim; see:

  http://frosty-postgres.blogspot.com/2012/08/postgresql-numa-and-zone-reclaim-mode.html

The fix is this sysctl:

  vm.zone_reclaim_mode = 0

--
  Bruce Momjian    http://momjian.us
  EnterpriseDB     http://enterprisedb.com

  + Everyone has their own god. +
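Applying Bruce's suggested fix is a runtime one-liner plus a persistent setting; standard sysctl usage, shown here as a sketch:

  # Turn zone reclaim off immediately
  sysctl -w vm.zone_reclaim_mode=0

  # Verify
  cat /proc/sys/vm/zone_reclaim_mode

  # Make it survive reboots
  echo "vm.zone_reclaim_mode = 0" >> /etc/sysctl.conf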
Re: [GENERAL] Increase in max_connections
Hi all,

We finally made some headway on this - we noticed messages like the below in /var/log/messages whenever the issue happened:

Mar 26 07:39:58 site-db01b kernel: postmaster: page allocation failure. order:1, mode:0x20
Mar 26 07:39:58 site-db01b kernel: Pid: 39066, comm: postmaster Not tainted 2.6.32-279.el6.x86_64 #1
Mar 26 07:39:58 site-db01b kernel: Call Trace:
Mar 26 07:39:58 site-db01b kernel: [] ? __alloc_pages_nodemask+0x77f/0x940
Mar 26 07:39:58 site-db01b kernel: [] ? fallback_alloc+0x1ba/0x270
Mar 26 07:39:58 site-db01b kernel: [] ? kmem_getpages+0x62/0x170
Mar 26 07:39:58 site-db01b kernel: [] ? cache_grow+0x2cf/0x320
Mar 26 07:39:58 site-db01b kernel: [] ? cache_alloc_node+0x99/0x160
Mar 26 07:39:58 site-db01b kernel: [] ? kmem_cache_alloc+0x11b/0x190
Mar 26 07:39:58 site-db01b kernel: [] ? sk_clone+0x22/0x2e0
Mar 26 07:39:58 site-db01b kernel: [] ? sk_prot_alloc+0x48/0x1c0
Mar 26 07:39:58 site-db01b kernel: [] ? tcp_create_openreq_child+0x23/0x450
Mar 26 07:39:58 site-db01b kernel: [] ? inet_csk_clone+0x16/0xd0
Mar 26 07:39:58 site-db01b kernel: [] ? tcp_v4_rcv+0x4fe/0x8d0
Mar 26 07:39:58 site-db01b kernel: [] ? ip_local_deliver_finish+0x0/0x2d0
Mar 26 07:39:58 site-db01b kernel: [] ? ip_local_deliver_finish+0xdd/0x2d0
Mar 26 07:39:58 site-db01b kernel: [] ? tcp_v4_syn_recv_sock+0x4d/0x310
Mar 26 07:39:58 site-db01b kernel: [] ? tcp_check_req+0x226/0x460
Mar 26 07:39:58 site-db01b kernel: [] ? tcp_v4_do_rcv+0x35b/0x430
Mar 26 07:39:58 site-db01b kernel: [] ? tcp_rcv_established+0x38d/0x800
Mar 26 07:39:58 site-db01b kernel: [] ? ip_rcv+0x275/0x350
Mar 26 07:39:58 site-db01b kernel: [] ? ip_rcv_finish+0x12d/0x440
Mar 26 07:39:58 site-db01b kernel: [] ? ip_local_deliver+0x98/0xa0
Mar 26 07:39:58 site-db01b kernel: [] ? __netif_receive_skb+0x49b/0x6f0
Mar 26 07:39:58 site-db01b kernel: [] ? tg3_poll_work+0x654/0xe30 [tg3]
Mar 26 07:39:58 site-db01b kernel: [] ? tg3_poll_msix+0x4c/0x150 [tg3]
Mar 26 07:39:58 site-db01b kernel: [] ? netif_receive_skb+0x58/0x60
Mar 26 07:39:58 site-db01b kernel: [] ? __do_softirq+0xc1/0x1e0
Mar 26 07:39:58 site-db01b kernel: [] ? napi_skb_finish+0x50/0x70
Mar 26 07:39:58 site-db01b kernel: [] ? net_rx_action+0x103/0x2f0
Mar 26 07:39:58 site-db01b kernel: [] ? napi_gro_receive+0x39/0x50
Mar 26 07:39:58 site-db01b kernel: [] ? handle_IRQ_event+0x60/0x170
Mar 26 07:39:58 site-db01b kernel: [] ? irq_exit+0x85/0x90
Mar 26 07:39:58 site-db01b kernel: [] ? do_softirq+0x65/0xa0
Mar 26 07:39:58 site-db01b kernel: [] ? __do_softirq+0x11f/0x1e0
Mar 26 07:39:58 site-db01b kernel: [] ? call_softirq+0x1c/0x30

Doing some digging on that, we disabled TSO/LRO and GSO/GRO at the tcp layer - and that seems to have helped.

$ sudo ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: off
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off

However, I'm looking for more information on what's happening: the stack trace above seems to indicate that it was unable to allocate 2*4k pages (8k) for the network stack. It's likely that was needed for GSO/GRO.
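For reference, offloads like these are toggled with ethtool's -K (capital K) option; the interface name and exact feature list below are assumptions, and not every driver exposes every feature:

  # Disable segmentation/receive offloads
  ethtool -K eth0 tso off gso off gro off lro off

  # Confirm the result
  ethtool -k eth0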
However, wondering why the kernel is unable to allocate just 8k - we have a 768G RAM server, with over 700G in buffers/cache:

root@site-db01b:/proc # free -m
             total       used       free     shared    buffers     cached
Mem:        775382     773354       2028          0       1403     738735
-/+ buffers/cache:      33215     742166
Swap:            0          0          0

Looking a little deeper, I saw signs of memory being heavily fragmented:

root@site-db01b:/var/log # cat /proc/buddyinfo
Node 0, zone      DMA      1      1      2      2      2      1      0      0      1      1      3
Node 0, zone    DMA32      8      7      8      7     10      8      7     11      9      5     92
Node 0, zone   Normal  13069      0      0      0      0      0      0      0      0      0      1
Node 1, zone   Normal 652315  36885   1168      0      1      1      0      1      1      1      0

Node 0 has 13069 4k blocks, and zero 8k blocks available to use, which is likely what caused the problem, I'd think.

A little while later though, buddyinfo changed and suddenly there was a lot more memory in 8k blocks.

root@site-db01b:/proc # cat /proc/buddyinfo
Node 0, zone      DMA      1      1      2      2      2      1      0      0      1      1      3
Node 0, zone    DMA32      8      7      8      7     10      8      7     11      9      5     92
Node 0, zone   Normal   9645   5495   1115      0      0      0      0      0      0      0      1
Node 1, zone   Normal 409734  10953      1      0      1      1      0      1      1      1      0

(Note the change in the Node 0 line, 8k blocks went up from 0 to 5495)

Anyone have any idea why memory was so fragmented, and what causes memory to be defragged? Is it something postgres does? Are there any kernel specific settings that control it?

Thanks,
Karthik

On 3/14/14 3:37 PM, "Anand Kumar, Karthik" wrote:

>For anyone that's still following - we tried upgrading to postgres 9.3.3 -
>that hasn't helped.
>
>Running an strace on the pid that was consuming the h
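Part of the likely answer to that question: on kernels with compaction support, the kernel defragments free memory itself when higher-order allocations start failing, and it exposes knobs for both compaction and the size of the reserve kept for atomic allocations. Neither of the following comes from the thread; they are standard starting points for "page allocation failure, order:1" reports, sketched here:

  # Keep a larger pool of free pages so atomic allocations are less likely
  # to find only fragmented memory (value is illustrative; size with care)
  sysctl -w vm.min_free_kbytes=1048576

  # On kernels with compaction support (RHEL 6 backported it into 2.6.32),
  # ask the kernel to compact free memory right now
  echo 1 > /proc/sys/vm/compact_memory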
Re: [GENERAL] Increase in max_connections
For anyone that's still following - we tried upgrading to postgres 9.3.3 - that hasn't helped.

Running an strace on the pid that was consuming the highest CPU at the time of the outage shows:

semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91881569, {{12, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(89325587, {{14, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(90964037, {{4, 1, 0}}, 1) = 0
semop(90308657, {{5, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(88866821, {{12, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(90439733, {{13, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(90341426, {{2, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(90308657, {{5, 1, 0}}, 1) = 0
semop(91881569, {{12, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(88866821, {{12, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91881569, {{12, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(90865730, {{5, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(90865730, {{5, 1, 0}}, 1) = 0

I've seen other people talk of this problem with a lot of semop calls, but haven't yet found a clear solution. Anyone have any ideas?

I've also downloaded the perf tool based on http://rhaas.blogspot.com/2012/06/perf-good-bad-ugly.html - will see what that has to show.

Thanks,
Karthik

On 3/11/14 1:06 PM, "John R Pierce" wrote:

>On 3/11/2014 10:20 AM, Anand Kumar, Karthik wrote:
>> We typically see about 500-700 active queries at a time
>
>if these are primarily small/fast queries, like OLTP operations, and you
>DON'T have 200-400 CPU cores on this server, you will likely find that if
>you use a queueing mechanism to only execute about 2X your CPU core
>count concurrently, you will get MORE total transactions/second than
>trying to do 500-700 at once.
>
>if your apps are using persistent connections, then the session pooling
>model won't do any good, you should use transaction pooling. you want
>the actual active query count to be tunable, probably down around 2X the
>cpu core count, depending on various things. some folks say, CPU
>cores/threads plus disk spindles is the optimal number.
>
>--
>john r pierce                          37N 122W
>somewhere on the middle of the left coast
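Two ways to dig further into a semop-heavy trace like the one above (illustrative commands, not from the thread): map the semaphore set IDs from strace back to PostgreSQL's System V semaphores, and use perf, per the article Karthik mentions, to see where the CPU time actually goes:

  # The semids in the strace output (e.g. 91521110) should appear here,
  # owned by the postgres user
  ipcs -s

  # Sample on-CPU call stacks system-wide for 30 seconds, then report
  perf record -a -g sleep 30
  perf report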
Re: [GENERAL] Increase in max_connections
On 3/11/2014 10:20 AM, Anand Kumar, Karthik wrote:
> We typically see about 500-700 active queries at a time

if these are primarily small/fast queries, like OLTP operations, and you DON'T have 200-400 CPU cores on this server, you will likely find that if you use a queueing mechanism to only execute about 2X your CPU core count concurrently, you will get MORE total transactions/second than trying to do 500-700 at once.

if your apps are using persistent connections, then the session pooling model won't do any good, you should use transaction pooling. you want the actual active query count to be tunable, probably down around 2X the cpu core count, depending on various things. some folks say, CPU cores/threads plus disk spindles is the optimal number.

--
john r pierce                          37N 122W
somewhere on the middle of the left coast
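As a concrete sketch of the queueing John describes: a PgBouncer configuration in transaction-pooling mode, sized at roughly 2x the 24 cores this server turns out to have. Every name, path, and number below is an illustrative assumption, not a drop-in config:

  # Write a minimal pgbouncer.ini (sketch only)
  cat > /etc/pgbouncer/pgbouncer.ini <<'EOF'
  [databases]
  site = host=127.0.0.1 port=5432 dbname=site

  [pgbouncer]
  listen_addr = *
  listen_port = 6432
  auth_type = md5
  auth_file = /etc/pgbouncer/userlist.txt
  pool_mode = transaction     ; hand out a server connection per transaction
  default_pool_size = 48      ; ~2x CPU cores actually run at once
  max_client_conn = 2000      ; everyone else waits in a queue, no error
  EOF

With something like this in place, the applications keep their persistent client connections to PgBouncer, while only default_pool_size server connections do work at any moment.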
Re: [GENERAL] Increase in max_connections
On Tue, Mar 11, 2014 at 10:20 AM, Anand Kumar, Karthik <karthik.anandku...@classmates.com> wrote:

> Thanks Jeff. We have scripts in place now to capture the incoming rate
> of requests. Waiting on the crash to happen to see if it spikes up :)
>
> Re: min_log_duration - we *do* see a good number of requests in the log
> that hit our cap (of 100ms). Just that nothing stands out when we have the
> issue. Whatever queries we do see slow down seem to be after we start the
> CPU spike, and so an effect and not a cause.

I think what you have is a vicious cycle: too many active connections leads to contention, which leads to slow response, which leads to piling up connections, which leads to more contention. So the cause and the effect are the same thing as each other; you can't cleanly divide them.

> We typically see about 500-700 active queries at a time - and that seems
> to match how high connection limit goes.

This is during normal times, or during the trouble?

> We tried pg_bouncer, however, at session level pooling, it slowed down our
> applications (they maintain persistent connections once established, so any
> connection overhead slows them down),

I don't understand that. If the connections are persistent, why would they increase during the slow down?

Cheers,

Jeff
Re: [GENERAL] Increase in max_connections
Thanks Jeff. We have scripts in place now to capture the incoming rate of requests. Waiting on the crash to happen to see if it spikes up :)

Re: min_log_duration – we *do* see a good number of requests in the log that hit our cap (of 100ms). Just that nothing stands out when we have the issue. Whatever queries we do see slow down seem to be after we start the CPU spike, and so an effect and not a cause.

We typically see about 500-700 active queries at a time – and that seems to match how high the connection count goes.

We tried pg_bouncer; however, at session level pooling, it slowed down our applications (they maintain persistent connections once established, so any connection overhead slows them down), and with transaction level pooling, it simply did not work.

Thanks,
Karthik

From: Jeff Janes <jeff.ja...@gmail.com>
Date: Tuesday, March 11, 2014 9:23 AM
To: "Anand Kumar, Karthik" <karthik.anandku...@classmates.com>
Cc: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Subject: Re: [GENERAL] Increase in max_connections

[quoted message snipped; Jeff's reply appears in full further down this thread]
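Since Karthik mentions scripting the incoming request rate: on 9.3 this kind of sampling can be done against pg_stat_activity. A sketch, with the database name taken from the psql prompt in the original post and the interval and output handling assumed:

  # Every 5 seconds, log connection counts broken down by state
  while sleep 5; do
    psql -X -d site -At -c \
      "SELECT now(), state, count(*) FROM pg_stat_activity GROUP BY 1, 2;"
  done >> /tmp/pg_activity_samples.log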
Re: [GENERAL] Increase in max_connections
No errors in the logs, except when we hit max_connections.

No shared memory problems – no associated spike in I/O or system CPU indicating shared memory is either unused or overused. Sufficient memory in cache/buffers, zero swapping or anything else indicative of a memory problem.

The box is pretty beefy – 24 core, 768G RAM :) - so yes, an effective cache of 568GB is normal; we arrived at it with months of tuning over time.

cpu_tuple_cost of 0.03 – yes, a lot of our settings are tweaked from the defaults based on performance. I don't have the output now, but the 0.03 was based on recommendations from postgres user groups, and via testing with setting it up and running EXPLAIN ANALYZE on queries.

None of the settings changed when this problem began.

Thanks,
Karthik

From: Venkata Balaji Nagothi <vbn...@gmail.com>
Date: Monday, March 10, 2014 7:35 PM
To: "Anand Kumar, Karthik" <karthik.anandku...@classmates.com>
Cc: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Subject: Re: [GENERAL] Increase in max_connections

[quoted message snipped; Venkata's reply appears in full further down this thread]
Re: [GENERAL] Increase in max_connections
On Mon, Mar 10, 2014 at 6:04 PM, Anand Kumar, Karthik <karthik.anandku...@classmates.com> wrote:

> Hi all,
>
> We're running postgres 9.3.2, server configuration below.
>
> Seemingly randomly, we will see the number of active queries in postgres
> go up until we hit max_connections. The DB will recover after a few minutes.
>
> We had the issue a couple of times in Feb 2014. We then upgraded the
> postgres server from 9.1 to 9.3.2, and the occurrence has gone up
> significantly - to several times a day.
>
> The user CPU goes up as well to 100%, with no increase in I/O or system CPU.
> We have slow query logging, and there is no dramatic change in the slow
> queries either.

I think you might need to change your cutoff for what makes a slow query. It must be the case that either you are seeing an abnormal spike in query requests, or the queries are taking an abnormally long time (or both). If not many queries are hitting log_min_duration_statement, that just means you can lower it further without causing too much log bloat.

Usually when people hit max_connections under load, it means they need to move the limit into a connection pooler (where it can be more graceful, by putting them in a queue for a free slot, rather than returning an error) and *lower* max_connections.

> max_connections|1500|configuration file

That is very high.

Cheers,

Jeff
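On 9.3, acting on that suggestion means editing postgresql.conf and reloading; no restart is needed. A sketch, with the data directory path assumed:

  # Drop the slow-query threshold from 100ms to, say, 20ms
  sed -i 's/^log_min_duration_statement.*/log_min_duration_statement = 20ms/' \
      /var/lib/pgsql/data/postgresql.conf

  # Tell the postmaster to re-read its configuration
  psql -X -c "SELECT pg_reload_conf();"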
Re: [GENERAL] Increase in max_connections
From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Anand Kumar, Karthik
Sent: Monday, March 10, 2014 9:04 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Increase in max_connections

[original post quoted in full; it appears verbatim at the end of this thread]

You don't specify how many CPU cores you have, but I'm pretty sure there are not enough to support this:

  max_connections|1500|configuration file

Try a connection pooler, it should help. The simplest to install and configure would be PgBouncer, and it does the job very well.

Regards,
Igor Neyman
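For what it's worth, the core count Igor asks about is easy to capture alongside the other diagnostics (standard Linux commands):

  # Logical CPU count
  nproc

  # Fuller picture: sockets, cores per socket, threads per core, NUMA nodes
  lscpu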
Re: [GENERAL] Increase in max_connections
On Tue, Mar 11, 2014 at 12:04 PM, Anand Kumar, Karthik <karthik.anandku...@classmates.com> wrote:

> Hi all,
>
> We're running postgres 9.3.2, server configuration below.
>
> Seemingly randomly, we will see the number of active queries in postgres
> go up until we hit max_connections. The DB will recover after a few minutes.
>
> We had the issue a couple of times in Feb 2014. We then upgraded the
> postgres server from 9.1 to 9.3.2, and the occurrence has gone up
> significantly - to several times a day.
>
> The user CPU goes up as well to 100%, with no increase in I/O or system CPU.
> We have slow query logging, and there is no dramatic change in the slow
> queries either.
>
> There is a corresponding spike in shared locks, but that seems to be an
> effect not a cause - it corresponds to an increase in the number of running
> processes at the time.
>
> We had a similar issue in the past - that was solved by disabling
> transparent_huge_pages - but the difference there was that we'd see queries
> slow down dramatically. Currently, we don't. Also, transparent_huge_pages
> is still disabled.
>
> I do realize the issue could be caused by a spurt in incoming
> connections - we do not yet have conclusive evidence on whether that's
> happening (active queries climb up, however no conclusive proof on whether
> that's because of a slowdown, or because of an increase in traffic). Working
> on getting the information, will update with that information as soon as we
> have it.
>
> I thought I'd send a post out to the group before then, to see if anyone
> has run into anything similar.
>
> Thanks,
> Karthik
>
> [... pg_settings listing snipped; it appears in the original post at the
> end of this thread ...]

Please let us know your hardware configuration like RAM, CPU (cores), etc.

Do you see any messages indicating any processes getting terminated/killed forcibly in the PostgreSQL logs? Or do you see any shared memory related error messages?

cpu_tuple_cost = 0.03 - this is not the default; any reason for increasing it?

effective_cache_size = 568 GB - please help us know if this is optimal for your system.

Venkata Balaji N

Sr. Database Administrator
Fujitsu Australia
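A quick way to answer the "not default" point from the server itself: pg_settings carries the compiled-in default in boot_val alongside the active value. A query sketch, with the database name assumed from the prompt in the original post:

  psql -X -d site -c "
    SELECT name, setting, boot_val
    FROM pg_settings
    WHERE name IN ('cpu_tuple_cost', 'effective_cache_size');"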
[GENERAL] Increase in max_connections
Hi all,

We're running postgres 9.3.2, server configuration below.

Seemingly randomly, we will see the number of active queries in postgres go up until we hit max_connections. The DB will recover after a few minutes.

We had the issue a couple of times in Feb 2014. We then upgraded the postgres server from 9.1 to 9.3.2, and the occurrence has gone up significantly – to several times a day.

The user CPU goes up as well to 100%, with no increase in I/O or system CPU. We have slow query logging, and there is no dramatic change in the slow queries either.

There is a corresponding spike in shared locks, but that seems to be an effect, not a cause – it corresponds to an increase in the number of running processes at the time.

We had a similar issue in the past – that was solved by disabling transparent_huge_pages – but the difference there was that we'd see queries slow down dramatically. Currently, we don't. Also, transparent_huge_pages is still disabled.

I do realize the issue could be caused by a spurt in incoming connections – we do not yet have conclusive evidence on whether that's happening (active queries climb up, however no conclusive proof on whether that's because of a slowdown, or because of an increase in traffic). Working on getting the information, will update with that information as soon as we have it.

I thought I'd send a post out to the group before then, to see if anyone has run into anything similar.

Thanks,
Karthik

site=# SELECT version();
PostgreSQL 9.3.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit

site=# SELECT name, current_setting(name), source
site-# FROM pg_settings
site-# WHERE source NOT IN ('default', 'override');
application_name|psql|client
archive_command|/usr/bin/archiver.sh %f %p|configuration file
archive_mode|on|configuration file
autovacuum_freeze_max_age|25000|configuration file
autovacuum_max_workers|6|configuration file
bgwriter_lru_maxpages|1000|configuration file
bgwriter_lru_multiplier|4|configuration file
checkpoint_completion_target|0.8|configuration file
checkpoint_segments|250|configuration file
checkpoint_timeout|15min|configuration file
checkpoint_warning|6min|configuration file
client_encoding|UTF8|client
commit_siblings|25|configuration file
cpu_tuple_cost|0.03|configuration file
DateStyle|ISO, MDY|configuration file
default_statistics_target|300|configuration file
default_text_search_config|pg_catalog.english|configuration file
effective_cache_size|568GB|configuration file
fsync|on|configuration file
lc_messages|en_US.UTF-8|configuration file
lc_monetary|en_US.UTF-8|configuration file
lc_numeric|en_US.UTF-8|configuration file
lc_time|en_US.UTF-8|configuration file
listen_addresses|*|configuration file
log_autovacuum_min_duration|0|configuration file
log_checkpoints|on|configuration file
log_connections|on|configuration file
log_destination|syslog|configuration file
log_directory|pg_log|configuration file
log_filename|postgresql-%a.log|configuration file
log_line_prefix|user=%u,db=%d,ip=%h |configuration file
log_min_duration_statement|100ms|configuration file
log_min_messages|debug1|configuration file
log_rotation_age|1d|configuration file
log_rotation_size|0|configuration file
log_timezone|US/Pacific|configuration file
log_truncate_on_rotation|on|configuration file
logging_collector|off|configuration file
maintenance_work_mem|1GB|configuration file
max_connections|1500|configuration file
max_locks_per_transaction|1000|configuration file
max_stack_depth|2MB|environment variable
max_wal_senders|5|configuration file
port|5432|command line
random_page_cost|2|configuration file
shared_buffers|8GB|configuration file
synchronous_commit|off|configuration file
syslog_facility|local0|configuration file
syslog_ident|postgres|configuration file
TimeZone|US/Pacific|configuration file
vacuum_freeze_table_age|0|configuration file
wal_buffers|32MB|configuration file
wal_keep_segments|250|configuration file
wal_level|hot_standby|configuration file
wal_sync_method|fsync|configuration file
work_mem|130MB|configuration file