Re: [GENERAL] connections not getting closed on a replica

2015-12-11 Thread Carlo Cabanilla
Thanks for the reply Kevin.


> > I'm trying to figure out why we had a build up of connections on
> > our streaming replica.
>
> Seriously, from the data provided, about all I can say is "because
> you were opening them faster than you were closing them".  You
> don't say how many cores or how much RAM you had, but allowing 2000
> connections is rarely a good idea.
>
>
> http://tbeitr.blogspot.com/2015/11/for-better-service-please-take-number.html
>
>
The replica is an i2.4xl EC2 instance, so 122GB RAM, 16 cores, 53 "compute
units", and 4 local-storage SSDs in RAID 10. effective_cache_size is 89GB,
work_mem is 128MB. Applications connect via pgbouncer with a default pool
size of 650, at a steady state of 500-600 server connections for 1000-1500
clients. Normally pgbouncer limits the connections pretty well. I was most
surprised that postgres wasn't timing connections out to keep them low.
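
For what it's worth, this is roughly how I've been checking the server
side, assuming pg_stat_activity is the right place to look (9.3 column
names):

-- on the replica: backends per state and how long the oldest one has been there
select state,
       count(*) as backends,
       max(now() - state_change) as longest_in_state
from pg_stat_activity
where pid <> pg_backend_pid()
group by state
order by backends desc;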

Also, pgbouncer wasn't reporting an increase in the number of server
connections even though they were showing up in postgres directly, so I was
thinking pgbouncer thought these connections were already closed even
though postgres couldn't close them.
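
Next time I'll try to reconcile the two views directly, something like this
(a sketch; SHOW SERVERS runs against the pgbouncer admin console, the count
against postgres itself):

-- in the pgbouncer admin console (psql to pgbouncer's port, database "pgbouncer")
SHOW SERVERS;

-- on the replica: what postgres thinks it has
select count(*) from pg_stat_activity where pid <> pg_backend_pid();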



> > We're running postgres 9.3.5 on the master and 9.3.10 on
> > the replica,
>
> Because of data loss bugs in 9.3.5 which are fixed in 9.3.10, I
> recommend you update the master ASAP.
>
>
Yep, we're working on it.


> > linux 3.2.0 on both,
>
> There are scheduler problems in kernels at that level; I recommend
> upgrading linux to a 3.8 or later kernel; 3.13 or later is best, if
> possible.
>
>
Ditto.


> I don't suppose you have vmstat 1 output from the incident?  If it
> happens again, try to capture that.
>
>
Are you looking for a stat in particular?

This is the idle percentage of each core during the first incident:

 http://dd-pastebin.s3.amazonaws.com/carlo/pg-connections-issue2.png

The dark blue at the bottom is core0, which I believe is the one pgbouncer
is using. There's a drop when the connections start running away, but
there's still some headroom there, and the other cores are pretty much
unaffected until the restart.
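
If it happens again I'll grab vmstat 1, and probably a periodic snapshot
like this on the postgres side (a sketch; "waiting" is the 9.3 boolean in
pg_stat_activity):

-- backend states and lock waits, sampled every few seconds during an incident
select now() as sampled_at, state, waiting, count(*)
from pg_stat_activity
where pid <> pg_backend_pid()
group by state, waiting;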



> Have you tuned the OS vm.dirty_* settings?  How about the
> transparent_hugepage settings?
>
>
I think the vm dirty settings are default:

vm.dirty_background_bytes = 0
vm.dirty_ratio = 20
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 3000


> If this is a NUMA environment, what does this show?:
>
> numactl --hardware
>

available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 124999 MB
node 0 free: 740 MB
node distances:
node   0
  0:  10



Carlo Cabanilla
Datadog


[GENERAL] connections not getting closed on a replica

2015-12-10 Thread Carlo Cabanilla
Hello,

I'm trying to figure out why we had a build up of connections on our
streaming replica. We're running postgres 9.3.5 on the master and 9.3.10 on
the replica, linux 3.2.0 on both, disks in raid10.  Here are some graphs to
illustrate what happened:

http://dd-pastebin.s3.amazonaws.com/carlo/pg-connections-issue.png

There were 3 different incidents, as shown by the spikes in the server
connections graph. During each incident the replica was still serving some
requests, but much more slowly, and connections kept building up.
pg_cancel_backend(), pg_terminate_backend(), and the server timeout of 5s
weren't closing them. Eventually we had to restart the database to get back
to normal, which isn't ideal.
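
For the record, the terminate calls were along these lines (a rough sketch
from memory, not the exact statements):

-- tried both cancel and terminate against the stuck backends;
-- the pids stayed in pg_stat_activity afterwards
select pg_terminate_backend(pid)
from pg_stat_activity
where pid <> pg_backend_pid();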

The second graph shows the rows updated and hot updated on the master
(n_tup_upd and n_tup_hot_upd from pg_stat_user_tables) on a large,
frequently updated table (27M rows, 20GB). It looks like hot updates start
going down right before connections run away. Going through the code, I
found this comment:
https://github.com/postgres/postgres/blob/REL9_3_5/src/backend/access/heap/heapam.c#L3031-L3041

From my understanding, that means a hot update takes a weaker lock, but if
a column needs to be "detoasted" the update will take the stronger lock
instead. I'm not sure exactly what detoasting means, but I do know that table has a
toast table as it's using an hstore column. We can store pretty large
values in the hstore column, so I wouldn't be surprised if that was causing
the issue.
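
In case it's useful, this is how I'd look up that toast table and its size
('big_table' is a placeholder for the real table name):

-- find the toast relation for the table and how big it is
select c.relname,
       t.relname as toast_relname,
       pg_size_pretty(pg_relation_size(t.oid)) as toast_size
from pg_class c
join pg_class t on t.oid = c.reltoastrelid
where c.relname = 'big_table';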

The third graph shows toast block cache hits (toast_blks_hit from
pg_statio_user_tables); the hits on the replica (purple) took a nose dive
after the first incident and pretty much never recovered. Does that mean
the toast table is too large to fit in cache?
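
This is roughly how I'm reading that ratio straight out of the view (same
placeholder table name as above):

-- toast cache hit ratio on the replica; a low number means toast reads
-- are at least missing shared_buffers
select relname,
       toast_blks_hit,
       toast_blks_read,
       round(100.0 * toast_blks_hit / nullif(toast_blks_hit + toast_blks_read, 0), 1) as toast_hit_pct
from pg_statio_user_tables
where relname = 'big_table';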

The fourth graph shows the rate of dead row creation on the master, with
the large blue area being the table in question. I'm assuming the dead rows
are from updates on the hstore column, but I'm not certain.
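
To check that assumption I'd compare updates against dead tuples for that
table, something like (placeholder name again):

-- updates vs hot updates vs dead tuples for the table in question, on the master
select relname, n_tup_upd, n_tup_hot_upd, n_dead_tup
from pg_stat_user_tables
where relname = 'big_table';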

The fifth graph shows the disk write throughput on the replica, which I'm
assuming is from applying the WAL updates on that large table? Write
throughput was higher than normal, but not maxing out the capacity of the
disk. Write latency was minimal.

The 6th graph is the replica lag; the replica only fell behind during the
first incident, not the others.

My question is why was the postgres replica unable to close connections
during these incidents? And why were reads on the replica so slow at the
time? Are there locks that are locking out the readers when there's lots of
update churn?
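
One thing I still need to check on the replica is recovery conflicts and
ungranted locks, roughly like this (a sketch; pg_stat_database_conflicts
only counts queries that were actually cancelled, so it may not tell the
whole story):

-- recovery conflict counters on the standby
select datname, confl_lock, confl_snapshot, confl_bufferpin, confl_deadlock
from pg_stat_database_conflicts;

-- backends currently waiting on ungranted locks
select locktype, relation::regclass, mode, pid
from pg_locks
where not granted;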

Carlo Cabanilla
Datadog