Re: [PERFORM] Postgres slave not catching up (on 9.2)
The slave (I don't have control of the master) is using 2 NFS file systems, one for the WALs and another one for the data, on Netapp controllers:

dbnasg401-12a:/vol/dodpupdbtst02 on /ORA/dbs02/PUPDBTST type nfs (rw,remount,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,timeo=600)
dbnasg403-12a:/vol/dodpupdbtst03 on /ORA/dbs03/PUPDBTST type nfs (rw,remount,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,timeo=600)

You should use noatime to avoid unnecessary IO.

Just to mention that changing the mount points from:

dbnasg403-12a:/vol/dodpupdbtst03 on /ORA/dbs03/PUPDBTST type nfs (rw,actimeo=0,hard,nointr,rsize=65536,wsize=65536,tcp,timeo=600)
dbnasg401-12a:/vol/dodpupdbtst02 on /ORA/dbs02/PUPDBTST type nfs (rw,actimeo=0,hard,nointr,rsize=65536,wsize=65536,tcp,timeo=600)

to

dbnasg403-12a:/vol/dodpupdbtst03 on /ORA/dbs03/PUPDBTST type nfs (rw,noatime,hard,nointr,rsize=65536,wsize=65536,tcp,timeo=600)
dbnasg401-12a:/vol/dodpupdbtst02 on /ORA/dbs02/PUPDBTST type nfs (rw,noatime,hard,nointr,rsize=65536,wsize=65536,tcp,timeo=600)

did have a big impact. The profile of the recovery process, in terms of system calls, changed quite a lot:

From:

[postgres@itrac1202 tmp]$ strace -p 9596 -c
Process 9596 attached - interrupt to quit
Process 9596 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 78.73    0.217824           0   4568553     81376 read
 17.87    0.049453           0    515320           lseek
  2.89    0.007989          12       669       669 poll
  0.33    0.000912           2       536           open
  0.07    0.000206           0       994       994 stat
  0.05    0.000151           0       995       787 rt_sigreturn
  0.05    0.000133           0       673           write
  0.00    0.000000           0        36           close
  0.00    0.000000           0        52           kill
------ ----------- ----------- --------- --------- ----------------
100.00    0.276668               9756303     83826 total

To:

[postgres@itrac1202 tmp]$ strace -p 9596 -c
Process 9596 attached - interrupt to quit
Process 9596 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 78.73    0.217824           0   4568553     81376 read
 17.87    0.049453           0    515320           lseek
  2.89    0.007989          12       669       669 poll
  0.33    0.000912           2       536           open
  0.07    0.000206           0       994       994 stat
  0.05    0.000151           0       995       787 rt_sigreturn
  0.05    0.000133           0       673           write
  0.00    0.000000           0        36           close
  0.00    0.000000           0        52           kill
------ ----------- ----------- --------- --------- ----------------
100.00    0.276668               9756303     83826 total

We also increased shared_buffers from 12GB to 24GB. The lag has decreased most of the time:

*/10 * * * * /usr/local/pgsql/postgresql-9.2.9/bin/psql -U postgres -h /var/lib/pgsql/ -p 6600 -d puppetdb -c "SELECT now(), now() - pg_last_xact_replay_timestamp() AS time_lag" | perl -ne 'if (/\|\s+(\d{2}):(\d{2}):(\d{2})\.\d+/) {$hour=$1;$min=$2;$sec=$3; print $_;}' >> /tmp/lag929morememmount.log

...
2014-12-14 14:10:01.688947+01 | 00:00:00.096524
2014-12-14 14:20:01.798223+01 | 00:00:00.024083
2014-12-14 14:30:01.884448+01 | 00:00:00.420791
2014-12-14 14:40:01.960623+01 | 00:00:00.168318
2014-12-14 14:50:01.191487+01 | 00:00:00.163832
2014-12-14 15:00:02.146436+01 | 00:00:00.026934
2014-12-14 15:10:01.277963+01 | 00:00:00.332185
2014-12-14 15:20:01.353979+01 | 00:00:00.020616
2014-12-14 15:30:01.417092+01 | 00:00:00.584768
2014-12-14 15:40:01.575347+01 | 00:00:00.151685
2014-12-14 15:50:01.205507+01 | 00:00:00.102073
2014-12-14 16:00:01.321511+01 | 00:00:00.590677
2014-12-14 16:10:01.570474+01 | 00:00:00.182683
2014-12-14 16:20:01.640095+01 | 00:00:00.420185
2014-12-14 16:30:01.767033+01 | 00:00:00.015989
2014-12-14 16:40:01.849532+01 | 00:00:00.106296
2014-12-14 16:50:01.920876+01 | 00:00:00.258851
2014-12-14 17:00:02.000278+01 | 00:00:00.119841
2014-12-14 17:10:01.894227+01 | 00:00:00.091599
2014-12-14 17:20:01.61729+01  | 00:00:00.367367
2014-12-14 17:30:01.683326+01 | 00:00:00.103884
2014-12-14 17:40:01.755904+01 | 00:00:00.051262
2014-12-14 17:50:01.833825+01 | 00:00:00.06901
2014-12-14 18:00:01.901236+01 | 00:00:00.17467
2014-12-14 18:10:01.186283+01 | 00:00:00.214941
2014-12-14 18:20:01.145413+01 | 00:00:00.03517
2014-12-14 18:30:01.241746+01 | 00:00:00.207842
2014-12-14 18:40:01.299413+01 | 00:00:00.147878
2014-12-14 18:50:01.368541+01 | 00:00:00.393893
2014-12-14 19:00:01.430736+01 | 00:00:00.031226
2014-12-14 19:10:01.672117+01 | 00:05:03.512832
2014-12-14 19:20:01.9195+01   | 00:06:39.08761
2014-12-14 19:30:02.184486+01 | 00:00:00.307668
2014-12-14 19:40:01.227278+01
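For reference, a minimal, cleaned-up sketch of that cron lag check (assuming the same socket directory, port, database and log path shown above; adjust as needed):

    #!/bin/bash
    # Sketch: record the standby's replay lag each time cron runs this.
    PSQL=/usr/local/pgsql/postgresql-9.2.9/bin/psql
    LOG=/tmp/lag929morememmount.log

    $PSQL -U postgres -h /var/lib/pgsql/ -p 6600 -d puppetdb -t -A \
      -c "SELECT now(), now() - pg_last_xact_replay_timestamp() AS time_lag" >> "$LOG"

Note that pg_last_xact_replay_timestamp() returns NULL if no transaction has been replayed since the server started, so an empty lag column only means there was nothing to replay yet.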
Re: [PERFORM] Postgres slave not catching up (on 9.2)
dbnasg403-12a:/vol/dodpupdbtst03 on /ORA/dbs03/PUPDBTST type nfs (rw,actimeo=0,hard,nointr,rsize=65536,wsize=65536,tcp,timeo=600)
dbnasg401-12a:/vol/dodpupdbtst02 on /ORA/dbs02/PUPDBTST type nfs (rw,actimeo=0,hard,nointr,rsize=65536,wsize=65536,tcp,timeo=600)

to

dbnasg403-12a:/vol/dodpupdbtst03 on /ORA/dbs03/PUPDBTST type nfs (rw,noatime,hard,nointr,rsize=65536,wsize=65536,tcp,timeo=600)
dbnasg401-12a:/vol/dodpupdbtst02 on /ORA/dbs02/PUPDBTST type nfs (rw,noatime,hard,nointr,rsize=65536,wsize=65536,tcp,timeo=600)

did have a big impact. The profile of the recovery process, in terms of system calls, changed quite a lot:

From:

[postgres@itrac1202 tmp]$ strace -p 9596 -c
Process 9596 attached - interrupt to quit
Process 9596 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 78.73    0.217824           0   4568553     81376 read
 17.87    0.049453           0    515320           lseek
  2.89    0.007989          12       669       669 poll
  0.33    0.000912           2       536           open
  0.07    0.000206           0       994       994 stat
  0.05    0.000151           0       995       787 rt_sigreturn
  0.05    0.000133           0       673           write
  0.00    0.000000           0        36           close
  0.00    0.000000           0        52           kill
------ ----------- ----------- --------- --------- ----------------
100.00    0.276668               9756303     83826 total

This one should read:

[root@itrac1202 ~]# strace -c -p 28073
Process 28073 attached - interrupt to quit
Process 28073 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 59.16   10.756007           5   2201974   1202832 read
 40.69    7.398247           3   2367885           lseek
  0.14    0.025970         154       169           open
  0.00    0.000057           0       169           close
  0.00    0.000038           0       169           kill
  0.00    0.000033           1        29           write
  0.00    0.000000           0         1           semop
------ ----------- ----------- --------- --------- ----------------
100.00   18.180352               4570396   1202832 total

Apologies for the confusion.
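For anyone wanting to reproduce this kind of profile, a minimal sketch of how these numbers can be collected: find the startup (recovery) process on the standby and attach strace with call counting for a fixed interval. The grep pattern below is an assumption; the exact process title varies by version and platform.

    #!/bin/bash
    # Sketch: profile the recovery process's syscall mix for 60 seconds.
    PID=$(ps -eo pid,cmd | grep '[s]tartup process' | awk '{print $1}')
    strace -c -p "$PID" &
    STRACE_PID=$!
    sleep 60
    kill -INT "$STRACE_PID"   # strace prints the per-syscall summary when it detaches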
Re: [PERFORM] Postgres slave not catching up (on 9.2)
Hi Andres,

Sorry for the delay in replying. My replies are below.

> I have a couple of questions:
> 1) Is the standby actually used for querying? Is it possible that replay
> frequently conflicts with active queries? As you don't have
> hot_standby_feedback enabled that seems quite possible.

Nowadays we don't manage to keep a decent lag, so the standby is not used at all. No clients connect to it.

> 2) Is the startup process on the standby CPU or IO bound?

The server is almost idle; I don't see any bottleneck on either CPU or IO.

> 3) Does the workload involve loads of temporary tables or generally
> transactions locking lots of tables exclusively in one transaction?

We have monitored the master for a couple of days and we haven't detected any CREATE TEMP TABLE statements. As for the locks, I don't think that is the case either; at a given point in time I don't see many tables locked in exclusive mode:

puppetdb=# SELECT locktype, relation::regclass, mode, transactionid AS tid,
                  datname, virtualtransaction AS vtid, pid, granted
           FROM pg_catalog.pg_locks l
           LEFT JOIN pg_catalog.pg_database db ON db.oid = l.database
           WHERE NOT pid = pg_backend_pid();

  locktype  |              relation               |       mode       | tid | datname  |    vtid    |  pid   | granted
------------+-------------------------------------+------------------+-----+----------+------------+--------+---------
 relation   | resource_params_cache_pkey          | AccessShareLock  |     | puppetdb | 5/3099716  |  54422 | t
 relation   | resource_params_cache               | AccessShareLock  |     | puppetdb | 5/3099716  |  54422 | t
 relation   | catalog_resources_pkey              | AccessShareLock  |     | puppetdb | 5/3099716  |  54422 | t
 relation   | idx_catalog_resources_exported_true | AccessShareLock  |     | puppetdb | 5/3099716  |  54422 | t
 relation   | idx_catalog_resources_resource      | AccessShareLock  |     | puppetdb | 5/3099716  |  54422 | t
 relation   | idx_catalog_resources_type          | AccessShareLock  |     | puppetdb | 5/3099716  |  54422 | t
 relation   | idx_catalog_resources_type_title    | AccessShareLock  |     | puppetdb | 5/3099716  |  54422 | t
 relation   | catalog_resources                   | AccessShareLock  |     | puppetdb | 5/3099716  |  54422 | t
 relation   | idx_catalogs_transaction_uuid       | AccessShareLock  |     | puppetdb | 5/3099716  |  54422 | t
 relation   | idx_catalogs_transaction_uuid       | RowExclusiveLock |     | puppetdb | 5/3099716  |  54422 | t
 relation   | catalogs_certname_key               | AccessShareLock  |     | puppetdb | 5/3099716  |  54422 | t
 relation   | catalogs_certname_key               | RowExclusiveLock |     | puppetdb | 5/3099716  |  54422 | t
 relation   | catalogs_hash_key                   | AccessShareLock  |     | puppetdb | 5/3099716  |  54422 | t
 relation   | catalogs_hash_key                   | RowExclusiveLock |     | puppetdb | 5/3099716  |  54422 | t
 relation   | catalogs_pkey                       | AccessShareLock  |     | puppetdb | 5/3099716  |  54422 | t
 relation   | catalogs_pkey                       | RowExclusiveLock |     | puppetdb | 5/3099716  |  54422 | t
 relation   | catalogs                            | AccessShareLock  |     | puppetdb | 5/3099716  |  54422 | t
 relation   | catalogs                            | RowExclusiveLock |     | puppetdb | 5/3099716  |  54422 | t
 relation   | certnames_pkey                      | AccessShareLock  |     | puppetdb | 5/3099716  |  54422 | t
 relation   | certnames_pkey                      | RowExclusiveLock |     | puppetdb | 5/3099716  |  54422 | t
 relation   | certnames                           | AccessShareLock  |     | puppetdb | 5/3099716  |  54422 | t
 relation   | certnames                           | RowExclusiveLock |     | puppetdb | 5/3099716  |  54422 | t
 virtualxid |                                     | ExclusiveLock    |     |          | 5/3099716  |  54422 | t
 relation   | resource_params_cache_pkey          | AccessShareLock  |     | puppetdb | 20/2901642 | 100098 | t
 relation   | resource_params_cache               | AccessShareLock  |     | puppetdb | 20/2901642 | 100098 | t
 relation   | catalog_resources_pkey              | AccessShareLock  |     | puppetdb | 20/2901642 | 100098 | t
 relation   | idx_catalog_resources_exported_true | AccessShareLock  |     | puppetdb | 20/2901642 | 100098 | t
 relation
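As a side note on question 1: if the standby ever does serve queries again, a quick way to see whether replay is being held up by query conflicts is pg_stat_database_conflicts on the standby (the view exists since 9.1). A minimal sketch, assuming the same psql binary, socket directory and port used elsewhere in this thread:

    # Sketch: count recovery conflicts per type on the standby.
    /usr/local/pgsql/postgresql-9.2.9/bin/psql -U postgres -h /var/lib/pgsql/ -p 6600 -d puppetdb -c "
      SELECT datname, confl_tablespace, confl_lock, confl_snapshot,
             confl_bufferpin, confl_deadlock
      FROM pg_stat_database_conflicts
      WHERE datname = 'puppetdb';"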
Re: [PERFORM] Postgres slave not catching up (on 9.2)
Indeed I could save some IO with noatime. I must say I haven't found any recommendation about mount options for PostgreSQL, likely because this is not encouraged. The ones you see were taken from an Oracle cluster configuration where several nodes see the same files; that's not the case on this setup.

The IO is not an issue; the storage is not saturated at all. The slave receives the stream perfectly, but the apply is quite slow; it looks like it is always working with 8k pages, one at a time:

--datafiles

[root@ ~]# /ORA/dbs01/syscontrol/projects/dfm/bin/smetrics -i 5 -n 100 -o vol dbnasg403-12a:/vol/dodpupdbtst03
Instance       total_ops  read_ops  write_ops  read_data  write_data  avg_latency  read_latency  write_latency
                      /s        /s         /s        b/s         b/s           us            us             us
dodpupdbtst03  64660 162 02350619 31.53 0 764.70
dodpupdbtst03  67620 843 08751169 48.32 0 263.10
dodpupdbtst03  70230 1547 0 14914498 112.88 0 303.16
dodpupdbtst03  53730 321 65013809930 58.44 11287.75 467.21
dodpupdbtst03  56180 183 01661200 20.91 0 265.61
dodpupdbtst03  55380 214 03471380 29.24 0 374.27
dodpupdbtst03  57530 425 04973131 45.36 0 351.08
dodpupdbtst03  61100 142 02331695 20.96 0 378.95

--WALs

[root@ ~]# /ORA/dbs01/syscontrol/projects/dfm/bin/smetrics -i 5 -n 100 -o vol dbnasg401-12a:/vol/dodpupdbtst02
Instance       total_ops  read_ops  write_ops  read_data  write_data  avg_latency  read_latency  write_latency
                      /s        /s         /s        b/s         b/s           us            us             us
dodpupdbtst02  1017 20293 59158992637111 2033.22 10116.09 172.61
dodpupdbtst02  1284 242 141 73687124309409 1235.11 6306.37 172.89
dodpupdbtst02  1357 231 268 68698168489466 957.55 5104.09 192.26
dodpupdbtst02  1566 264 288 81429659008529 747.96 4069.78 180.00
dodpupdbtst02  1333 235 153 76010514755791 993.81 5394.99 176.99
dodpupdbtst02  1261 199 287 61248219075170 896.32 5150.28 203.81
dodpupdbtst02  963 161 192 49559966066333 1757.66 10035.06 213.12
dodpupdbtst02  924 159 157 47826174807262 1092.61 5804.85 236.91
dodpupdbtst02  591 97 137 28990854275046 1218.24 6980.66 190.20

Writes are usually fast (microseconds, as they hit the NVRAM) and reads are about 5 ms, which is quite OK considering SATA disks (the controllers have a 512GB flash cache, which is why we get this average).

Thank you,
Ruben
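One thing worth double-checking (a hedged suggestion, since the mount lines quoted earlier may come from fstab rather than the running system) is what the kernel actually negotiated for these mounts; NFS clients and servers can silently adjust rsize/wsize and other options. A quick sketch:

    # Sketch: show the options the kernel is actually using for the two volumes.
    grep PUPDBTST /proc/mounts
    # nfsstat -m (from nfs-utils) prints per-mount-point NFS options on most Linux systems
    nfsstat -m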
Re: [PERFORM] Postgres slave not catching up (on 9.2)
Hi,

On 2014-11-08 13:11:25 +, Ruben Domingo Gaspar Aparicio wrote:
> Hello,
>
> I have built up a hot standby between a master running 9.2.4 and a slave
> running 9.2.9. I did the initial copy using pg_basebackup. My recovery.conf
> looks like:
>
> standby_mode = 'on'
> primary_conninfo = 'host=p port= user=replicator'
> trigger_file = 'failover.now'
> restore_command = 'test -f /ORA/dbs02/PUPDBTST/archive/%f && ln -fns /ORA/dbs02/PUPDBTST/archive/%f %p'
> archive_cleanup_command = '/usr/local/pgsql/postgresql-9.2.9/bin/pg_archivecleanup /ORA/dbs03/PUPDBTST/data %r'
>
> The slave (I don't have control on the master) is using 2 NFS file systems,
> one for WALs and another one for the data, on Netapp controllers:
>
> dbnasg401-12a:/vol/dodpupdbtst02 on /ORA/dbs02/PUPDBTST type nfs (rw,remount,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,timeo=600)
> dbnasg403-12a:/vol/dodpupdbtst03 on /ORA/dbs03/PUPDBTST type nfs (rw,remount,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,timeo=600)
>
> The streaming is working perfectly:
>
> /usr/local/pgsql/postgresql-9.2.9/bin/psql -U admin -h X -p X -d puppetdb -c "select pid, usesysid, usename, application_name, client_addr, state, sent_location, write_location, replay_location, sync_state from pg_stat_replication;"
>
>   pid   | usesysid |  usename   | application_name | client_addr |   state   | sent_location | write_location | replay_location | sync_state
> --------+----------+------------+------------------+-------------+-----------+---------------+----------------+-----------------+------------
>  117659 |    16384 | replicator | walreceiver      | 10.16.7.137 | streaming | AA74/DD630978 | AA74/DD630978  | A977/F84F0BE0   | async
> (1 row)
>
> But the lag is increasing constantly; it looks like the replay cannot cope with it:

I have a couple of questions:

1) Is the standby actually used for querying? Is it possible that replay
frequently conflicts with active queries? As you don't have
hot_standby_feedback enabled that seems quite possible.

2) Is the startup process on the standby CPU or IO bound?

3) Does the workload involve loads of temporary tables or generally
transactions locking lots of tables exclusively in one transaction?

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
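Regarding question 1: hot_standby_feedback is a standby-side setting (available since 9.1) that can be changed with a reload rather than a restart. A minimal sketch; the data directory below is an assumption based on the path used in archive_cleanup_command above:

    # Sketch: enable hot_standby_feedback on the standby and reload (no restart needed).
    # The data directory is assumed from archive_cleanup_command; adjust if different.
    echo "hot_standby_feedback = on" >> /ORA/dbs03/PUPDBTST/data/postgresql.conf
    /usr/local/pgsql/postgresql-9.2.9/bin/pg_ctl reload -D /ORA/dbs03/PUPDBTST/data

It only helps when the standby actually runs queries: it asks the master to hold back vacuum cleanup that would otherwise cause replay conflicts, and does nothing for a standby that is purely replaying.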
Re: [PERFORM] Postgres slave not catching up (on 9.2)
On Sat, Nov 8, 2014 at 2:11 PM, Ruben Domingo Gaspar Aparicio wrote:
> The slave (I don't have control on the master) is using 2 NFS file systems,
> one for WALs and another one for the data, on Netapp controllers:
>
> dbnasg401-12a:/vol/dodpupdbtst02 on /ORA/dbs02/PUPDBTST type nfs (rw,remount,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,timeo=600)
> dbnasg403-12a:/vol/dodpupdbtst03 on /ORA/dbs03/PUPDBTST type nfs (rw,remount,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,timeo=600)

You should use noatime to avoid unnecessary IO.

> The master produces quite a lot of WALs. This is what I get on the slave
> (number of WAL files, date-hour, total size in MB), so per day it is more
> than 400GB:
>
> I tried to play with how the IO is handled, making it less strict by setting
> synchronous_commit and fsync to off, without much success. I have also done
> a second test increasing shared_buffers from 12GB to 24GB (we are running on
> a 48GB, 8-core server).
>
> Please let me know if you can see something obvious I am missing.

Your IO system needs to be able to deliver sustained IO bandwidth at least as large as what is needed to read and write all the changes. What raw IO bandwidth do those NFS file systems deliver _long term_? I am not talking about spikes, because there are buffers; I am talking about the minimum of the network throughput on one hand and the raw disk IO those boxes can do on the other hand. Then, how much of it is available to your slave?

Did you do the math to ensure that the IO bandwidth you have available on the slave is at least as high as what is needed? Note that it's not simply the WAL size that needs to be written and read, but also the data pages.

Kind regards

robert

--
[guy, jim].each {|him| remember.him do |as, often| as.you_can - without end}
http://blog.rubybestpractices.com/
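To make the "do the math" point concrete with the numbers already quoted in this thread (roughly 400GB of WAL per day, ~5 ms NFS read latency, 8kB pages), here is a rough back-of-the-envelope sketch; the figures are illustrative, not measurements:

    # Rough sketch: average bandwidth needed just to ingest the WAL stream.
    echo "scale=2; 400 * 1024 / 86400" | bc   # ~4.74 MB/s average WAL write rate
    # Single-threaded recovery doing synchronous 8 kB random reads at ~5 ms each:
    echo "1000 / 5" | bc                      # ~200 page reads per second
    echo "scale=2; 200 * 8 / 1024" | bc       # ~1.56 MB/s effective random-read throughput

That latency-bound figure, rather than raw bandwidth, is often what limits a single recovery process on high-latency storage, which matches the read-dominated strace profile shown earlier in the thread.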