Hi El mié., 24 de oct. de 2018 a la(s) 00:39, Boris Sagadin ( bo...@infosplet.com) escribió:
> Yes, times are all identical, set to UTC, ntpd is used. > > log_delay > ----------- > 15.788175 > > This is delay at this moment, but we graph replication delay and it's > fluctuating between 0 and 30s. > But the fluctuation is between 0 and 30s!1, are not 4 hours fortunately. Apart from the theme wal compression I think you should check networks > Before I turned off wal compression, lag was much bigger (0 to up to 8 > minutes). We have lots of tables (40k) and many upserts. > > > Boris > > On Tue, Oct 23, 2018 at 8:24 PM, Hellmuth Vargas <hiv...@gmail.com> wrote: > >> Hi >> >> Both servers are configured with the same date, time and time >> configuration? >> >> El mar., 23 de oct. de 2018 a la(s) 13:16, Hellmuth Vargas ( >> hiv...@gmail.com) escribió: >> >>> Hi >>> >>> which result you get from the following query: >>> >>> SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() >>> THEN 0 >>> ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp()) >>> END AS log_delay; >>> >>> source: >>> >>> https://severalnines.com/blog/postgresql-streaming-replication-deep-dive >>> >>> El mar., 23 de oct. de 2018 a la(s) 11:28, Boris Sagadin ( >>> bo...@infosplet.com) escribió: >>> >>>> Nothing special, just: >>>> >>>> standby_mode = 'on' >>>> primary_conninfo = 'host=... user=repmgr application_name=nodex' >>>> recovery_target_timeline = 'latest' >>>> >>>> >>>> Boris >>>> >>>> On Tue, Oct 23, 2018 at 3:10 PM, Hellmuth Vargas <hiv...@gmail.com> >>>> wrote: >>>> >>>>> Hi >>>>> >>>>> can share recovery.conf file settings?? >>>>> >>>>> El mar., 23 de oct. de 2018 a la(s) 00:28, Boris Sagadin ( >>>>> bo...@infosplet.com) escribió: >>>>> >>>>>> Yes, turning wal_compression off improves things. Slave that was >>>>>> mentioned unfortunately lagged too much before this setting was applied >>>>>> and >>>>>> was turned off. However the remaining slave lags less now, although still >>>>>> occasionally up to a few minutes. I think single threadedness of recovery >>>>>> is a big slowdown for write heavy databases. Maybe an option to increase >>>>>> wal_size beyond 16MB in v11 will help. >>>>>> >>>>>> In the meantime we'll solve this by splitting the DB to 2 or 3 >>>>>> clusters or maybe trying out some sharding solution like Citus. >>>>>> >>>>>> >>>>>> Boris >>>>>> >>>>>> On Sun, Oct 21, 2018 at 9:06 AM, Boris Sagadin <bo...@infosplet.com> >>>>>> wrote: >>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> I have a database running on i3.8xlarge (256GB RAM, 32 CPU cores, 4x >>>>>>> 1.9TB NVMe drive) AWS instance with about 5TB of disk space occupied, >>>>>>> ext4, >>>>>>> Ubuntu 16.04. >>>>>>> >>>>>>> Multi-tenant DB with about 40000 tables, insert heavy. >>>>>>> >>>>>>> I started a new slave with identical HW specs, SR. DB started >>>>>>> syncing from master, which took about 4 hours, then it started applying >>>>>>> the >>>>>>> WALs. However, it seems it can't catch up. Delay is still around 3 hours >>>>>>> (measured with now() - pg_last_xact_replay_timestamp()), even a day >>>>>>> later. >>>>>>> It goes a few 100s up and down, but it seems to float around 3h mark. >>>>>>> >>>>>>> Disk IO is low at about 10%, measured with iostat, no connected >>>>>>> clients, recovery process is at around 90% CPU single core usage. >>>>>>> >>>>>>> Tried tuning the various parameters, but with no avail. Only thing I >>>>>>> found suspicious is stracing the recovery process constantly produces >>>>>>> many >>>>>>> errors such as: >>>>>>> >>>>>>> lseek(428, 0, SEEK_END) = 780124160 >>>>>>> lseek(30, 0, SEEK_END) = 212992 >>>>>>> read(9, 0x7ffe4001f557, 1) = -1 EAGAIN (Resource >>>>>>> temporarily unavailable) >>>>>>> lseek(680, 0, SEEK_END) = 493117440 >>>>>>> read(9, 0x7ffe4001f557, 1) = -1 EAGAIN (Resource >>>>>>> temporarily unavailable) >>>>>>> lseek(774, 0, SEEK_END) = 583368704 >>>>>>> >>>>>>> ...[snip]... >>>>>>> >>>>>>> read(9, 0x7ffe4001f557, 1) = -1 EAGAIN (Resource >>>>>>> temporarily unavailable) >>>>>>> lseek(774, 0, SEEK_END) = 583368704 >>>>>>> read(9, 0x7ffe4001f557, 1) = -1 EAGAIN (Resource >>>>>>> temporarily unavailable) >>>>>>> lseek(277, 0, SEEK_END) = 502882304 >>>>>>> lseek(6, 516096, SEEK_SET) = 516096 >>>>>>> read(6, >>>>>>> "\227\320\5\0\1\0\0\0\0\340\7\246\26\274\0\0\315\0\0\0\0\0\0\0}\0178\5&/\260\r"..., >>>>>>> 8192) = 8192 >>>>>>> read(9, 0x7ffe4001f557, 1) = -1 EAGAIN (Resource >>>>>>> temporarily unavailable) >>>>>>> lseek(735, 0, SEEK_END) = 272809984 >>>>>>> read(9, 0x7ffe4001f557, 1) = -1 EAGAIN (Resource >>>>>>> temporarily unavailable) >>>>>>> lseek(277, 0, SEEK_END) = 502882304 >>>>>>> >>>>>>> ls -l fd/9 >>>>>>> lr-x------ 1 postgres postgres 64 Oct 21 06:21 fd/9 -> pipe:[46358] >>>>>>> >>>>>>> >>>>>>> Perf top on recovery produces: >>>>>>> >>>>>>> 27.76% postgres [.] pglz_decompress >>>>>>> 9.90% [kernel] [k] entry_SYSCALL_64_after_swapgs >>>>>>> 7.09% postgres [.] hash_search_with_hash_value >>>>>>> 4.26% libpthread-2.23.so [.] llseek >>>>>>> 3.64% libpthread-2.23.so [.] __read_nocancel >>>>>>> 2.80% [kernel] [k] __fget_light >>>>>>> 2.67% postgres [.] 0x000000000034d3ba >>>>>>> 1.85% [kernel] [k] ext4_llseek >>>>>>> 1.84% postgres [.] pg_comp_crc32c_sse42 >>>>>>> 1.44% postgres [.] hash_any >>>>>>> 1.35% postgres [.] 0x000000000036afad >>>>>>> 1.29% postgres [.] MarkBufferDirty >>>>>>> 1.21% postgres [.] XLogReadRecord >>>>>>> [...] >>>>>>> >>>>>>> Tried changing the process limits with prlimit to unlimited, but no >>>>>>> change. >>>>>>> >>>>>>> I can turn off the WAL compression but I doubt this is the main >>>>>>> culprit. Any ideas appreciated. >>>>>>> >>>>>>> Regards, >>>>>>> Boris >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> Cordialmente, >>>>> >>>>> Ing. Hellmuth I. Vargas S. >>>>> Esp. Telemática y Negocios por Internet >>>>> Oracle Database 10g Administrator Certified Associate >>>>> EnterpriseDB Certified PostgreSQL 9.3 Associate >>>>> >>>>> >>>> >>> >>> -- >>> Cordialmente, >>> >>> Ing. Hellmuth I. Vargas S. >>> >>> >>> >> >> -- >> Cordialmente, >> >> Ing. Hellmuth I. Vargas S. >> >> >> > -- Cordialmente, Ing. Hellmuth I. Vargas S. Esp. Telemática y Negocios por Internet Oracle Database 10g Administrator Certified Associate EnterpriseDB Certified PostgreSQL 9.3 Associate