On 10/21/18 2:06 AM, Boris Sagadin wrote:
Hello,

I have a database running on an i3.8xlarge (256GB RAM, 32 CPU cores, 4x 1.9TB NVMe 
drives) AWS instance with about 5TB of disk space occupied, ext4, Ubuntu 16.04.

Multi-tenant DB with about 40,000 tables, insert-heavy.

I started a new slave with identical HW specs, using streaming replication. The DB 
spent about 4 hours syncing from the master, then started applying the WAL. However, 
it can't seem to catch up: the delay is still around 3 hours (measured with now() 
- pg_last_xact_replay_timestamp()), even a day later. It fluctuates up and down by a 
few hundred seconds, but seems to float around the 3-hour mark.
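
For reference, the check on the slave is essentially:

  SELECT now() - pg_last_xact_replay_timestamp() AS replay_delay;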

Disk IO is low, at about 10% (measured with iostat), there are no connected clients, 
and the recovery process sits at around 90% CPU on a single core.
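
A related check, assuming PostgreSQL 10 or newer (9.x has the pg_last_xlog_*_location 
equivalents), would be to compare the received and replayed WAL positions on the slave; 
if the receive position keeps advancing while replay lags behind, replay itself is the 
bottleneck rather than the network:

  SELECT pg_last_wal_receive_lsn() AS received_lsn,
         pg_last_wal_replay_lsn()  AS replayed_lsn;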

I tried tuning various parameters, to no avail. The only thing I found suspicious is 
that stracing the recovery process constantly produces many errors such as:

lseek(428, 0, SEEK_END)                 = 780124160
lseek(30, 0, SEEK_END)                  = 212992
read(9, 0x7ffe4001f557, 1)              = -1 EAGAIN (Resource temporarily unavailable)
lseek(680, 0, SEEK_END)                 = 493117440
read(9, 0x7ffe4001f557, 1)              = -1 EAGAIN (Resource temporarily unavailable)
lseek(774, 0, SEEK_END)                 = 583368704

...[snip]...

read(9, 0x7ffe4001f557, 1)              = -1 EAGAIN (Resource temporarily unavailable)
lseek(774, 0, SEEK_END)                 = 583368704
read(9, 0x7ffe4001f557, 1)              = -1 EAGAIN (Resource temporarily unavailable)
lseek(277, 0, SEEK_END)                 = 502882304
lseek(6, 516096, SEEK_SET)              = 516096
read(6, "\227\320\5\0\1\0\0\0\0\340\7\246\26\274\0\0\315\0\0\0\0\0\0\0}\0178\5&/\260\r"..., 8192) = 8192
read(9, 0x7ffe4001f557, 1)              = -1 EAGAIN (Resource temporarily unavailable)
lseek(735, 0, SEEK_END)                 = 272809984
read(9, 0x7ffe4001f557, 1)              = -1 EAGAIN (Resource temporarily unavailable)
lseek(277, 0, SEEK_END)                 = 502882304

ls -l fd/9
lr-x------ 1 postgres postgres 64 Oct 21 06:21 fd/9 -> pipe:[46358]


Perf top on recovery produces:

  27.76%  postgres            [.] pglz_decompress
   9.90%  [kernel]            [k] entry_SYSCALL_64_after_swapgs
   7.09%  postgres            [.] hash_search_with_hash_value
   4.26%  libpthread-2.23.so  [.] llseek
   3.64%  libpthread-2.23.so  [.] __read_nocancel
   2.80%  [kernel]            [k] __fget_light
   2.67%  postgres            [.] 0x000000000034d3ba
   1.85%  [kernel]            [k] ext4_llseek
   1.84%  postgres            [.] pg_comp_crc32c_sse42
   1.44%  postgres            [.] hash_any
   1.35%  postgres            [.] 0x000000000036afad
   1.29%  postgres            [.] MarkBufferDirty
   1.21%  postgres            [.] XLogReadRecord
[...]

I tried changing the process limits with prlimit to unlimited, but it made no difference.

I can turn off WAL compression, but I doubt this is the main culprit. Any ideas 
appreciated.

Regards,
Boris


I don't have an answer, but at a guess, decompression is taking most of the time, and 
the process spewing "Resource temporarily unavailable" might just be sitting 
around waiting for the decompression to finish.

I'd say yeah, definitely try disabling WAL compression. I'm really curious what 
that would do.
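
For what it's worth, a minimal sketch of checking and turning it off on the master, 
assuming wal_compression is actually on there (it only needs a reload, not a restart):

  SHOW wal_compression;
  ALTER SYSTEM SET wal_compression = off;
  SELECT pg_reload_conf();

Keep in mind that WAL already generated while compression was on would still have to 
be decompressed during replay, so any effect would only show up once the standby 
reaches newly produced WAL.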

I assume you are using streaming replication?

-Andy
