On 2017-05-09 10:50, Petr Jelinek wrote:
On 09/05/17 00:03, Erik Rijkers wrote:
On 2017-05-05 02:00, Andres Freund wrote:

Could you have a look?

Running tests with these three patches:

0001-WIP-Fix-off-by-one-around-GetLastImportantRecPtr.patch+
0002-WIP-Possibly-more-robust-snapbuild-approach.patch     +
fix-statistics-reporting-in-logical-replication-work.patch
    (on top of 44c528810)

I test with 15-minute pgbench runs while a logical replication
connection is active. Primary and replica are on the same machine.

I have seen errors on 3 different machines (where error means: at least
1 of the 4 pgbench tables is not md5-equal). Better, faster machines
seem to yield fewer errors.
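
To illustrate the "md5-equal" criterion above, here is a minimal sketch of such a comparison. It is not the actual test script: it assumes the rows of each table have already been fetched from primary and replica in a deterministic order, and the helper names table_md5 and mismatched_tables are hypothetical. The four table names are the standard pgbench ones.

```python
import hashlib

# The four standard pgbench tables.
PGBENCH_TABLES = (
    "pgbench_accounts",
    "pgbench_branches",
    "pgbench_tellers",
    "pgbench_history",
)


def table_md5(rows):
    """Return an md5 hex digest over rows (an iterable of tuples).

    Assumption: rows arrive in a deterministic order (for example,
    sorted by every column), otherwise equal tables could hash differently.
    """
    digest = hashlib.md5()
    for row in rows:
        line = "\t".join(str(col) for col in row) + "\n"
        digest.update(line.encode("utf-8"))
    return digest.hexdigest()


def mismatched_tables(primary, replica, tables=PGBENCH_TABLES):
    """Return the names of tables whose digests differ between the two sides.

    primary and replica map table name -> list of row tuples.
    A table missing on one side is treated as empty.
    """
    return [
        name
        for name in tables
        if table_md5(primary.get(name, [])) != table_md5(replica.get(name, []))
    ]
```

A run would count as an error when mismatched_tables() returns a non-empty list.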

Normally I see in pg_stat_replication (on master) one process in state
'streaming'.

 pid   |     wal     | replay_loc  |   diff   |   state   |   app   | sync_state
 16495 | 11/EDBC0000 | 11/EA3FEEE8 | 58462488 | streaming | derail2 | async
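
For reference, the diff value in that row matches the byte distance between the wal and replay_loc positions. The sketch below assumes the usual textual LSN form of two hexadecimal halves (high 32 bits, a slash, low 32 bits). The helper names parse_lsn and lsn_diff are hypothetical, and this is not necessarily how the original query computed the column.

```python
def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '11/EDBC0000' to a 64-bit byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff(a: str, b: str) -> int:
    """Return the number of bytes by which LSN a is ahead of LSN b."""
    return parse_lsn(a) - parse_lsn(b)


# Values taken from the pg_stat_replication row above.
print(lsn_diff("11/EDBC0000", "11/EA3FEEE8"))  # 58462488
```

So in the healthy case the replica was replaying roughly 58 MB behind the sender at the time of that sample.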

Often there are another two processes in pg_stat_replication that remain
in state 'startup'.

In the failing sessions the 'streaming'-state process is missing; only
the two processes that are, and remain, in 'startup' are present.

Hmm, startup is the state where slot creation is happening. I wonder if
it's just taking a long time to create the snapshot because of the 5th
issue, which is not yet fixed (and the original patch will not apply on
top of this change). Alternatively, there is a bug in this patch.

Did you see high CPU usage during the test when there were those
"startup" state walsenders?


I haven't noticed any, but I wasn't paying particular attention to that.

I'll try to get some CPU-info logged...



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
