On Wed, 19 Apr 2023 at 17:26, shaurya jain <12345shau...@gmail.com> wrote:
>
> Hi Team,
>
> Could you please help me with this, It's urgent for the production
> environment.
>
> On Wed, Apr 19, 2023 at 3:44 PM shaurya jain <12345shau...@gmail.com> wrote:
>>
>> Hi Team,
>>
>> Could you please help, It's urgent for the production env?
>>
>> On Sun, Apr 16, 2023 at 2:40 AM shaurya jain <12345shau...@gmail.com> wrote:
>>>
>>> Hi Team,
>>>
>>> Postgres Version:- 13.8
>>> Issue:- Logical replication failing with SSL SYSCALL error
>>> Priority:-High
>>>
>>> We are migrating our database through logical replications, and all of
>>> sudden below error pops up in the source and target logs which leads us to
>>> nowhere.
>>>
>>> Logs from Source:-
>>> LOG: could not send data to client: Connection reset by peer
>>> STATEMENT: COPY public.test TO STDOUT
>>> FATAL: connection to client lost
>>> STATEMENT: COPY public.test TO STDOUT
>>>
>>> Logs from Target:-
>>> 2023-04-15 19:07:02 UTC::@:[1250]:ERROR: could not receive data from WAL
>>> stream: SSL SYSCALL error: Connection timed out
>>> 2023-04-15 19:07:02 UTC::@:[1250]:CONTEXT: COPY test, line 365326932
>>> 2023-04-15 19:07:03 UTC::@:[505]:LOG: background worker "logical
>>> replication worker" (PID 1250) exited with exit code 1
>>> 2023-04-15 19:07:03 UTC::@:[7155]:LOG: logical replication table
>>> synchronization worker for subscription " sub_tables_2_180", table "test"
>>> has started
>>> 2023-04-15 19:12:05
>>> UTC:10.144.19.34(33276):postgres@webadmit_staging:[7112]:WARNING: there is
>>> no transaction in progress
>>> 2023-04-15 19:14:08
>>> UTC:10.144.19.34(33324):postgres@webadmit_staging:[6052]:LOG: could not
>>> receive data from client: Connection reset by peer
>>> 2023-04-15 19:17:23 UTC::@:[2112]:ERROR: could not receive data from WAL
>>> stream: SSL SYSCALL error: Connection timed out
>>> 2023-04-15 19:17:23 UTC::@:[1089]:ERROR: could not receive data from WAL
>>> stream: SSL SYSCALL error: Connection timed out
>>> 2023-04-15 19:17:23 UTC::@:[2556]:ERROR: could not receive data from WAL
>>> stream: SSL SYSCALL error: Connection timed out
>>> 2023-04-15 19:17:23 UTC::@:[505]:LOG: background worker "logical
>>> replication worker" (PID 2556) exited with exit code 1
>>> 2023-04-15 19:17:23 UTC::@:[505]:LOG: background worker "logical
>>> replication worker" (PID 2112) exited with exit code 1
>>> 2023-04-15 19:17:23 UTC::@:[505]:LOG: background worker "logical
>>> replication worker" (PID 1089) exited with exit code 1
>>> 2023-04-15 19:17:23 UTC::@:[7287]:LOG: logical replication apply worker for
>>> subscription "sub_tables_2_180" has started
>>> 2023-04-15 19:17:23 UTC::@:[7288]:LOG: logical replication apply worker for
>>> subscription "sub_tables_3_192" has started
>>> 2023-04-15 19:17:23 UTC::@:[7289]:LOG: logical replication apply worker for
>>> subscription "sub_tables_1_180" has started
>>>
>>> Just after this error, all other replication slots get disabled for some
>>> time and come back online along with COPY command with the new PID in
>>> pg_stat_activity.
>>>
>>> I have a few queries regarding this:-
>>>
>>> The exact reason for disconnection (Few articles claim memory and few
>>> network)

This might be caused by a network failure. Did you notice any network
instability? Could you check the TCP settings, in particular
tcp_keepalives_idle, tcp_keepalives_interval and tcp_keepalives_count?
Once a connection has been idle for tcp_keepalives_idle seconds, a
keepalive probe is sent to the peer. If the peer does not acknowledge it
within tcp_keepalives_interval seconds, the probe is sent again, and the
connection is considered dead after tcp_keepalives_count unacknowledged
probes.
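
As a rough sketch, the keepalive settings mentioned above could be inspected
and adjusted like this. The values are only illustrative assumptions, not
recommendations, and ALTER SYSTEM may not be available on a managed service,
where the same parameters are normally changed through the provider's own
configuration mechanism.

```sql
-- Inspect the current keepalive settings.
-- A configured value of 0 selects the operating system default.
SHOW tcp_keepalives_idle;
SHOW tcp_keepalives_interval;
SHOW tcp_keepalives_count;

-- Illustrative values only (assumption): send the first probe after 60 s
-- of inactivity, repeat every 10 s, give up after 6 unanswered probes.
-- A dead peer is then detected after roughly 60 + 10 * 6 = 120 s.
ALTER SYSTEM SET tcp_keepalives_idle = 60;
ALTER SYSTEM SET tcp_keepalives_interval = 10;
ALTER SYSTEM SET tcp_keepalives_count = 6;

-- These parameters do not require a server restart; reload the configuration.
SELECT pg_reload_conf();
```

On the subscriber side, the corresponding libpq connection options
keepalives_idle, keepalives_interval and keepalives_count can be given in the
subscription's connection string.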

>>> Will it lead to data inconsistency?

No, it will not lead to inconsistency. If the copy fails, the failed
transaction is rolled back.

>>> Does this new PID COPY command again migrate the whole data of the test
>>> table once again?

Yes. After a failure, the table synchronization worker starts the COPY
again and copies the whole table from the beginning.

Regards,
Vignesh