On Wed, 19 Apr 2023 at 17:26, shaurya jain <12345shau...@gmail.com> wrote:
>
> Hi Team,
>
> Could you please help me with this, It's urgent for the production 
> environment.
>
> On Wed, Apr 19, 2023 at 3:44 PM shaurya jain <12345shau...@gmail.com> wrote:
>>
>> Hi Team,
>>
>> Could you please help, It's urgent for the production env?
>>
>> On Sun, Apr 16, 2023 at 2:40 AM shaurya jain <12345shau...@gmail.com> wrote:
>>>
>>> Hi Team,
>>>
>>> Postgres Version:- 13.8
>>> Issue:- Logical replication failing with SSL SYSCALL error
>>> Priority:-High
>>>
>>> We are migrating our database through logical replications, and all of 
>>> sudden below error pops up in the source and target logs which leads us to 
>>> nowhere.
>>>
>>> Logs from Source:-
>>> LOG:  could not send data to client: Connection reset by peer
>>> STATEMENT:  COPY public.test TO STDOUT
>>> FATAL:  connection to client lost
>>> STATEMENT:  COPY public.test TO STDOUT
>>>
>>> Logs from Target:-
>>> 2023-04-15 19:07:02 UTC::@:[1250]:ERROR: could not receive data from WAL 
>>> stream: SSL SYSCALL error: Connection timed out
>>> 2023-04-15 19:07:02 UTC::@:[1250]:CONTEXT: COPY test, line 365326932
>>> 2023-04-15 19:07:03 UTC::@:[505]:LOG: background worker "logical 
>>> replication worker" (PID 1250) exited with exit code 1
>>> 2023-04-15 19:07:03 UTC::@:[7155]:LOG: logical replication table 
>>> synchronization worker for subscription " sub_tables_2_180", table "test" 
>>> has started
>>> 2023-04-15 19:12:05 
>>> UTC:10.144.19.34(33276):postgres@webadmit_staging:[7112]:WARNING: there is 
>>> no transaction in progress
>>> 2023-04-15 19:14:08 
>>> UTC:10.144.19.34(33324):postgres@webadmit_staging:[6052]:LOG: could not 
>>> receive data from client: Connection reset by peer
>>> 2023-04-15 19:17:23 UTC::@:[2112]:ERROR: could not receive data from WAL 
>>> stream: SSL SYSCALL error: Connection timed out
>>> 2023-04-15 19:17:23 UTC::@:[1089]:ERROR: could not receive data from WAL 
>>> stream: SSL SYSCALL error: Connection timed out
>>> 2023-04-15 19:17:23 UTC::@:[2556]:ERROR: could not receive data from WAL 
>>> stream: SSL SYSCALL error: Connection timed out
>>> 2023-04-15 19:17:23 UTC::@:[505]:LOG: background worker "logical 
>>> replication worker" (PID 2556) exited with exit code 1
>>> 2023-04-15 19:17:23 UTC::@:[505]:LOG: background worker "logical 
>>> replication worker" (PID 2112) exited with exit code 1
>>> 2023-04-15 19:17:23 UTC::@:[505]:LOG: background worker "logical 
>>> replication worker" (PID 1089) exited with exit code 1
>>> 2023-04-15 19:17:23 UTC::@:[7287]:LOG: logical replication apply worker for 
>>> subscription "sub_tables_2_180" has started
>>> 2023-04-15 19:17:23 UTC::@:[7288]:LOG: logical replication apply worker for 
>>> subscription "sub_tables_3_192" has started
>>> 2023-04-15 19:17:23 UTC::@:[7289]:LOG: logical replication apply worker for 
>>> subscription "sub_tables_1_180" has started
>>>
>>> Just after this error, all other replication slots get disabled for some 
>>> time and come back online along with COPY command with the new PID in 
>>> pg_stat_activity.
>>>
>>> I have a few queries regarding this:-
>>>
>>> The exact reason for disconnection (Few articles claim memory and few 
>>> network)
This might be caused by a network failure. Did you notice any network
instability? Could you check the TCP settings, in particular
tcp_keepalives_idle, tcp_keepalives_interval and tcp_keepalives_count?
With these settings, a keepalive probe is sent once the connection has
been idle for tcp_keepalives_idle seconds. If the peer does not
acknowledge it within tcp_keepalives_interval seconds, the probe is
sent again, and the connection is considered dead after
tcp_keepalives_count unacknowledged probes.
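
As a rough sketch of how these could be checked and adjusted (the
numeric values are illustrative assumptions, not recommendations, and
the host, database and user in the connection string are hypothetical
placeholders): the tcp_keepalives_* settings apply to connections
accepted by the publisher, while the subscriber's outgoing connection
can use the libpq keepalives_* options in the subscription's
connection string. On a managed service where ALTER SYSTEM is not
available, the same parameters would have to be set through the
provider's configuration mechanism.

```sql
-- On the publisher: show the current server-side keepalive settings.
-- A value of 0 means the operating system default is used.
SHOW tcp_keepalives_idle;
SHOW tcp_keepalives_interval;
SHOW tcp_keepalives_count;

-- Illustrative values only: probe after 60s idle, retry every 10s,
-- give up after 6 unacknowledged probes (about 120s in total).
ALTER SYSTEM SET tcp_keepalives_idle = 60;
ALTER SYSTEM SET tcp_keepalives_interval = 10;
ALTER SYSTEM SET tcp_keepalives_count = 6;
SELECT pg_reload_conf();

-- On the subscriber: the same idea through libpq connection options.
-- CONNECTION replaces the whole connection string, so the existing
-- host, dbname, user, etc. must be repeated (placeholders below).
ALTER SUBSCRIPTION sub_tables_2_180
    CONNECTION 'host=publisher.example dbname=mydb user=repuser keepalives_idle=60 keepalives_interval=10 keepalives_count=6';
```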

>>> Will it lead to data inconsistency?
No, it will not lead to inconsistency. If a failure occurs, the
failed transaction is rolled back.

>>> Does this new PID COPY command again migrate the whole data of the test 
>>> table once again?
Yes, after a failure the table synchronization worker restarts the
COPY and copies the whole table again.

Regards,
Vignesh

