Re: Logical replication timeout problem

2021-11-11 Thread Fabrice Chapuis
Hello,
Our lab is ready now. Amit, I compiled Postgres 10.18 with your patch. Tang,
I used your script to configure logical replication between two databases and
to generate 10 million entries in an unreplicated foo table. On a
standalone instance no error message appears in the log.
When I activate physical replication between two nodes, I get the following
error:

2021-11-10 10:49:12.297 CET [12126] LOG:  attempt to send keep alive message
2021-11-10 10:49:12.297 CET [12126] STATEMENT:  START_REPLICATION 0/300
TIMELINE 1
2021-11-10 10:49:15.127 CET [12064] FATAL:  terminating logical replication
worker due to administrator command
2021-11-10 10:49:15.127 CET [12036] LOG:  worker process: logical
replication worker for subscription 16413 (PID 12064) exited with exit code
1
2021-11-10 10:49:15.155 CET [12126] LOG:  attempt to send keep alive message

This message looks strange because no admin command was executed
during the data load.
I did not find any error related to the timeout.
The message coming from the modification made by the patch shows up all
the time: "attempt to send keep alive message". But there is no "sent keep
alive message".

Why does the logical replication worker exit when physical replication is configured?

Thanks for your help

Fabrice



On Fri, Oct 8, 2021 at 9:33 AM Fabrice Chapuis 
wrote:

> Thanks Tang for your script.
> Our debugging environment will be ready soon. I will test your script and
> we will try to reproduce the problem by integrating the patch provided by
> Amit. As soon as I have results I will let you know.
>
> Regards
>
> Fabrice
>
> On Thu, Sep 30, 2021 at 3:15 AM Tang, Haiying/唐 海英 <
> tanghy.f...@fujitsu.com> wrote:
>
>> On Friday, September 24, 2021 12:04 AM, Fabrice Chapuis <
>> fabrice636...@gmail.com> wrote:
>>
>> >
>>
>> > Thanks for your patch, we are going to set up a lab in order to debug
>> the function.
>>
>>
>>
>> Hi
>>
>>
>>
>> I tried to reproduce this timeout problem on version10.18 but failed.
>>
>> In my trial, I inserted large amounts of data at publisher, which took
>> more than 1 minute to replicate.
>>
>> And with the patch provided by Amit, I saw that the frequency of invoking
>>
>> WalSndKeepaliveIfNecessary function is raised after I inserted data.
>>
>>
>>
>> The test script is attached. Maybe you can try it on your machine and
>> check if this problem could happen.
>>
>> If I miss something in the script, please let me know.
>>
>> Of course, it will be better if you can provide your script to reproduce
>> the problem.
>>
>>
>>
>> Regards
>>
>> Tang
>>
>>


Re: Logical replication timeout problem

2021-11-11 Thread Amit Kapila
On Thu, Nov 11, 2021 at 11:15 PM Fabrice Chapuis
 wrote:
>
> Hello,
> Our lab is ready now. Amit,  I compile Postgres 10.18 with your patch.Tang, I 
> used your script to configure logical replication between 2 databases and to 
> generate 10 million entries in an unreplicated foo table. On a standalone 
> instance no error message appears in log.
> I activate the physical replication between 2 nodes, and I got following 
> error:
>
> 2021-11-10 10:49:12.297 CET [12126] LOG:  attempt to send keep alive message
> 2021-11-10 10:49:12.297 CET [12126] STATEMENT:  START_REPLICATION 0/300 
> TIMELINE 1
> 2021-11-10 10:49:15.127 CET [12064] FATAL:  terminating logical replication 
> worker due to administrator command
> 2021-11-10 10:49:15.127 CET [12036] LOG:  worker process: logical replication 
> worker for subscription 16413 (PID 12064) exited with exit code 1
> 2021-11-10 10:49:15.155 CET [12126] LOG:  attempt to send keep alive message
>
> This message look like strange because no admin command have been executed 
> during data load.
> I did not find any error related to the timeout.
> The message coming from the modification made with the patch comes back all 
> the time: attempt to send keep alive message. But there is no "sent keep 
> alive message".
>
> Why logical replication worker exit when physical replication is configured?
>

I am also not sure why that happened; it may be due to
max_worker_processes reaching its limit. This can happen because it
seems you configured both the publisher and the subscriber in the same
cluster. Tang, did you also see the same problem?

BTW, why are you bringing a physical standby configuration into the
test? Were the physical standbys present in your original setup where you
observed the problem?

-- 
With Regards,
Amit Kapila.




RE: Logical replication timeout problem

2021-11-12 Thread tanghy.f...@fujitsu.com
On Friday, November 12, 2021 2:24 PM Amit Kapila  
wrote:
> 
> On Thu, Nov 11, 2021 at 11:15 PM Fabrice Chapuis
>  wrote:
> >
> > Hello,
> > Our lab is ready now. Amit,  I compile Postgres 10.18 with your patch.Tang, 
> > I
> used your script to configure logical replication between 2 databases and to
> generate 10 million entries in an unreplicated foo table. On a standalone 
> instance
> no error message appears in log.
> > I activate the physical replication between 2 nodes, and I got following 
> > error:
> >
> > 2021-11-10 10:49:12.297 CET [12126] LOG:  attempt to send keep alive
> message
> > 2021-11-10 10:49:12.297 CET [12126] STATEMENT:  START_REPLICATION
> 0/300 TIMELINE 1
> > 2021-11-10 10:49:15.127 CET [12064] FATAL:  terminating logical replication
> worker due to administrator command
> > 2021-11-10 10:49:15.127 CET [12036] LOG:  worker process: logical 
> > replication
> worker for subscription 16413 (PID 12064) exited with exit code 1
> > 2021-11-10 10:49:15.155 CET [12126] LOG:  attempt to send keep alive
> message
> >
> > This message look like strange because no admin command have been executed
> during data load.
> > I did not find any error related to the timeout.
> > The message coming from the modification made with the patch comes back all
> the time: attempt to send keep alive message. But there is no "sent keep alive
> message".
> >
> > Why logical replication worker exit when physical replication is configured?
> >
> 
> I am also not sure why that happened may be due to
> max_worker_processes reaching its limit. This can happen because it
> seems you configured both publisher and subscriber in the same
> cluster. Tang, did you also see the same problem?
> 

No.
I used the default max_worker_processes value and ran logical replication and
physical replication at the same time. I also changed the data in a table on
the publisher, but I didn't see the same problem.

Regards
Tang


Re: Logical replication timeout problem

2021-11-12 Thread Fabrice Chapuis
I made a mistake in the configuration of my test script; in fact I cannot
reproduce the problem at the moment.
Yes, in the original environment there is physical replication; that's why
for the lab I configured two nodes with physical replication.
I'll try new tests next week.
Regards

On Fri, Nov 12, 2021 at 7:23 AM Amit Kapila  wrote:

> On Thu, Nov 11, 2021 at 11:15 PM Fabrice Chapuis
>  wrote:
> >
> > Hello,
> > Our lab is ready now. Amit,  I compile Postgres 10.18 with your
> patch.Tang, I used your script to configure logical replication between 2
> databases and to generate 10 million entries in an unreplicated foo table.
> On a standalone instance no error message appears in log.
> > I activate the physical replication between 2 nodes, and I got following
> error:
> >
> > 2021-11-10 10:49:12.297 CET [12126] LOG:  attempt to send keep alive
> message
> > 2021-11-10 10:49:12.297 CET [12126] STATEMENT:  START_REPLICATION
> 0/300 TIMELINE 1
> > 2021-11-10 10:49:15.127 CET [12064] FATAL:  terminating logical
> replication worker due to administrator command
> > 2021-11-10 10:49:15.127 CET [12036] LOG:  worker process: logical
> replication worker for subscription 16413 (PID 12064) exited with exit code
> 1
> > 2021-11-10 10:49:15.155 CET [12126] LOG:  attempt to send keep alive
> message
> >
> > This message look like strange because no admin command have been
> executed during data load.
> > I did not find any error related to the timeout.
> > The message coming from the modification made with the patch comes back
> all the time: attempt to send keep alive message. But there is no "sent
> keep alive message".
> >
> > Why logical replication worker exit when physical replication is
> configured?
> >
>
> I am also not sure why that happened may be due to
> max_worker_processes reaching its limit. This can happen because it
> seems you configured both publisher and subscriber in the same
> cluster. Tang, did you also see the same problem?
>
> BTW, why are you bringing physical standby configuration into the
> test? Does in your original setup where you observe the problem the
> physical standbys were there?
>
> --
> With Regards,
> Amit Kapila.
>


Re: Logical replication timeout problem

2021-12-22 Thread Fabrice Chapuis
Hello Amit,

I was able to reproduce the timeout problem in the lab.
After loading more than 20 million rows into a table which is not
replicated (the insert command ends without error), errors related to the
logical replication processes appear in the postgres log.
Approximately every 5 minutes the worker process is restarted. The snap files
in the slot directory are still present. The replication system seems to be
blocked. Why are these snap files not removed? What do they contain?
I will recompile postgres with your patch to debug.

2021-12-22 14:54:21.506 CET [64939] STATEMENT:  START_REPLICATION SLOT
"sub008_s00" LOGICAL 17/27240748 (proto_version '1', publication_names
'"pub008_s00"')
2021-12-22 15:01:20.908 CET [64938] ERROR:  terminating logical replication
worker due to timeout
2021-12-22 15:01:20.911 CET [61827] LOG:  worker process: logical
replication worker for subscription 26994 (PID 64938) exited with exit code
1
2021-12-22 15:01:20.923 CET [65037] LOG:  logical replication apply worker
for subscription "sub008_s00" has started
2021-12-22 15:01:20.932 CET [65038] ERROR:  replication slot
"sub008_s00" is active for PID 64939
2021-12-22 15:01:20.932 CET [65038] STATEMENT:  START_REPLICATION SLOT
"sub008_s00" LOGICAL 17/27240748 (proto_version '1', publication_names
'"pub008_s00"')
2021-12-22 15:01:20.932 CET [65037] ERROR:  could not start WAL streaming:
ERROR:  replication slot "sub008_s00" is active for PID 64939
2021-12-22 15:01:20.933 CET [61827] LOG:  worker process: logical
replication worker for subscription 26994 (PID 65037) exited with exit code
1
2021-12-22 15:01:25.944 CET [65039] LOG:  logical replication apply worker
for subscription "sub008_s00" has started
2021-12-22 15:01:25.951 CET [65040] ERROR:  replication slot
"sub008_s00" is active for PID 64939
2021-12-22 15:01:25.951 CET [65040] STATEMENT:  START_REPLICATION SLOT
"sub008_s00" LOGICAL 17/27240748 (proto_version '1', publication_names
'"pub008_s00"')
2021-12-22 15:01:25.951 CET [65039] ERROR:  could not start WAL streaming:
ERROR:  replication slot "sub008_s00" is active for PID 64939
2021-12-22 15:01:25.952 CET [61827] LOG:  worker process: logical
replication worker for subscription 26994 (PID 65039) exited with exit code
1
2021-12-22 15:01:30.962 CET [65041] LOG:  logical replication apply worker
for subscription "sub008_s00" has started
2021-12-22 15:01:30.970 CET [65042] ERROR:  replication slot
"sub008_s00" is active for PID 64939
2021-12-22 15:01:30.970 CET [65042] STATEMENT:  START_REPLICATION SLOT
"sub008_s00" LOGICAL 17/27240748 (proto_version '1', publication_names
'"pub008_s00"')
2021-12-22 15:01:30.970 CET [65041] ERROR:  could not start WAL streaming:
ERROR:  replication slot "sub008_s00" is active for PID 64939
2021-12-22 15:01:30.971 CET [61827] LOG:  worker process: logical
replication worker for subscription 26994 (PID 65041) exited with exit code
1
2021-12-22 15:01:35.982 CET [65043] LOG:  logical replication apply worker
for subscription "sub008_s00" has started
2021-12-22 15:01:35.990 CET [65044] ERROR:  replication slot
"sub008_s00" is active for PID 64939
2021-12-22 15:01:35.990 CET [65044] STATEMENT:  START_REPLICATION SLOT
"sub008_s00" LOGICAL 17/27240748 (proto_version '1', publication_names
'"pub008_s00"')

-rw---. 1 postgres postgres 16270723 Dec 22 16:02
xid-14312-lsn-23-9900.snap
-rw---. 1 postgres postgres 16145717 Dec 22 16:02
xid-14312-lsn-23-9A00.snap
-rw---. 1 postgres postgres 10889437 Dec 22 16:02
xid-14312-lsn-23-9B00.snap
[postgres@s729058a debug]$ ls -ltr pg_replslot/sub008_s012a00/ | wc -l
1420

On Fri, Nov 12, 2021 at 5:57 PM Fabrice Chapuis 
wrote:

> I made a mistake in the configuration of my test script, in fact I cannot
> reproduce the problem at the moment.
> Yes, on the original environment there is physical replication, that's why
> for the lab I configured 2 nodes with physical replication.
> I'll try new tests next week
> Regards
>
> On Fri, Nov 12, 2021 at 7:23 AM Amit Kapila 
> wrote:
>
>> On Thu, Nov 11, 2021 at 11:15 PM Fabrice Chapuis
>>  wrote:
>> >
>> > Hello,
>> > Our lab is ready now. Amit,  I compile Postgres 10.18 with your
>> patch.Tang, I used your script to configure logical replication between 2
>> databases and to generate 10 million entries in an unreplicated foo table.
>> On a standalone instance no error message appears in log.
>> > I activate the physical replication between 2 nodes, and I got
>> following error:
>> >
>> > 2021-11-10 10:49:12.297 CET [12126] LOG:  attempt to send keep alive
>> message
>> > 2021-11-10 10:49:12.297 CET [12126] STATEMENT:  START_REPLICATION
>> 0/300 TIMELINE 1
>> > 2021-11-10 10:49:15.127 CET [12064] FATAL:  terminating logical
>> replication worker due to administrator command
>> > 2021-11-10 10:49:15.127 CET [12036] LOG:  worker process: logical
>> replication worker for subscription 16413 (PID 12064) exited with exit

Re: Logical replication timeout problem

2021-12-23 Thread Amit Kapila
On Wed, Dec 22, 2021 at 8:50 PM Fabrice Chapuis  wrote:
>
> Hello Amit,
>
> I was able to reproduce the timeout problem in the lab.
> After loading more than 20 millions of rows in a table which is not 
> replicated (insert command ends without error), errors related to logical 
> replication processes appear in the postgres log.
> Approximately every 5 minutes worker process is restarted. The snap files in 
> the slot directory are still present. The replication system seems to be 
> blocked. Why these snap files are not removed. What do they contain?
>

These contain the changes of the insert. I think they are not removed in
your case because your long transaction never finished. As mentioned
earlier, for such cases it is better to use the 'streaming' feature
released as part of PG-14, but anyway here we are trying to debug the
timeout problem.

> I will recompile postgres with your patch to debug.
>

Okay, that might help.

-- 
With Regards,
Amit Kapila.




Re: Logical replication timeout problem

2021-12-29 Thread Fabrice Chapuis
I put the instance in high-level debug mode.
My attempt at interpreting the log: after the modifications generated by the
insert have been written to the snap files, these files are read back
(restored). One minute after this work starts, the worker process exits with
error code 1.
I see that keepalive messages were sent before the worker process left.
2021-12-28 10:50:01.894 CET [55792] LOCATION:  WalSndKeepalive,
walsender.c:3365
...
2021-12-28 10:50:31.854 CET [55792] STATEMENT:  START_REPLICATION SLOT
"sub008_s012a00" LOGICAL 17/27240748 (proto_version '1', publication_names
'"pub008_s012a00"')
2021-12-28 10:50:31.907 CET [55792] DEBUG:  0: StartTransaction(1)
name: unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0
2021-12-28 10:50:31.907 CET [55792] LOCATION:  ShowTransactionStateRec,
xact.c:5075
2021-12-28 10:50:31.907 CET [55792] STATEMENT:  START_REPLICATION SLOT
"sub008_s012a00" LOGICAL 17/27240748 (proto_version '1', publication_names
'"pub008_s012a00"')
2021-12-28 10:50:31.907 CET [55792] DEBUG:  0: spill 2271 changes in
XID 14312 to disk
2021-12-28 10:50:31.907 CET [55792] LOCATION:  ReorderBufferSerializeTXN,
reorderbuffer.c:2245
2021-12-28 10:50:31.907 CET [55792] STATEMENT:  START_REPLICATION SLOT
"sub008_s012a00" LOGICAL 17/27240748 (proto_version '1', publication_names
'"pub008_s012a00"')
2021-12-28 10:50:32.110 CET [55792] DEBUG:  0: restored 4096/22603999
changes from disk
2021-12-28 10:50:32.110 CET [55792] LOCATION:  ReorderBufferIterTXNNext,
reorderbuffer.c:1156
2021-12-28 10:50:32.110 CET [55792] STATEMENT:  START_REPLICATION SLOT
"sub008_s012a00" LOGICAL 17/27240748 (proto_version '1', publication_names
'"pub008_s012a00"')
2021-12-28 10:50:32.138 CET [55792] DEBUG:  0: restored 4096/22603999
changes from disk
...

2021-12-28 10:50:35.341 CET [55794] DEBUG:  0: sending replication keepalive
2021-12-28 10:50:35.341 CET [55794] LOCATION:  WalSndKeepalive,
walsender.c:3365
...
2021-12-28 10:51:31.995 CET [55791] ERROR:  XX000: terminating logical
replication worker due to timeout
2021-12-28 10:51:31.995 CET [55791] LOCATION:  LogicalRepApplyLoop,
worker.c:1267

Could this function in the apply main loop in worker.c help to find a
solution?

rc = WaitLatchOrSocket(MyLatch,
                       WL_SOCKET_READABLE | WL_LATCH_SET |
                       WL_TIMEOUT | WL_POSTMASTER_DEATH,
                       fd, wait_time,
                       WAIT_EVENT_LOGICAL_APPLY_MAIN);
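
For context, here is a rough sketch of the timeout decision that follows that
wait in LogicalRepApplyLoop() (paraphrased, not the exact 10.18 source;
last_recv_timestamp is the worker's record of when it last received data from
the walsender): if the wait returns with WL_TIMEOUT and more than
wal_receiver_timeout has elapsed since the last message arrived, the worker
raises exactly the error seen above at worker.c:1267.

if (rc & WL_TIMEOUT)
{
    /* Nothing arrived within wait_time. */
    if (wal_receiver_timeout > 0)
    {
        TimestampTz now = GetCurrentTimestamp();
        TimestampTz timeout;

        /* last_recv_timestamp is updated whenever data is received */
        timeout = TimestampTzPlusMilliseconds(last_recv_timestamp,
                                              wal_receiver_timeout);

        if (now >= timeout)
            ereport(ERROR,
                    (errmsg("terminating logical replication worker due to timeout")));

        /* otherwise, around the halfway point, a ping is requested */
    }
}
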
Thanks for your help

Fabrice

On Thu, Dec 23, 2021 at 11:52 AM Amit Kapila 
wrote:

> On Wed, Dec 22, 2021 at 8:50 PM Fabrice Chapuis 
> wrote:
> >
> > Hello Amit,
> >
> > I was able to reproduce the timeout problem in the lab.
> > After loading more than 20 millions of rows in a table which is not
> replicated (insert command ends without error), errors related to logical
> replication processes appear in the postgres log.
> > Approximately every 5 minutes worker process is restarted. The snap
> files in the slot directory are still present. The replication system seems
> to be blocked. Why these snap files are not removed. What do they contain?
> >
>
> These contain changes of insert. I think these are not removed for
> your case as your long transaction is never finished. As mentioned
> earlier, for such cases, it is better to use 'streaming' feature
> released as part of PG-14 but anyway here we are trying to debug
> timeout problem.
>
> > I will recompile postgres with your patch to debug.
> >
>
> Okay, that might help.
>
> --
> With Regards,
> Amit Kapila.
>


Re: Logical replication timeout problem

2021-09-17 Thread Amit Kapila
On Fri, Sep 17, 2021 at 3:29 PM Fabrice Chapuis  wrote:
>
> Hi,
>
> Logical replication is configured on one instance in version 10.18. Timeout 
> errors occur regularly and the worker process exit with an exit code 1
>
> 2021-09-16 12:06:50 CEST [24881]: [1-1] user=postgres,db=foo,client=[local] 
> LOG:  duration: 1281408.171 ms  statement: COPY schem.tab (col1, col2) FROM 
> stdin;
> 2021-09-16 12:07:11 CEST [12161]: [1-1] user=,db=,client= LOG:  automatic 
> analyze of table "foo.schem.tab" system usage: CPU: user: 4.13 s, system: 
> 0.55 s, elapsed: 9.58 s
> 2021-09-16 12:07:50 CEST [3770]: [2-1] user=,db=,client= ERROR:  terminating 
> logical replication worker due to timeout
> 2021-09-16 12:07:50 CEST [12546]: [11-1] user=,db=,client= LOG:  worker 
> process: logical replication worker for subscription 24106654 (PID 3770) 
> exited with exit code 1
> 2021-09-16 12:07:50 CEST [13872]: [1-1] user=,db=,client= LOG:  logical 
> replication apply worker for subscription "sub" has started
> 2021-09-16 12:07:50 CEST [13873]: [1-1] user=repuser,db=foo,client=127.0.0.1 
> LOG:  received replication command: IDENTIFY_SYSTEM
>

Can you share the publisher-side log as well?


-- 
With Regards,
Amit Kapila.




Re: Logical replication timeout problem

2021-09-17 Thread Fabrice Chapuis
The publisher and the subscriber run on the same postgres instance.

Regards,
Fabrice

On Fri, Sep 17, 2021 at 12:26 PM Amit Kapila 
wrote:

> On Fri, Sep 17, 2021 at 3:29 PM Fabrice Chapuis 
> wrote:
> >
> > Hi,
> >
> > Logical replication is configured on one instance in version 10.18.
> Timeout errors occur regularly and the worker process exit with an exit
> code 1
> >
> > 2021-09-16 12:06:50 CEST [24881]: [1-1]
> user=postgres,db=foo,client=[local] LOG:  duration: 1281408.171 ms
> statement: COPY schem.tab (col1, col2) FROM stdin;
> > 2021-09-16 12:07:11 CEST [12161]: [1-1] user=,db=,client= LOG:
> automatic analyze of table "foo.schem.tab" system usage: CPU: user: 4.13 s,
> system: 0.55 s, elapsed: 9.58 s
> > 2021-09-16 12:07:50 CEST [3770]: [2-1] user=,db=,client= ERROR:
> terminating logical replication worker due to timeout
> > 2021-09-16 12:07:50 CEST [12546]: [11-1] user=,db=,client= LOG:  worker
> process: logical replication worker for subscription 24106654 (PID 3770)
> exited with exit code 1
> > 2021-09-16 12:07:50 CEST [13872]: [1-1] user=,db=,client= LOG:  logical
> replication apply worker for subscription "sub" has started
> > 2021-09-16 12:07:50 CEST [13873]: [1-1]
> user=repuser,db=foo,client=127.0.0.1 LOG:  received replication command:
> IDENTIFY_SYSTEM
> >
>
> Can you share the publisher-side log as well?
>
>
> --
> With Regards,
> Amit Kapila.
>


Re: Logical replication timeout problem

2021-09-18 Thread Amit Kapila
On Fri, Sep 17, 2021 at 8:08 PM Fabrice Chapuis  wrote:
>
> the publisher and the subscriber run on the same postgres instance.
>

Okay, but there is no log corresponding to operations being performed
by the publisher. By looking at current logs it is not very clear to
me what might have caused this. Did you try increasing
wal_sender_timeout and wal_receiver_timeout?

-- 
With Regards,
Amit Kapila.




Re: Logical replication timeout problem

2021-09-20 Thread Fabrice Chapuis
Hi Amit,

We can replay the problem: we load several GB into a table in the schema of
the publisher, and this triggers the worker's timeout one minute after the
end of this load. The table into which this load is executed is not
replicated.

2021-09-16 12:06:50 CEST [24881]: [1-1]
user=postgres,db=db012a00,client=[local] LOG:  duration: 1281408.171 ms
statement: COPY db.table (col1, col2) FROM stdin;

2021-09-16 12:07:11 CEST [12161]: [1-1] user=,db=,client= LOG:  automatic
analyze of table "db.table " system usage: CPU: user: 4.13 s, system: 0.55
s, elapsed: 9.58 s

2021-09-16 12:07:50 CEST [3770]: [2-1] user=,db=,client= ERROR:
terminating logical replication worker due to timeout

Before increasing the values of wal_sender_timeout and wal_receiver_timeout,
I wanted to investigate further the mechanisms leading to this timeout.

Thanks for your help

Fabrice



On Sun, Sep 19, 2021 at 6:25 AM Amit Kapila  wrote:

> On Fri, Sep 17, 2021 at 8:08 PM Fabrice Chapuis 
> wrote:
> >
> > the publisher and the subscriber run on the same postgres instance.
> >
>
> Okay, but there is no log corresponding to operations being performed
> by the publisher. By looking at current logs it is not very clear to
> me what might have caused this. Did you try increasing
> wal_sender_timeout and wal_receiver_timeout?
>
> --
> With Regards,
> Amit Kapila.
>


Re: Logical replication timeout problem

2021-09-20 Thread Amit Kapila
On Mon, Sep 20, 2021 at 4:10 PM Fabrice Chapuis  wrote:
>
> Hi Amit,
>
> We can replay the problem: we load a table of several Gb in the schema of the 
> publisher, this generates the worker's timeout after one minute from the end 
> of this load. The table on which this load is executed is not replicated.
>
> 2021-09-16 12:06:50 CEST [24881]: [1-1] 
> user=postgres,db=db012a00,client=[local] LOG:  duration: 1281408.171 ms  
> statement: COPY db.table (col1, col2) FROM stdin;
>
> 2021-09-16 12:07:11 CEST [12161]: [1-1] user=,db=,client= LOG:  automatic 
> analyze of table "db.table " system usage: CPU: user: 4.13 s, system: 0.55 s, 
> elapsed: 9.58 s
>
> 2021-09-16 12:07:50 CEST [3770]: [2-1] user=,db=,client= ERROR:  terminating 
> logical replication worker due to timeout
>
> Before increasing value for wal_sender_timeout and wal_receiver_timeout I 
> thought to further investigate the mechanisms leading to this timeout.
>

The basic problem here seems to be that the WAL sender is not able to send
a keepalive or any other message for the configured
wal_receiver_timeout. I am not sure how that can happen, but can you
try once with autovacuum = off? I want to ensure that the
WAL sender is not blocked by the background autovacuum process.

-- 
With Regards,
Amit Kapila.




Re: Logical replication timeout problem

2021-09-20 Thread Amit Kapila
On Mon, Sep 20, 2021 at 5:21 PM Amit Kapila  wrote:
>
> On Mon, Sep 20, 2021 at 4:10 PM Fabrice Chapuis  
> wrote:
> >
> > Hi Amit,
> >
> > We can replay the problem: we load a table of several Gb in the schema of 
> > the publisher, this generates the worker's timeout after one minute from 
> > the end of this load. The table on which this load is executed is not 
> > replicated.
> >
> > 2021-09-16 12:06:50 CEST [24881]: [1-1] 
> > user=postgres,db=db012a00,client=[local] LOG:  duration: 1281408.171 ms  
> > statement: COPY db.table (col1, col2) FROM stdin;
> >
> > 2021-09-16 12:07:11 CEST [12161]: [1-1] user=,db=,client= LOG:  automatic 
> > analyze of table "db.table " system usage: CPU: user: 4.13 s, system: 0.55 
> > s, elapsed: 9.58 s
> >
> > 2021-09-16 12:07:50 CEST [3770]: [2-1] user=,db=,client= ERROR:  
> > terminating logical replication worker due to timeout
> >
> > Before increasing value for wal_sender_timeout and wal_receiver_timeout I 
> > thought to further investigate the mechanisms leading to this timeout.
> >
>
> The basic problem here seems to be that WAL Sender is not able to send
> a keepalive or any other message for the configured
> wal_receiver_timeout. I am not sure how that can happen but can you
> once try by switching autovacuum = off? I wanted to ensure that
> WALSender is not blocked due to the background process autovacuum.
>

The other thing we can try is to check the data in pg_locks on the
publisher during the minute after the large copy has finished. We
can try this both with and without autovacuum.

-- 
With Regards,
Amit Kapila.




Re: Logical replication timeout problem

2021-09-20 Thread Fabrice Chapuis
With the autovacuum parameter set to off, the problem did not occur right
after loading the table as in our previous tests. However, the timeout
occurred later. We have seen an accumulation of several GB of .snap files.

...
-rw---. 1 postgres postgres 16791226 Sep 20 15:26
xid-1238444701-lsn-2D2B-F500.snap
-rw---. 1 postgres postgres 16973268 Sep 20 15:26
xid-1238444701-lsn-2D2B-F600.snap
-rw---. 1 postgres postgres 16790984 Sep 20 15:26
xid-1238444701-lsn-2D2B-F700.snap
-rw---. 1 postgres postgres 16988112 Sep 20 15:26
xid-1238444701-lsn-2D2B-F800.snap
-rw---. 1 postgres postgres 16864593 Sep 20 15:26
xid-1238444701-lsn-2D2B-F900.snap
-rw---. 1 postgres postgres 16902167 Sep 20 15:26
xid-1238444701-lsn-2D2B-FA00.snap
-rw---. 1 postgres postgres 16914638 Sep 20 15:26
xid-1238444701-lsn-2D2B-FB00.snap
-rw---. 1 postgres postgres 16782471 Sep 20 15:26
xid-1238444701-lsn-2D2B-FC00.snap
-rw---. 1 postgres postgres 16963667 Sep 20 15:27
xid-1238444701-lsn-2D2B-FD00.snap
...



2021-09-20 17:11:29 CEST [12687]: [1283-1] user=,db=,client= LOG:
checkpoint starting: time
2021-09-20 17:11:31 CEST [12687]: [1284-1] user=,db=,client= LOG:
checkpoint complete: wrote 13 buffers (0.0%); 0 WAL file(s) added, 0
removed, 0 recycled; write=1.713 s, sync=0.001 s, total=1.718 s
; sync files=12, longest=0.001 s, average=0.001 s; distance=29 kB,
estimate=352191 kB
2021-09-20 17:12:43 CEST [59986]: [2-1] user=,db=,client= ERROR:
terminating logical replication worker due to timeout
2021-09-20 17:12:43 CEST [12546]: [1068-1] user=,db=,client= LOG:  worker
process: logical replication worker for subscription 24215702 (PID 59986)
exited with exit code 1
2021-09-20 17:12:43 CEST [39945]: [1-1] user=,db=,client= LOG:  logical
replication apply worker for subscription "sub" has started
2021-09-20 17:12:43 CEST [39946]: [1-1] user=repuser,db=db,client=127.0.0.1
LOG:  received replication command: IDENTIFY_SYSTEM

Regards,

Fabrice



On Mon, Sep 20, 2021 at 1:51 PM Amit Kapila  wrote:

> On Mon, Sep 20, 2021 at 4:10 PM Fabrice Chapuis 
> wrote:
> >
> > Hi Amit,
> >
> > We can replay the problem: we load a table of several Gb in the schema
> of the publisher, this generates the worker's timeout after one minute from
> the end of this load. The table on which this load is executed is not
> replicated.
> >
> > 2021-09-16 12:06:50 CEST [24881]: [1-1]
> user=postgres,db=db012a00,client=[local] LOG:  duration: 1281408.171 ms
> statement: COPY db.table (col1, col2) FROM stdin;
> >
> > 2021-09-16 12:07:11 CEST [12161]: [1-1] user=,db=,client= LOG:
> automatic analyze of table "db.table " system usage: CPU: user: 4.13 s,
> system: 0.55 s, elapsed: 9.58 s
> >
> > 2021-09-16 12:07:50 CEST [3770]: [2-1] user=,db=,client= ERROR:
> terminating logical replication worker due to timeout
> >
> > Before increasing value for wal_sender_timeout and wal_receiver_timeout
> I thought to further investigate the mechanisms leading to this timeout.
> >
>
> The basic problem here seems to be that WAL Sender is not able to send
> a keepalive or any other message for the configured
> wal_receiver_timeout. I am not sure how that can happen but can you
> once try by switching autovacuum = off? I wanted to ensure that
> WALSender is not blocked due to the background process autovacuum.
>
> --
> With Regards,
> Amit Kapila.
>


Re: Logical replication timeout problem

2021-09-20 Thread Amit Kapila
On Mon, Sep 20, 2021 at 9:43 PM Fabrice Chapuis  wrote:
>
> By passing the autovacuum parameter to off the problem did not occur right 
> after loading the table as in our previous tests. However, the timeout 
> occurred later. We have seen the accumulation of .snap files for several Gb.
>
> ...
> -rw---. 1 postgres postgres 16791226 Sep 20 15:26 
> xid-1238444701-lsn-2D2B-F500.snap
> -rw---. 1 postgres postgres 16973268 Sep 20 15:26 
> xid-1238444701-lsn-2D2B-F600.snap
> -rw---. 1 postgres postgres 16790984 Sep 20 15:26 
> xid-1238444701-lsn-2D2B-F700.snap
> -rw---. 1 postgres postgres 16988112 Sep 20 15:26 
> xid-1238444701-lsn-2D2B-F800.snap
> -rw---. 1 postgres postgres 16864593 Sep 20 15:26 
> xid-1238444701-lsn-2D2B-F900.snap
> -rw---. 1 postgres postgres 16902167 Sep 20 15:26 
> xid-1238444701-lsn-2D2B-FA00.snap
> -rw---. 1 postgres postgres 16914638 Sep 20 15:26 
> xid-1238444701-lsn-2D2B-FB00.snap
> -rw---. 1 postgres postgres 16782471 Sep 20 15:26 
> xid-1238444701-lsn-2D2B-FC00.snap
> -rw---. 1 postgres postgres 16963667 Sep 20 15:27 
> xid-1238444701-lsn-2D2B-FD00.snap
> ...
>

Okay, I am still not sure why the publisher is not sending keepalive
messages while spilling such a big transaction. If you see, we
have logic in WalSndLoop() wherein, each time after sending data, we
check whether we need to send a keepalive message via the function
WalSndKeepaliveIfNecessary(). I think to debug this problem further
you need to add some logs in WalSndKeepaliveIfNecessary() to
see why it is not sending keepalive messages while all these files are
being created.
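
For reference, the rule that function applies is roughly the following
(a simplified sketch, not the exact walsender.c code; last_reply_timestamp
and waiting_for_ping_response are the walsender's bookkeeping for the last
standby reply and for an outstanding ping): a keepalive is sent only once
more than wal_sender_timeout / 2 has elapsed since the last reply from the
standby, and only if we are not already waiting for a ping response.

    TimestampTz ping_time;

    if (wal_sender_timeout <= 0 || last_reply_timestamp <= 0)
        return;                 /* timeouts disabled, nothing to do */

    if (waiting_for_ping_response)
        return;                 /* a keepalive is already in flight */

    /* send a keepalive once half of wal_sender_timeout has elapsed */
    ping_time = TimestampTzPlusMilliseconds(last_reply_timestamp,
                                            wal_sender_timeout / 2);
    if (GetCurrentTimestamp() >= ping_time)
        WalSndKeepalive(true);  /* request an immediate reply */

So if this code is reached regularly while the snap files are being written,
the keepalives should appear; the extra logging is meant to show which of the
early returns (if any) is being taken instead.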

Did you change the default value of
wal_sender_timeout/wal_receiver_timeout? What is the value of those
variables in your environment? Did you see the message "terminating
walsender process due to replication timeout" in your server logs?

-- 
With Regards,
Amit Kapila.




Re: Logical replication timeout problem

2021-09-21 Thread Fabrice Chapuis
If I understand correctly, the instruction for the wal sender to send a
keepalive is not reached in the for loop; for what reason?
...
/* Check for replication timeout. */
  WalSndCheckTimeOut();

/* Send keepalive if the time has come */
  WalSndKeepaliveIfNecessary();
...

The data load is performed on a table which is not replicated; I do not
understand why the whole transaction linked to the insert is copied to snap
files given that the table does not take part in the logical replication.
We are going to run a test with the
wal_sender_timeout/wal_receiver_timeout parameters changed from 1 minute to
5 minutes. The problem is that these parameters are global, and changing
them will also impact the physical replication.

Concerning the walsender timeout, when the worker is started again after a
timeout, it will trigger a new walsender associated with it.

postgres 55680 12546  0 Sep20 ?00:00:02 postgres: aq: bgworker:
logical replication worker for subscription 24651602
postgres 55681 12546  0 Sep20 ?00:00:00 postgres: aq: wal sender
process repuser 127.0.0.1(57930) idle

Kind Regards

Fabrice

On Tue, Sep 21, 2021 at 8:38 AM Amit Kapila  wrote:

> On Mon, Sep 20, 2021 at 9:43 PM Fabrice Chapuis 
> wrote:
> >
> > By passing the autovacuum parameter to off the problem did not occur
> right after loading the table as in our previous tests. However, the
> timeout occurred later. We have seen the accumulation of .snap files for
> several Gb.
> >
> > ...
> > -rw---. 1 postgres postgres 16791226 Sep 20 15:26
> xid-1238444701-lsn-2D2B-F500.snap
> > -rw---. 1 postgres postgres 16973268 Sep 20 15:26
> xid-1238444701-lsn-2D2B-F600.snap
> > -rw---. 1 postgres postgres 16790984 Sep 20 15:26
> xid-1238444701-lsn-2D2B-F700.snap
> > -rw---. 1 postgres postgres 16988112 Sep 20 15:26
> xid-1238444701-lsn-2D2B-F800.snap
> > -rw---. 1 postgres postgres 16864593 Sep 20 15:26
> xid-1238444701-lsn-2D2B-F900.snap
> > -rw---. 1 postgres postgres 16902167 Sep 20 15:26
> xid-1238444701-lsn-2D2B-FA00.snap
> > -rw---. 1 postgres postgres 16914638 Sep 20 15:26
> xid-1238444701-lsn-2D2B-FB00.snap
> > -rw---. 1 postgres postgres 16782471 Sep 20 15:26
> xid-1238444701-lsn-2D2B-FC00.snap
> > -rw---. 1 postgres postgres 16963667 Sep 20 15:27
> xid-1238444701-lsn-2D2B-FD00.snap
> > ...
> >
>
> Okay, still not sure why the publisher is not sending keep_alive
> messages in between spilling such a big transaction. If you see, we
> have logic in WalSndLoop() wherein each time after sending data we
> check whether we need to send a keep-alive message via function
> WalSndKeepaliveIfNecessary(). I think to debug this problem further
> you need to add some logs in function WalSndKeepaliveIfNecessary() to
> see why it is not sending keep_alive messages when all these files are
> being created.
>
> Did you change the default value of
> wal_sender_timeout/wal_receiver_timeout? What is the value of those
> variables in your environment? Did you see the message "terminating
> walsender process due to replication timeout" in your server logs?
>
> --
> With Regards,
> Amit Kapila.
>


Re: Logical replication timeout problem

2021-09-21 Thread Amit Kapila
On Tue, Sep 21, 2021 at 1:52 PM Fabrice Chapuis  wrote:
>
> If I understand, the instruction to send keep alive by the wal sender has not 
> been reached in the for loop, for what reason?
> ...
> * Check for replication timeout. */
>   WalSndCheckTimeOut();
>
> /* Send keepalive if the time has come */
>   WalSndKeepaliveIfNecessary();
> ...
>

Are you sure that these functions have not been called? Or is it the case
that they are called but for some reason the keepalive is not
sent? IIUC, these are called after processing each WAL record, so I am not
sure how it is possible in your case that they are not reached.

> The data load is performed on a table which is not replicated, I do not 
> understand why the whole transaction linked to an insert is copied to snap 
> files given that table does not take part of the logical replication.
>

It is because we don't know till the end of the transaction (where we
start sending the data) whether the table will be replicated or not. I
think specifically for this purpose the new 'streaming' feature
introduced in PG-14 will help us to avoid writing data of such tables
to snap/spill files. See 'streaming' option in Create Subscription
docs [1].

> We are going to do a test by modifying parameters 
> wal_sender_timeout/wal_receiver_timeout from 1' to 5'. The problem is that 
> these parameters are global and changing them will also impact the physical 
> replication.
>

Do you mean you are planning to change from 1 minute to 5 minutes? I
agree about the global nature of the parameters, and I think your approach
of finding out the root cause is good here because otherwise, under
some similar or heavier workload, it might lead to the same
situation.

> Concerning the walsender timeout, when the worker is started again after a 
> timeout, it will trigger a new walsender associated with it.
>

Right, I know that but I was curious to know if the walsender has
exited before walreceiver.

[1] - https://www.postgresql.org/docs/devel/sql-createsubscription.html

-- 
With Regards,
Amit Kapila.




Re: Logical replication timeout problem

2021-09-21 Thread Fabrice Chapuis
> IIUC, these are called after processing each WAL record so not
sure how is it possible in your case that these are not reached?

I don't know; as you say, to pin down the problem we would have to debug
the WalSndKeepaliveIfNecessary function.

> I was curious to know if the walsender has exited before walreceiver

During the last tests we made we didn't observe any timeout of the wal
sender process.

> Do you mean you are planning to change from 1 minute to 5 minutes?

We set wal_sender_timeout/wal_receiver_timeout to 5 minutes and launched a
new test. The result is surprising and rather positive: there is no timeout
any more in the log, and the 20 GB of snap files are removed in less than 5
minutes. How can that behaviour be explained? Why are the snap files
suddenly consumed so quickly?
I chose the values for the wal_sender_timeout/wal_receiver_timeout
parameters arbitrarily; are these values appropriate from your point of view?

Best Regards

Fabrice



On Tue, Sep 21, 2021 at 11:52 AM Amit Kapila 
wrote:

> On Tue, Sep 21, 2021 at 1:52 PM Fabrice Chapuis 
> wrote:
> >
> > If I understand, the instruction to send keep alive by the wal sender
> has not been reached in the for loop, for what reason?
> > ...
> > * Check for replication timeout. */
> >   WalSndCheckTimeOut();
> >
> > /* Send keepalive if the time has come */
> >   WalSndKeepaliveIfNecessary();
> > ...
> >
>
> Are you sure that these functions have not been called? Or the case is
> that these are called but due to some reason the keep-alive is not
> sent? IIUC, these are called after processing each WAL record so not
> sure how is it possible in your case that these are not reached?
>
> > The data load is performed on a table which is not replicated, I do not
> understand why the whole transaction linked to an insert is copied to snap
> files given that table does not take part of the logical replication.
> >
>
> It is because we don't know till the end of the transaction (where we
> start sending the data) whether the table will be replicated or not. I
> think specifically for this purpose the new 'streaming' feature
> introduced in PG-14 will help us to avoid writing data of such tables
> to snap/spill files. See 'streaming' option in Create Subscription
> docs [1].
>
> > We are going to do a test by modifying parameters
> wal_sender_timeout/wal_receiver_timeout from 1' to 5'. The problem is that
> these parameters are global and changing them will also impact the physical
> replication.
> >
>
> Do you mean you are planning to change from 1 minute to 5 minutes? I
> agree with the global nature of parameters and I think your approach
> to finding out the root cause is good here because otherwise, under
> some similar or more heavy workload, it might lead to the same
> situation.
>
> > Concerning the walsender timeout, when the worker is started again after
> a timeout, it will trigger a new walsender associated with it.
> >
>
> Right, I know that but I was curious to know if the walsender has
> exited before walreceiver.
>
> [1] - https://www.postgresql.org/docs/devel/sql-createsubscription.html
>
> --
> With Regards,
> Amit Kapila.
>


Re: Logical replication timeout problem

2021-09-22 Thread Amit Kapila
On Tue, Sep 21, 2021 at 9:12 PM Fabrice Chapuis  wrote:
>
> > IIUC, these are called after processing each WAL record so not
> sure how is it possible in your case that these are not reached?
>
> I don't know, as you say, to highlight the problem we would have to debug the 
> WalSndKeepaliveIfNecessary function
>
> > I was curious to know if the walsender has exited before walreceiver
>
> During the last tests we made we didn't observe any timeout of the wal sender 
> process.
>
> > Do you mean you are planning to change from 1 minute to 5 minutes?
>
> We set wal_sender_timeout/wal_receiver_timeout to 5' and launch new test. The 
> result is surprising and rather positive there is no timeout any more in the 
> log and the 20Gb of snap files are removed in less than 5 minutes.
> How to explain that behaviour, why the snap files are consumed suddenly so 
> quickly.
>

I think it is because we decide that the data in those snap files
doesn't need to be sent at xact end, so we remove them.

> I choose the value arbitrarily for wal_sender_timeout/wal_receiver_timeout 
> parameters, are theses values appropriate from your point of view?
>

It is difficult to say what the appropriate values for these
parameters are unless we somehow debug WalSndKeepaliveIfNecessary() to
find out why it didn't send a keepalive when expected. Would you be
able to make code changes and test, or, if you want, I can make the changes
and send a patch for you to test? If not, is it possible that in
some way you send a reproducible test?

-- 
With Regards,
Amit Kapila.




Re: Logical replication timeout problem

2021-09-22 Thread Fabrice Chapuis
If you would like I can test the patch you send to me.

Regards

Fabrice

On Wed, Sep 22, 2021 at 11:02 AM Amit Kapila 
wrote:

> On Tue, Sep 21, 2021 at 9:12 PM Fabrice Chapuis 
> wrote:
> >
> > > IIUC, these are called after processing each WAL record so not
> > sure how is it possible in your case that these are not reached?
> >
> > I don't know, as you say, to highlight the problem we would have to
> debug the WalSndKeepaliveIfNecessary function
> >
> > > I was curious to know if the walsender has exited before walreceiver
> >
> > During the last tests we made we didn't observe any timeout of the wal
> sender process.
> >
> > > Do you mean you are planning to change from 1 minute to 5 minutes?
> >
> > We set wal_sender_timeout/wal_receiver_timeout to 5' and launch new
> test. The result is surprising and rather positive there is no timeout any
> more in the log and the 20Gb of snap files are removed in less than 5
> minutes.
> > How to explain that behaviour, why the snap files are consumed suddenly
> so quickly.
> >
>
> I think it is because we decide that the data in those snap files
> doesn't need to be sent at xact end, so we remove them.
>
> > I choose the value arbitrarily for
> wal_sender_timeout/wal_receiver_timeout parameters, are theses values
> appropriate from your point of view?
> >
>
> It is difficult to say what is the appropriate value for these
> parameters unless in some way we debug WalSndKeepaliveIfNecessary() to
> find why it didn't send keep alive when it is expected. Would you be
> able to make code changes and test or if you want I can make changes
> and send the patch if you can test it? If not, is it possible that in
> some way you send a reproducible test?
>
> --
> With Regards,
> Amit Kapila.
>


Re: Logical replication timeout problem

2021-09-23 Thread Amit Kapila
On Wed, Sep 22, 2021 at 9:46 PM Fabrice Chapuis  wrote:
>
> If you would like I can test the patch you send to me.
>

Okay, please find attached a patch for additional logs. I would like
to see the logs during the time when the walsender appears to be writing
to files. We might need to add more logs to find the exact problem, but
let's start with this.
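
(For readers without the attachment: judging from the "attempt to send keep
alive message" lines, and the missing "sent keep alive message" line, quoted
elsewhere in the thread, the patch adds logging roughly along these lines.
This is a reconstruction of the idea, not the actual contents of
log_keep_alive_1.patch:)

    /* In WalSndKeepaliveIfNecessary(), before the timing check: */
    elog(LOG, "attempt to send keep alive message");

    /* ... and after a keepalive has actually been emitted: */
    elog(LOG, "sent keep alive message");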

-- 
With Regards,
Amit Kapila.


log_keep_alive_1.patch
Description: Binary data


Re: Logical replication timeout problem

2021-09-23 Thread Fabrice Chapuis
Thanks for your patch, we are going to set up a lab in order to debug the
function.
Regards
Fabrice

On Thu, Sep 23, 2021 at 3:50 PM Amit Kapila  wrote:

> On Wed, Sep 22, 2021 at 9:46 PM Fabrice Chapuis 
> wrote:
> >
> > If you would like I can test the patch you send to me.
> >
>
> Okay, please find an attached patch for additional logs. I would like
> to see the logs during the time when walsender appears to be writing
> to files. We might need to add more logs to find the exact problem but
> let's start with this.
>
> --
> With Regards,
> Amit Kapila.
>


RE: Logical replication timeout problem

2021-09-30 Thread Tang , Haiying/唐 海英
On Friday, September 24, 2021 12:04 AM, Fabrice Chapuis
<fabrice636...@gmail.com> wrote:

>

> Thanks for your patch, we are going to set up a lab in order to debug the 
> function.



Hi



I tried to reproduce this timeout problem on version 10.18 but failed.

In my trial, I inserted large amounts of data at the publisher, which took
more than 1 minute to replicate.

And with the patch provided by Amit, I saw that the frequency of invoking
the WalSndKeepaliveIfNecessary function increased after I inserted the data.



The test script is attached. Maybe you can try it on your machine and check
whether this problem happens.

If I missed something in the script, please let me know.

Of course, it would be better if you can provide your script to reproduce
the problem.



Regards

Tang


run.sh
Description: run.sh


Re: Logical replication timeout problem

2021-10-08 Thread Fabrice Chapuis
Thanks Tang for your script.
Our debugging environment will be ready soon. I will test your script and
we will try to reproduce the problem by integrating the patch provided by
Amit. As soon as I have results I will let you know.

Regards

Fabrice

On Thu, Sep 30, 2021 at 3:15 AM Tang, Haiying/唐 海英 
wrote:

> On Friday, September 24, 2021 12:04 AM, Fabrice Chapuis <
> fabrice636...@gmail.com> wrote:
>
> >
>
> > Thanks for your patch, we are going to set up a lab in order to debug
> the function.
>
>
>
> Hi
>
>
>
> I tried to reproduce this timeout problem on version10.18 but failed.
>
> In my trial, I inserted large amounts of data at publisher, which took
> more than 1 minute to replicate.
>
> And with the patch provided by Amit, I saw that the frequency of invoking
>
> WalSndKeepaliveIfNecessary function is raised after I inserted data.
>
>
>
> The test script is attached. Maybe you can try it on your machine and
> check if this problem could happen.
>
> If I miss something in the script, please let me know.
>
> Of course, it will be better if you can provide your script to reproduce
> the problem.
>
>
>
> Regards
>
> Tang
>
>


RE: Logical replication timeout problem

2022-03-17 Thread wangw.f...@fujitsu.com
On Thu, Mar 17, 2022 at 7:52 PM Masahiko Sawada  wrote:
>
Thanks for your comments.

> On Thu, Mar 17, 2022 at 7:14 PM Amit Kapila  wrote:
> >
> > On Thu, Mar 17, 2022 at 12:27 PM Amit Kapila 
> wrote:
> > >
> > > On Wed, Mar 16, 2022 at 7:38 PM Masahiko Sawada
>  wrote:
> > > >
> > > > After more thought, can we check only wal_sender_timeout without
> > > > skip-count? That is, in WalSndUpdateProgress(), if we have
> > > > received any reply from the subscriber in last (wal_sender_timeout
> > > > / 2), we don't need to do anything in terms of keep-alive. If not,
> > > > we do
> > > > ProcessRepliesIfAny() (and probably WalSndCheckTimeOut()?) then
> > > > WalSndKeepalivesIfNecessary(). That way, we can send keep-alive
> > > > messages every (wal_sender_timeout / 2). And since we don't call
> > > > them for every change, we would not need to worry about the overhead
> much.
> > > >
> > >
> > > But won't that lead to a call to GetCurrentTimestamp() for each
> > > change we skip? IIUC from previous replies that lead to a slight
> > > slowdown in previous tests of Wang-San.
> > >
> > If the above is true then I think we can use a lower skip_count say 10
> > along with a timeout mechanism to send keepalive message. This will
> > help us to alleviate the overhead Wang-San has shown.
> 
> Using both sounds reasonable to me. I'd like to see how much the overhead is
> alleviated by using skip_count 10 (or 100).
> 
> > BTW, I think there could be one other advantage of using
> > ProcessRepliesIfAny() (as you are suggesting) is that it can help to
> > release sync waiters if there are any. I feel that would be the case
> > for the skip_empty_transactions patch [1] which uses
> > WalSndUpdateProgress to send keepalive messages after skipping empty
> > transactions.
> 
> +1
I modified the patch according to your and Amit-San's suggestions.
In addition, after testing, I found that a threshold of 10 brings a slight
overhead, so I tried changing it to 100; after testing, the results look good
to me:
10  : 1.22%--UpdateProgress
100 : 0.16%--UpdateProgress

Please refer to the attachment.

Attached is the new patch.
1. Refactor the way keepalive messages are sent.
   [suggested by Sawada-San, Amit-San.]
2. Change the initialization of the is_send flag to make it look more
   reasonable. [suggested by Kuroda-San.]
3. Improve the new function name.
   (From SendKeepaliveIfNecessary to UpdateProgress.)

Regards,
Wang wei


v3-0001-Fix-the-timeout-of-subscriber-in-long-transaction.patch
Description:  v3-0001-Fix-the-timeout-of-subscriber-in-long-transaction.patch


RE: Logical replication timeout problem

2022-03-17 Thread wangw.f...@fujitsu.com
On Thu, Mar 9, 2022 at 11:52 AM Kuroda, Hayato/黒田 隼人 
 wrote:
> Thank you for updating!
Thanks for your comments.

> 1. pgoutput_change
> ```
> +   bool is_send = true;
> ```
> 
> My first impression is that is_send should be initialized to false, and it 
> will change
> to true when OutputPluginWrite() is called.
> 
> 
> 2. pgoutput_change
> ```
> +   {
> +   is_send = false;
> +   break;
> +   }
> ```
> 
> Here are too many indents, but I think they should be removed.
> See above comment.
Fixed. Initialize is_send to false.

> 3. WalSndUpdateProgress
> ```
> +   /*
> +* If half of wal_sender_timeout has lapsed without send 
> message
> standby,
> +* send a keep-alive message to the standby.
> +*/
> ```
> 
> The comment seems inconsistency with others.
> Here is "keep-alive", but other parts are "keepalive".
Since this part of the code was refactored, this inconsistent comment was
removed.

> 4. ReorderBufferProcessTXN
> ```
> + 
>   change-
> >data.inval.ninvalidations,
> +
> + change->data.inval.invalidations);
> ```
> 
> Maybe these lines break 80-columns rule.
Thanks for the reminder. I will run pgindent later.

Kindly have a look at the new patch shared in [1].

[1] - 
https://www.postgresql.org/message-id/OS3PR01MB6275C67F14954E05CE5D04399E139%40OS3PR01MB6275.jpnprd01.prod.outlook.com

Regards,
Wang wei


Re: Logical replication timeout problem

2022-03-18 Thread Amit Kapila
On Fri, Mar 18, 2022 at 10:43 AM wangw.f...@fujitsu.com
 wrote:
>
> On Thu, Mar 17, 2022 at 7:52 PM Masahiko Sawada  wrote:
> >
>
> Attach the new patch.
>

*
  case REORDER_BUFFER_CHANGE_INVALIDATION:
- /* Execute the invalidation messages locally */
- ReorderBufferExecuteInvalidations(
-   change->data.inval.ninvalidations,
-   change->data.inval.invalidations);
- break;
+ {
+ LogicalDecodingContext *ctx = rb->private_data;
+
+ /* Try to send a keepalive message. */
+ UpdateProgress(ctx, true);

Calling UpdateProgress() here appears ad hoc to me, especially because
it calls OutputPluginUpdateProgress, which appears to be called only
from the plugin API. Am I missing something? Also, why is the same handling
missing for other similar messages like
REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID where we don't call any
plug-in API?

I am not sure what a good way to achieve this is, but one idea that
occurred to me: shall we invent a new callback
ReorderBufferSkipChangeCB, similar to ReorderBufferApplyChangeCB, and
then pgoutput can register its API where we can have logic similar
to what you have in UpdateProgress()? If we do so, then all the
current callers of UpdateProgress in pgoutput can also call that API.
What do you think?

* Why don't you have a quick exit like below code in WalSndWriteData?
/* Try taking fast path unless we get too close to walsender timeout. */
if (now < TimestampTzPlusMilliseconds(last_reply_timestamp,
  wal_sender_timeout / 2) &&
!pq_is_send_pending())
{
return;
}

*  Can we rename variable 'is_send' to 'change_sent'?

-- 
With Regards,
Amit Kapila.




Re: Logical replication timeout problem

2022-03-20 Thread Amit Kapila
On Fri, Mar 18, 2022 at 4:20 PM Amit Kapila  wrote:
>
> On Fri, Mar 18, 2022 at 10:43 AM wangw.f...@fujitsu.com
>  wrote:
> >
> > On Thu, Mar 17, 2022 at 7:52 PM Masahiko Sawada  
> > wrote:
> > >
> >
> > Attach the new patch.
> >
>
> *
>   case REORDER_BUFFER_CHANGE_INVALIDATION:
> - /* Execute the invalidation messages locally */
> - ReorderBufferExecuteInvalidations(
> -   change->data.inval.ninvalidations,
> -   change->data.inval.invalidations);
> - break;
> + {
> + LogicalDecodingContext *ctx = rb->private_data;
> +
> + /* Try to send a keepalive message. */
> + UpdateProgress(ctx, true);
>
> Calling UpdateProgress() here appears adhoc to me especially because
> it calls OutputPluginUpdateProgress which appears to be called only
> from plugin API. Am, I missing something? Also why the same handling
> is missed in other similar messages like
> REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID where we don't call any
> plug-in API?
>
> I am not sure what is a good way to achieve this but one idea that
> occurred to me was shall we invent a new callback
> ReorderBufferSkipChangeCB similar to ReorderBufferApplyChangeCB and
> then pgoutput can register its API where we can have the logic similar
> to what you have in UpdateProgress()? If we do so, then all the
> cuurent callers of UpdateProgress in pgoutput can also call that API.
> What do you think?
>

Another idea could be that we leave the DDL case for now, as there is
very little chance of a timeout from skipping DDLs anyway, and we may
later need to backpatch this bug-fix, which would be another
reason not to make such invasive changes. We can handle the DDL case
separately if required.

-- 
With Regards,
Amit Kapila.




RE: Logical replication timeout problem

2022-03-21 Thread wangw.f...@fujitsu.com
On Mon, Mar 21, 2022 at 1:31 PM Amit Kapila  wrote:
>
Thanks for your comments.

> On Fri, Mar 18, 2022 at 4:20 PM Amit Kapila  wrote:
> >
> > On Fri, Mar 18, 2022 at 10:43 AM wangw.f...@fujitsu.com
> >  wrote:
> > >
> > > On Thu, Mar 17, 2022 at 7:52 PM Masahiko Sawada
>  wrote:
> > > >
> > >
> > > Attach the new patch.
> > >
> >
> > *
> >   case REORDER_BUFFER_CHANGE_INVALIDATION:
> > - /* Execute the invalidation messages locally */
> > - ReorderBufferExecuteInvalidations(
> > -   change->data.inval.ninvalidations,
> > -   change->data.inval.invalidations);
> > - break;
> > + {
> > + LogicalDecodingContext *ctx = rb->private_data;
> > +
> > + /* Try to send a keepalive message. */
> > + UpdateProgress(ctx, true);
> >
> > Calling UpdateProgress() here appears adhoc to me especially because
> > it calls OutputPluginUpdateProgress which appears to be called only
> > from plugin API. Am, I missing something? Also why the same handling
> > is missed in other similar messages like
> > REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID where we don't call
> any
> > plug-in API?
Yes, you are right.
I invoke it in case REORDER_BUFFER_CHANGE_INVALIDATION because I think every
DDL will modify the catalog and therefore get into this case. So I only
invoke the function UpdateProgress here to handle DDL.

> > I am not sure what is a good way to achieve this but one idea that
> > occurred to me was shall we invent a new callback
> > ReorderBufferSkipChangeCB similar to ReorderBufferApplyChangeCB and
> > then pgoutput can register its API where we can have the logic similar
> > to what you have in UpdateProgress()? If we do so, then all the
> > cuurent callers of UpdateProgress in pgoutput can also call that API.
> > What do you think?
> >
> Another idea could be that we leave the DDL case for now as anyway
> there is very less chance of timeout for skipping DDLs and we may
> later need to even backpatch this bug-fix which would be another
> reason to not make such invasive changes. We can handle the DDL case
> if required separately.
Yes, I think a new callback function would be nice.
And yes, as you said, maybe we could first fix the use case that exposed the
problem, and then make further modifications on the master branch.
I modified the patch; currently only the DML-related code remains.

> > * Why don't you have a quick exit like below code in WalSndWriteData?
> > /* Try taking fast path unless we get too close to walsender timeout. */ if 
> > (now
> > < TimestampTzPlusMilliseconds(last_reply_timestamp,
> >   wal_sender_timeout / 2) &&
> > !pq_is_send_pending())
> > {
> > return;
> > }
Fixed. I missed this, so I added it in the new patch.

> > *  Can we rename variable 'is_send' to 'change_sent'?
Improved the name of this variable (from 'is_send' to 'change_sent').

Attached is the new patch. [Suggestions by Amit-san.]
1. Remove DDL-related code. Handle the DDL case separately later if needed.
2. Fix an omission (in the function WalSndUpdateProgress).
3. Improve a variable name (from 'is_send' to 'change_sent').
4. Fix some comments (above and inside the function WalSndUpdateProgress).

Regards,
Wang wei


v4-0001-Fix-the-timeout-of-subscriber-in-long-transaction.patch
Description:  v4-0001-Fix-the-timeout-of-subscriber-in-long-transaction.patch


Re: Logical replication timeout problem

2022-03-24 Thread Amit Kapila
On Tue, Mar 22, 2022 at 7:25 AM wangw.f...@fujitsu.com
 wrote:
>
> Attach the new patch.
>

It seems that by mistake you have removed the changes from the pgoutput_message
and pgoutput_truncate functions. I have added those back.
Additionally, I made a few other changes: (a) moved the function
UpdateProgress to pgoutput.c as it is not used outside it, (b) changed
the new parameter in the plugin API from 'send_keep_alive' to 'last_write'
to make it look similar to WalSndPrepareWrite and WalSndWriteData, (c)
made a number of changes in the WalSndUpdateProgress API; it is better to
move the keep-alive code after the lag-tracking code because we process
replies at that time and that is where the lag is computed; and (d)
changed/added comments in the code.

Do let me know what you think of the attached?

-- 
With Regards,
Amit Kapila.


v5-0001-Fix-the-logical-replication-timeout-during-large-.patch
Description: Binary data


RE: Logical replication timeout problem

2022-03-24 Thread kuroda.hay...@fujitsu.com
Dear Amit,

> It seems by mistake you have removed the changes from pgoutput_message
> and pgoutput_truncate functions. I have added those back.
> Additionally, I made a few other changes: (a) moved the function
> UpdateProgress to pgoutput.c as it is not used outside it, (b) change
> the new parameter in plugin API from 'send_keep_alive' to 'last_write'
> to make it look similar to WalSndPrepareWrite and WalSndWriteData, (c)
> made a number of changes in WalSndUpdateProgress API, it is better to
> move keep-alive code after lag track code because we do process
> replies at that time and there it will compute the lag; (d)
> changed/added comments in the code.

LGTM, but the patch cannot be applied to current HEAD.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



RE: Logical replication timeout problem

2022-03-24 Thread wangw.f...@fujitsu.com
On Thur, Mar 24, 2022 at 6:32 PM Amit Kapila  wrote:
>
Thanks for your update.

> It seems by mistake you have removed the changes from pgoutput_message
> and pgoutput_truncate functions. I have added those back.
> Additionally, I made a few other changes: (a) moved the function
> UpdateProgress to pgoutput.c as it is not used outside it, (b) change
> the new parameter in plugin API from 'send_keep_alive' to 'last_write'
> to make it look similar to WalSndPrepareWrite and WalSndWriteData, (c)
> made a number of changes in WalSndUpdateProgress API, it is better to
> move keep-alive code after lag track code because we do process
> replies at that time and there it will compute the lag; (d)
> changed/added comments in the code.
> 
> Do let me know what you think of the attached?
It looks good to me. I just rebased it because of the change in the header
file (75b1521).
I tested it and the results look good to me.

Attached is the new patch.

Regards,
Wang wei


v6-0001-Fix-the-logical-replication-timeout-during-large-.patch
Description:  v6-0001-Fix-the-logical-replication-timeout-during-large-.patch


Re: Logical replication timeout problem

2022-03-24 Thread Masahiko Sawada
On Fri, Mar 25, 2022 at 2:23 PM wangw.f...@fujitsu.com
 wrote:
>
> On Thur, Mar 24, 2022 at 6:32 PM Amit Kapila  wrote:
> >
> Thanks for your kindly update.
>
> > It seems by mistake you have removed the changes from pgoutput_message
> > and pgoutput_truncate functions. I have added those back.
> > Additionally, I made a few other changes: (a) moved the function
> > UpdateProgress to pgoutput.c as it is not used outside it, (b) change
> > the new parameter in plugin API from 'send_keep_alive' to 'last_write'
> > to make it look similar to WalSndPrepareWrite and WalSndWriteData, (c)
> > made a number of changes in WalSndUpdateProgress API, it is better to
> > move keep-alive code after lag track code because we do process
> > replies at that time and there it will compute the lag; (d)
> > changed/added comments in the code.
> >
> > Do let me know what you think of the attached?
> It looks good to me. Just rebase it because the change in header(75b1521).
> I tested it and the result looks good to me.

Since commit 75b1521 added decoding of sequences to logical
replication, the patch needs to have pgoutput_sequence() call
update_progress().

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Logical replication timeout problem

2022-03-25 Thread Amit Kapila
On Fri, Mar 25, 2022 at 11:49 AM Masahiko Sawada  wrote:
>
> On Fri, Mar 25, 2022 at 2:23 PM wangw.f...@fujitsu.com
>  wrote:
>
> Since commit 75b1521 added decoding of sequence to logical
> replication, the patch needs to have pgoutput_sequence() call
> update_progress().
>

Yeah, I also think this needs to be addressed. But apart from this, I
want to know your and others' opinions on the following two points:
a. Both this and the patch discussed in the nearby thread [1] add an
additional parameter to
WalSndUpdateProgress/OutputPluginUpdateProgress, and it seems to me
that both are required. The additional parameter 'last_write' added by
this patch indicates: "If the last write is skipped then try (if we
are close to wal_sender_timeout) to send a keepalive message to the
receiver to avoid timeouts." This means it can be used after any
'write' message. OTOH, the parameter 'skipped_xact' added by the other
patch [1] indicates that if we have skipped sending anything for a
transaction, we send a keepalive for synchronous replication to avoid
any delays in such a transaction. Does this sound reasonable, or can
you think of a better way to deal with it?
b. Do we want to backpatch the patch in this thread? I am reluctant to
backpatch because it changes an exposed API, which can have an impact,
and second, there exists a workaround (the user can increase
wal_sender_timeout/wal_receiver_timeout).


[1] - 
https://www.postgresql.org/message-id/OS0PR01MB5716BB24409D4B69206615B1941A9%40OS0PR01MB5716.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.




RE: Logical replication timeout problem

2022-03-25 Thread wangw.f...@fujitsu.com
On Fri, Mar 25, 2022 at 2:19 PM Masahiko Sawada  wrote:
> On Fri, Mar 25, 2022 at 2:23 PM wangw.f...@fujitsu.com
>  wrote:
> >
> > On Thur, Mar 24, 2022 at 6:32 PM Amit Kapila 
> wrote:
> > >
> > Thanks for your kindly update.
> >
> > > It seems by mistake you have removed the changes from
> pgoutput_message
> > > and pgoutput_truncate functions. I have added those back.
> > > Additionally, I made a few other changes: (a) moved the function
> > > UpdateProgress to pgoutput.c as it is not used outside it, (b) change
> > > the new parameter in plugin API from 'send_keep_alive' to 'last_write'
> > > to make it look similar to WalSndPrepareWrite and WalSndWriteData, (c)
> > > made a number of changes in WalSndUpdateProgress API, it is better to
> > > move keep-alive code after lag track code because we do process
> > > replies at that time and there it will compute the lag; (d)
> > > changed/added comments in the code.
> > >
> > > Do let me know what you think of the attached?
> > It looks good to me. Just rebase it because the change in header(75b1521).
> > I tested it and the result looks good to me.
> 
> Since commit 75b1521 added decoding of sequence to logical
> replication, the patch needs to have pgoutput_sequence() call
> update_progress().
Thanks for your comments.

Yes, you are right.
Add missing handling of pgoutput_sequence.

Attached is the new patch.

Regards,
Wang wei


v7-0001-Fix-the-logical-replication-timeout-during-large-.patch
Description:  v7-0001-Fix-the-logical-replication-timeout-during-large-.patch


RE: Logical replication timeout problem

2022-03-27 Thread kuroda.hay...@fujitsu.com
Dear Wang-san,

Thank you for updating!
...but it also cannot be applied to the current HEAD
because of commit 923def9a533.

Your patch seems to conflict with the addition of an argument to
logicalrep_write_insert().
That commit allows specifying the columns to publish by skipping some columns
in logicalrep_write_tuple(),
which is called from logicalrep_write_insert() and logicalrep_write_update().

Do we have to consider some special case for that?
I thought a timeout might occur if a user has a huge table and publishes few
columns, but that is a corner case.


Best Regards,
Hayato Kuroda
FUJITSU LIMITED



RE: Logical replication timeout problem

2022-03-27 Thread wangw.f...@fujitsu.com
On Mon, Mar 28, 2022 at 9:56 AM Kuroda, Hayato/黒田 隼人 
 wrote:
> Dear Wang-san,
Thanks for your comments.

> Thank you for updating!
> ...but it also cannot be applied to current HEAD
> because of the commit 923def9a533.
> 
> Your patch seems to conflict the adding an argument of
> logicalrep_write_insert().
> It allows specifying columns to publish by skipping some columns in
> logicalrep_write_tuple()
> which is called from logicalrep_write_insert() and logicalrep_write_update().
Thanks for the reminder.
I have rebased the patch.

> Do we have to consider something special case for that?
> I thought timeout may occur if users have huge table and publish few columns,
> but it is corner case.
I think we may not need to deal with this use case.
The maximum number of table columns allowed by PG is 1600
(the macro MaxHeapAttributeNumber), and after looping through all columns in
the function logicalrep_write_tuple, the function OutputPluginWrite is invoked
immediately to actually send the data to the subscriber. This refreshes the
time at which the subscriber last received a message.
So I think this loop will not cause timeout issues.

Regards,
Wang wei


v8-0001-Fix-the-logical-replication-timeout-during-large-.patch
Description:  v8-0001-Fix-the-logical-replication-timeout-during-large-.patch


Re: Logical replication timeout problem

2022-03-27 Thread Amit Kapila
On Mon, Mar 28, 2022 at 11:41 AM wangw.f...@fujitsu.com
 wrote:
>
> On Mon, Mar 28, 2022 at 9:56 AM Kuroda, Hayato/黒田 隼人 
>  wrote:
>
> > Do we have to consider something special case for that?
> > I thought timeout may occur if users have huge table and publish few 
> > columns,
> > but it is corner case.
> I think maybe we do not need to deal with this use case.
> The maximum number of table columns allowed by PG is 1600
> (macro MaxHeapAttributeNumber), and after loop through all columns in the
> function logicalrep_write_tuple, the function OutputPluginWrite will be 
> invoked
> immediately to actually send the data to the subscriber. This refreshes the
> last time the subscriber received a message.
> So I think this loop will not cause timeout issues.
>

Right, I also don't think it can be a source of timeout.

-- 
With Regards,
Amit Kapila.




RE: Logical replication timeout problem

2022-03-28 Thread kuroda.hay...@fujitsu.com
Dear Amit, Wang,

> > I think maybe we do not need to deal with this use case.
> > The maximum number of table columns allowed by PG is 1600
> > (macro MaxHeapAttributeNumber), and after loop through all columns in the
> > function logicalrep_write_tuple, the function OutputPluginWrite will be 
> > invoked
> > immediately to actually send the data to the subscriber. This refreshes the
> > last time the subscriber received a message.
> > So I think this loop will not cause timeout issues.
> >
> 
> Right, I also don't think it can be a source of timeout.

OK. I have no comments for this version.


Best Regards,
Hayato Kuroda
FUJITSU LIMITED


RE: Logical replication timeout problem

2022-03-28 Thread wangw.f...@fujitsu.com
On Mon, Mar 28, 2022 at 2:11 AM I wrote:
> Rebase the patch.

After reviewing another patch [1], I think this patch should also add a loop in
the function WalSndUpdateProgress like the one in WalSndWriteData.
So I updated the patch to be consistent with the existing code and the patch
mentioned above.
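For reference, the flush-and-keepalive loop in WalSndWriteData that this refers
to looks roughly like the sketch below (simplified from walsender.c; the exact
loop added to WalSndUpdateProgress by the patch may differ):

    /* If we have pending data, process replies and flush before returning. */
    for (;;)
    {
        long        sleeptime;

        CHECK_FOR_INTERRUPTS();

        /* Check for input from the client */
        ProcessRepliesIfAny();

        /* die if timeout was reached */
        WalSndCheckTimeOut();

        /* Send keepalive if the time has come */
        WalSndKeepaliveIfNecessary();

        /* Try to flush pending output to the client */
        if (pq_flush_if_writable() != 0)
            WalSndShutdown();

        /* If we finished clearing the buffered data, we're done here. */
        if (!pq_is_send_pending())
            break;

        sleeptime = WalSndComputeSleeptime(GetCurrentTimestamp());

        /* Sleep until something happens or we time out */
        WalSndWait(WL_SOCKET_WRITEABLE | WL_SOCKET_READABLE, sleeptime,
                   WAIT_EVENT_WAL_SENDER_WRITE_DATA);
    }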

Attached is the new patch.

[1] - 
https://www.postgresql.org/message-id/OS0PR01MB5716946347F607F4CFB02FCE941D9%40OS0PR01MB5716.jpnprd01.prod.outlook.com

Regards,
Wang wei


v9-0001-Fix-the-logical-replication-timeout-during-large-.patch
Description:  v9-0001-Fix-the-logical-replication-timeout-during-large-.patch


Re: Logical replication timeout problem

2022-03-28 Thread Masahiko Sawada
On Fri, Mar 25, 2022 at 5:33 PM Amit Kapila  wrote:
>
> On Fri, Mar 25, 2022 at 11:49 AM Masahiko Sawada  
> wrote:
> >
> > On Fri, Mar 25, 2022 at 2:23 PM wangw.f...@fujitsu.com
> >  wrote:
> >
> > Since commit 75b1521 added decoding of sequence to logical
> > replication, the patch needs to have pgoutput_sequence() call
> > update_progress().
> >
>
> Yeah, I also think this needs to be addressed. But apart from this, I
> want to know your and other's opinion on the following two points:
> a. Both this and the patch discussed in the nearby thread [1] add an
> additional parameter to
> WalSndUpdateProgress/OutputPluginUpdateProgress and it seems to me
> that both are required. The additional parameter 'last_write' added by
> this patch indicates: "If the last write is skipped then try (if we
> are close to wal_sender_timeout) to send a keepalive message to the
> receiver to avoid timeouts.". This means it can be used after any
> 'write' message. OTOH, the parameter 'skipped_xact' added by another
> patch [1] indicates if we have skipped sending anything for a
> transaction then sendkeepalive for synchronous replication to avoid
> any delays in such a transaction. Does this sound reasonable or can
> you think of a better way to deal with it?

These current approaches look good to me.

> b. Do we want to backpatch the patch in this thread? I am reluctant to
> backpatch because it changes the exposed API which can have an impact
> and second there exists a workaround (user can increase
> wal_sender_timeout/wal_receiver_timeout).

Yeah, we should avoid API changes between minor versions. I feel it's
better to fix this also for back-branches, but we probably need a different
fix for them. The issue reported on this thread is quite
confusing; it looks like a network problem but is not. Also, the
user who faced this issue has to increase wal_sender_timeout due to
the decoded data size, which also means delaying the detection of real
network problems. That seems like an unrelated trade-off.

Regards,
-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




RE: Logical replication timeout problem

2022-03-30 Thread wangw.f...@fujitsu.com
On Tues, Mar 29, 2022 at 9:45 AM I wrote:
> Attach the new patch.

Rebased the patch because of commit d5a9d86d in the current HEAD.

Regards,
Wang wei


v10-0001-Fix-the-logical-replication-timeout-during-large.patch
Description:  v10-0001-Fix-the-logical-replication-timeout-during-large.patch


Re: Logical replication timeout problem

2022-03-30 Thread Amit Kapila
On Wed, Mar 30, 2022 at 1:24 PM wangw.f...@fujitsu.com
 wrote:
>
> On Tues, Mar 29, 2022 at 9:45 AM I wrote:
> > Attach the new patch.
>
> Rebase the patch because the commit d5a9d86d in current HEAD.
>

Thanks, this looks good to me apart from a minor indentation change
which I'll take care of before committing. I am planning to push this
day after tomorrow on Friday unless there are any other major
comments.

-- 
With Regards,
Amit Kapila.




RE: Logical replication timeout problem

2022-03-30 Thread shiy.f...@fujitsu.com
On Wed, Mar 30, 2022 3:54 PM wangw.f...@fujitsu.com  
wrote:
> 
> Rebase the patch because the commit d5a9d86d in current HEAD.
> 

Thanks for your patch. I tried it and confirmed that the timeout problem no
longer occurs after applying it, while I could still reproduce the problem on
HEAD.

Regards,
Shi yu


Re: Logical replication timeout problem

2022-03-31 Thread Masahiko Sawada
On Wed, Mar 30, 2022 at 6:00 PM Amit Kapila  wrote:
>
> On Wed, Mar 30, 2022 at 1:24 PM wangw.f...@fujitsu.com
>  wrote:
> >
> > On Tues, Mar 29, 2022 at 9:45 AM I wrote:
> > > Attach the new patch.
> >
> > Rebase the patch because the commit d5a9d86d in current HEAD.
> >
>
> Thanks, this looks good to me apart from a minor indentation change
> which I'll take care of before committing. I am planning to push this
> day after tomorrow on Friday unless there are any other major
> comments.

The patch basically looks good to me. But my only concern is
that once we get the patch committed, we will have to call
update_progress() on all paths in callbacks that process changes,
which seems poor for maintainability.

On the other hand, another possible solution would be to add a new
callback that is called, e.g., every 1000 changes so that the walsender
can do its job, such as timeout handling, while processing the decoded
data in reorderbuffer.c. The callback would be set only if the walsender
does logical decoding, and would be NULL otherwise. With this idea, other
plugins will also be able to benefit without changes. But I’m not really sure
it’s a good design, and adding a new callback introduces complexity.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Logical replication timeout problem

2022-03-31 Thread Amit Kapila
On Thu, Mar 31, 2022 at 5:55 PM Masahiko Sawada  wrote:
> On Wed, Mar 30, 2022 at 6:00 PM Amit Kapila  wrote:
> >
> > On Wed, Mar 30, 2022 at 1:24 PM wangw.f...@fujitsu.com
> >  wrote:
> > >
> > > On Tues, Mar 29, 2022 at 9:45 AM I wrote:
> > > > Attach the new patch.
> > >
> > > Rebase the patch because the commit d5a9d86d in current HEAD.
> > >
> >
> > Thanks, this looks good to me apart from a minor indentation change
> > which I'll take care of before committing. I am planning to push this
> > day after tomorrow on Friday unless there are any other major
> > comments.
>
> The patch basically looks good to me. But the only concern to me is
> that once we get the patch committed, we will have to call
> update_progress() at all paths in callbacks that process changes.
> Which seems poor maintainability.
>
> On the other hand, possible another solution would be to add a new
> callback that is called e.g., every 1000 changes so that walsender
> does its job such as timeout handling while processing the decoded
> data in reorderbuffer.c. The callback is set only if the walsender
> does logical decoding, otherwise NULL. With this idea, other plugins
> will also be able to benefit without changes. But I’m not really sure
> it’s a good design, and adding a new callback introduces complexity.
>

Yeah, same here. I have also mentioned another way to expose an API
from reorderbuffer [1] by introducing a skip API, but I am just not sure
whether that or this API is generic enough to make adding it worthwhile.
Also, note that the current patch makes the progress recording of large
transactions somewhat better when most of the changes are skipped. We
can further extend it to cover other cases as well, but that
can probably be done separately if required, as it is not needed
for this bug-fix.

I intend to commit this patch today but I think it is better to wait
for a few more days to see if anybody has any opinion on this matter.
I'll push this on Tuesday unless we decide to do something different
here.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1%2BfQjndoBOFUn9Wy0hhm3MLyUWEpcT9O7iuCELktfdBiQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: Logical replication timeout problem

2022-03-31 Thread Euler Taveira
On Thu, Mar 31, 2022, at 9:24 AM, Masahiko Sawada wrote:
> The patch basically looks good to me. But the only concern to me is
> that once we get the patch committed, we will have to call
> update_progress() at all paths in callbacks that process changes.
> Which seems poor maintainability.
I didn't like the current fix for the same reason. We need a robust feedback
system for logical replication. We had this discussion in the "skip empty
transactions" thread [1].

> On the other hand, possible another solution would be to add a new
> callback that is called e.g., every 1000 changes so that walsender
> does its job such as timeout handling while processing the decoded
> data in reorderbuffer.c. The callback is set only if the walsender
> does logical decoding, otherwise NULL. With this idea, other plugins
> will also be able to benefit without changes. But I’m not really sure
> it’s a good design, and adding a new callback introduces complexity.
No new callback is required.

In the current code, each output plugin callback is responsible for calling
OutputPluginUpdateProgress. It is up to the output plugin author to add calls
to this function. The lack of a call in a callback might cause issues like what
was described in the initial message.

The functions CreateInitDecodingContext and CreateDecodingContext receive the
update_progress function as a parameter. These functions are called in 2
places: (a) the streaming replication protocol (CREATE_REPLICATION_SLOT) and
(b) the SQL logical decoding functions (pg_logical_*_changes). Case (a) uses
WalSndUpdateProgress as a progress function. Case (b) does not have one because
it is not required -- local decoding/communication. There is no custom update
progress routine for each output plugin, which leads me to the question:
couldn't we encapsulate the update progress call into the callback functions?
If so, we could have an output plugin parameter to indicate for which callbacks
we would like to call the update progress routine. This would simplify the
code, make it less error prone, and wouldn't impose a burden on
maintainability.

[1] 
https://www.postgresql.org/message-id/20200309183018.tzkzwu635sd366ej%40alap3.anarazel.de


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Logical replication timeout problem

2022-03-31 Thread Amit Kapila
On Fri, Apr 1, 2022 at 7:33 AM Euler Taveira  wrote:
>
> On Thu, Mar 31, 2022, at 9:24 AM, Masahiko Sawada wrote:
>
> On the other hand, possible another solution would be to add a new
> callback that is called e.g., every 1000 changes so that walsender
> does its job such as timeout handling while processing the decoded
> data in reorderbuffer.c. The callback is set only if the walsender
> does logical decoding, otherwise NULL. With this idea, other plugins
> will also be able to benefit without changes. But I’m not really sure
> it’s a good design, and adding a new callback introduces complexity.
>
> No new callback is required.
>
> In the current code, each output plugin callback is responsible to call
> OutputPluginUpdateProgress. It is up to the output plugin author to add calls
> to this function. The lack of a call in a callback might cause issues like 
> what
> was described in the initial message.
>

This is exactly our initial analysis, and we have tried a patch along
these lines; it has a noticeable overhead, see [1]. Calling this
for each change or each skipped change can bring noticeable overhead,
which is why we decided to call it after a certain threshold (100) of
skipped changes. Now, surely, as mentioned in my previous reply, we can
make it generic such that instead of calling this (the update_progress
function as in the patch) only for skipped cases, we call it always. Will
that make it better?

> The functions CreateInitDecodingContext and CreateDecodingContext receives the
> update_progress function as a parameter. These functions are called in 2
> places: (a) streaming replication protocol (CREATE_REPLICATION_SLOT) and (b)
> SQL logical decoding functions (pg_logical_*_changes). Case (a) uses
> WalSndUpdateProgress as a progress function. Case (b) does not have one 
> because
> it is not required -- local decoding/communication. There is no custom update
> progress routine for each output plugin which leads me to the question:
> couldn't we encapsulate the update progress call into the callback functions?
>

Sorry, I don't get your point. What exactly do you mean by this?
AFAIS, currently we call this output plugin API in pgoutput functions
only, do you intend to get it invoked from a different place?

[1] - 
https://www.postgresql.org/message-id/OS3PR01MB6275DFFDAC7A59FA148931529E209%40OS3PR01MB6275.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.




Re: Logical replication timeout problem

2022-03-31 Thread Euler Taveira
On Thu, Mar 31, 2022, at 11:27 PM, Amit Kapila wrote:
> This is exactly our initial analysis and we have tried a patch on
> these lines and it has a noticeable overhead. See [1]. Calling this
> for each change or each skipped change can bring noticeable overhead
> that is why we decided to call it after a certain threshold (100) of
> skipped changes. Now, surely as mentioned in my previous reply we can
> make it generic such that instead of calling this (update_progress
> function as in the patch) for skipped cases, we call it always. Will
> that make it better?
That's what I have in mind but using a different approach.

> > The functions CreateInitDecodingContext and CreateDecodingContext receives 
> > the
> > update_progress function as a parameter. These functions are called in 2
> > places: (a) streaming replication protocol (CREATE_REPLICATION_SLOT) and (b)
> > SQL logical decoding functions (pg_logical_*_changes). Case (a) uses
> > WalSndUpdateProgress as a progress function. Case (b) does not have one 
> > because
> > it is not required -- local decoding/communication. There is no custom 
> > update
> > progress routine for each output plugin which leads me to the question:
> > couldn't we encapsulate the update progress call into the callback 
> > functions?
> >
> 
> Sorry, I don't get your point. What exactly do you mean by this?
> AFAIS, currently we call this output plugin API in pgoutput functions
> only, do you intend to get it invoked from a different place?
It seems I didn't make myself clear. The callbacks I'm referring to are the
*_cb_wrapper functions. After every ctx->callbacks.foo_cb() call inside a
*_cb_wrapper() function, we would have something like:

if (ctx->progress & PGOUTPUT_PROGRESS_FOO)
    NewUpdateProgress(ctx, false);

The NewUpdateProgress function would contain logic similar to the
update_progress() from the proposed patch. (A different function name here just
to avoid confusion.)

The output plugin is responsible for setting ctx->progress with the callback
flags (for example, PGOUTPUT_PROGRESS_CHANGE for change_cb()) for which we
would like to run NewUpdateProgress.
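A rough sketch of this idea, to make it concrete (ctx->progress,
PGOUTPUT_PROGRESS_CHANGE, and NewUpdateProgress() are hypothetical names from
the proposal above, not existing code; the surrounding wrapper body is
abbreviated):

    static void
    change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
                      Relation relation, ReorderBufferChange *change)
    {
        LogicalDecodingContext *ctx = cache->private_data;

        /* ... existing error-callback setup and sanity checks ... */

        ctx->callbacks.change_cb(ctx, txn, relation, change);

        /* Plugin opted in to progress updates for change callbacks. */
        if (ctx->progress & PGOUTPUT_PROGRESS_CHANGE)
            NewUpdateProgress(ctx, false);

        /* ... existing error-callback teardown ... */
    }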


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Logical replication timeout problem

2022-03-31 Thread Amit Kapila
On Fri, Apr 1, 2022 at 8:28 AM Euler Taveira  wrote:
>
> On Thu, Mar 31, 2022, at 11:27 PM, Amit Kapila wrote:
>
> This is exactly our initial analysis and we have tried a patch on
> these lines and it has a noticeable overhead. See [1]. Calling this
> for each change or each skipped change can bring noticeable overhead
> that is why we decided to call it after a certain threshold (100) of
> skipped changes. Now, surely as mentioned in my previous reply we can
> make it generic such that instead of calling this (update_progress
> function as in the patch) for skipped cases, we call it always. Will
> that make it better?
>
> That's what I have in mind but using a different approach.
>
> > The functions CreateInitDecodingContext and CreateDecodingContext receives 
> > the
> > update_progress function as a parameter. These functions are called in 2
> > places: (a) streaming replication protocol (CREATE_REPLICATION_SLOT) and (b)
> > SQL logical decoding functions (pg_logical_*_changes). Case (a) uses
> > WalSndUpdateProgress as a progress function. Case (b) does not have one 
> > because
> > it is not required -- local decoding/communication. There is no custom 
> > update
> > progress routine for each output plugin which leads me to the question:
> > couldn't we encapsulate the update progress call into the callback 
> > functions?
> >
>
> Sorry, I don't get your point. What exactly do you mean by this?
> AFAIS, currently we call this output plugin API in pgoutput functions
> only, do you intend to get it invoked from a different place?
>
> It seems I didn't make myself clear. The callbacks I'm referring to the
> *_cb_wrapper functions. After every ctx->callbacks.foo_cb() call into a
> *_cb_wrapper() function, we have something like:
>
> if (ctx->progress & PGOUTPUT_PROGRESS_FOO)
> NewUpdateProgress(ctx, false);
>
> The NewUpdateProgress function would contain a logic similar to the
> update_progress() from the proposed patch. (A different function name here 
> just
> to avoid confusion.)
>
> The output plugin is responsible to set ctx->progress with the callback
> variables (for example, PGOUTPUT_PROGRESS_CHANGE for change_cb()) that we 
> would
> like to run NewUpdateProgress.
>

This sounds like an approach that conflicts with what we currently do.
Currently, OutputPluginUpdateProgress() is called from the xact-related
pgoutput functions like pgoutput_commit_txn(),
pgoutput_prepare_txn(), pgoutput_commit_prepared_txn(), etc. So, if we
follow what you are saying, then for some of the APIs like
pgoutput_change/_message/_truncate, we need to set the parameter to
invoke NewUpdateProgress(), which will internally call
OutputPluginUpdateProgress(), and for the remaining APIs, we will call
it in the corresponding pgoutput_* function. I feel that if we want to make it
more generic than the current patch, it is better to directly call
what you are referring to here as NewUpdateProgress() in all remaining
APIs like pgoutput_change/_truncate, etc.

-- 
With Regards,
Amit Kapila.




RE: Logical replication timeout problem

2022-04-05 Thread wangw.f...@fujitsu.com
On Fri, Apr 1, 2022 at 12:09 AM Amit Kapila  wrote:
> On Fri, Apr 1, 2022 at 8:28 AM Euler Taveira  wrote:
> >
> > On Thu, Mar 31, 2022, at 11:27 PM, Amit Kapila wrote:
> >
> > This is exactly our initial analysis and we have tried a patch on
> > these lines and it has a noticeable overhead. See [1]. Calling this
> > for each change or each skipped change can bring noticeable overhead
> > that is why we decided to call it after a certain threshold (100) of
> > skipped changes. Now, surely as mentioned in my previous reply we can
> > make it generic such that instead of calling this (update_progress
> > function as in the patch) for skipped cases, we call it always. Will
> > that make it better?
> >
> > That's what I have in mind but using a different approach.
> >
> > > The functions CreateInitDecodingContext and CreateDecodingContext
> receives the
> > > update_progress function as a parameter. These functions are called in 2
> > > places: (a) streaming replication protocol (CREATE_REPLICATION_SLOT) and
> (b)
> > > SQL logical decoding functions (pg_logical_*_changes). Case (a) uses
> > > WalSndUpdateProgress as a progress function. Case (b) does not have one
> because
> > > it is not required -- local decoding/communication. There is no custom
> update
> > > progress routine for each output plugin which leads me to the question:
> > > couldn't we encapsulate the update progress call into the callback 
> > > functions?
> > >
> >
> > Sorry, I don't get your point. What exactly do you mean by this?
> > AFAIS, currently we call this output plugin API in pgoutput functions
> > only, do you intend to get it invoked from a different place?
> >
> > It seems I didn't make myself clear. The callbacks I'm referring to the
> > *_cb_wrapper functions. After every ctx->callbacks.foo_cb() call into a
> > *_cb_wrapper() function, we have something like:
> >
> > if (ctx->progress & PGOUTPUT_PROGRESS_FOO)
> > NewUpdateProgress(ctx, false);
> >
> > The NewUpdateProgress function would contain a logic similar to the
> > update_progress() from the proposed patch. (A different function name here
> just
> > to avoid confusion.)
> >
> > The output plugin is responsible to set ctx->progress with the callback
> > variables (for example, PGOUTPUT_PROGRESS_CHANGE for change_cb())
> that we would
> > like to run NewUpdateProgress.
> >
> 
> This sounds like a conflicting approach to what we currently do.
> Currently, OutputPluginUpdateProgress() is called from the xact
> related pgoutput functions like pgoutput_commit_txn(),
> pgoutput_prepare_txn(), pgoutput_commit_prepared_txn(), etc. So, if we
> follow what you are saying then for some of the APIs like
> pgoutput_change/_message/_truncate, we need to set the parameter to
> invoke NewUpdateProgress() which will internally call
> OutputPluginUpdateProgress(), and for the remaining APIs, we will call
> in the corresponding pgoutput_* function. I feel if we want to make it
> more generic than the current patch, it is better to directly call
> what you are referring to here as NewUpdateProgress() in all remaining
> APIs like pgoutput_change/_truncate, etc.
Thanks for your comments.

Following your suggestion, I improved the patch to make it more generic.
Attached is the new patch.

Regards,
Wang wei


v11-0001-Fix-the-logical-replication-timeout-during-large.patch
Description:  v11-0001-Fix-the-logical-replication-timeout-during-large.patch


Re: Logical replication timeout problem

2022-04-05 Thread Amit Kapila
On Wed, Apr 6, 2022 at 11:09 AM wangw.f...@fujitsu.com
 wrote:
>
> On Fri, Apr 1, 2022 at 12:09 AM Amit Kapila  wrote:
> > On Fri, Apr 1, 2022 at 8:28 AM Euler Taveira  wrote:
> > >
> > > It seems I didn't make myself clear. The callbacks I'm referring to the
> > > *_cb_wrapper functions. After every ctx->callbacks.foo_cb() call into a
> > > *_cb_wrapper() function, we have something like:
> > >
> > > if (ctx->progress & PGOUTPUT_PROGRESS_FOO)
> > > NewUpdateProgress(ctx, false);
> > >
> > > The NewUpdateProgress function would contain a logic similar to the
> > > update_progress() from the proposed patch. (A different function name here
> > just
> > > to avoid confusion.)
> > >
> > > The output plugin is responsible to set ctx->progress with the callback
> > > variables (for example, PGOUTPUT_PROGRESS_CHANGE for change_cb())
> > that we would
> > > like to run NewUpdateProgress.
> > >
> >
> > This sounds like a conflicting approach to what we currently do.
> > Currently, OutputPluginUpdateProgress() is called from the xact
> > related pgoutput functions like pgoutput_commit_txn(),
> > pgoutput_prepare_txn(), pgoutput_commit_prepared_txn(), etc. So, if we
> > follow what you are saying then for some of the APIs like
> > pgoutput_change/_message/_truncate, we need to set the parameter to
> > invoke NewUpdateProgress() which will internally call
> > OutputPluginUpdateProgress(), and for the remaining APIs, we will call
> > in the corresponding pgoutput_* function. I feel if we want to make it
> > more generic than the current patch, it is better to directly call
> > what you are referring to here as NewUpdateProgress() in all remaining
> > APIs like pgoutput_change/_truncate, etc.
> Thanks for your comments.
>
> According to your suggestion, improve the patch to make it more generic.
> Attach the new patch.
>

 typedef void (*LogicalOutputPluginWriterUpdateProgress) (struct
LogicalDecodingContext *lr,
  XLogRecPtr Ptr,
  TransactionId xid,
- bool skipped_xact
+ bool skipped_xact,
+ bool last_write

In this approach, I don't think we need the additional parameter
last_write. Let's do the work related to keepalives without a
parameter; do you see any problem with that?

Also, let's try to evaluate how it impacts the lag functionality for large
transactions.

-- 
With Regards,
Amit Kapila.




Re: Logical replication timeout problem

2022-04-06 Thread Amit Kapila
On Wed, Apr 6, 2022 at 11:28 AM Amit Kapila  wrote:
>
> On Wed, Apr 6, 2022 at 11:09 AM wangw.f...@fujitsu.com
>  wrote:
> >
> > According to your suggestion, improve the patch to make it more generic.
> > Attach the new patch.
> >
>
>  typedef void (*LogicalOutputPluginWriterUpdateProgress) (struct
> LogicalDecodingContext *lr,
>   XLogRecPtr Ptr,
>   TransactionId xid,
> - bool skipped_xact
> + bool skipped_xact,
> + bool last_write
>
> In this approach, I don't think we need an additional parameter
> last_write. Let's do the work related to keepalive without a
> parameter, do you see any problem with that?
>

I think this patch doesn't take into account that we call
OutputPluginUpdateProgress() from APIs like pgoutput_commit_txn(). I
think we should always call the new function update_progress from
those existing call sites and arrange the function such that when it is
called from xact-end APIs like pgoutput_commit_txn(), it always calls
OutputPluginUpdateProgress and resets changes_count to 0.
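A sketch of the suggested call-site arrangement (update_progress and its
end_xact flag are names from the patch under discussion, not committed code;
the body of pgoutput_commit_txn is abbreviated):

    static void
    pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
                        XLogRecPtr commit_lsn)
    {
        PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
        bool        sent_begin_txn = txndata->sent_begin_txn;

        /* was: OutputPluginUpdateProgress(ctx, !sent_begin_txn); */
        update_progress(ctx, !sent_begin_txn, true);    /* end_xact = true */

        /* ... free txndata and send COMMIT only if BEGIN was actually sent ... */
    }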


-- 
With Regards,
Amit Kapila.




RE: Logical replication timeout problem

2022-04-06 Thread wangw.f...@fujitsu.com
On Wed, Apr 6, 2022 at 1:59 AM Amit Kapila  wrote:
On Wed, Apr 6, 2022 at 4:32 AM Amit Kapila  wrote:
>
Thanks for your comments.

>  typedef void (*LogicalOutputPluginWriterUpdateProgress) (struct
> LogicalDecodingContext *lr,
>   XLogRecPtr Ptr,
>   TransactionId xid,
> - bool skipped_xact
> + bool skipped_xact,
> + bool last_write
> 
> In this approach, I don't think we need an additional parameter last_write. 
> Let's
> do the work related to keepalive without a parameter, do you see any problem
> with that?
I agree with you. I have modified this point.

> I think this patch doesn't take into account that we call
> OutputPluginUpdateProgress() from APIs like pgoutput_commit_txn(). I
> think we should always call the new function update_progress from
> those existing call sites and arrange the function such that when
> called from xact end APIs like pgoutput_commit_txn(), it always call
> OutputPluginUpdateProgress and make changes_count as 0.
Improved it.
Added two new inputs to the function update_progress (skipped_xact and
end_xact).
Changed the existing call sites to invoke update_progress instead of
OutputPluginUpdateProgress.

> Also, let's try to evaluate how it impacts lag functionality for large 
> transactions?
I think this patch will not affect the lag functionality. It will only update
the lag fields of the pg_stat_replication view more frequently.
IIUC, when WalSndUpdateProgress is invoked, it stores the LSN of the change and
the invocation time in the lag_tracker. Then, when ProcessStandbyReplyMessage
is invoked, it calculates the lag fields from the message sent by the
subscriber and the information in the lag_tracker. This patch does not modify
this logic; it only increases the frequency of invocation.
Please let me know if I have misunderstood.

Attached is the new patch.
1. Removed the new function input parameter added by this patch (the parameter
last_write of WalSndUpdateProgress). [Suggestion by Amit-san.]
2. Also invoke the function update_progress in other xact-end APIs like
pgoutput_commit_txn. [Suggestion by Amit-san.]

Regards,
Wang wei


v12-0001-Fix-the-logical-replication-timeout-during-large.patch
Description:  v12-0001-Fix-the-logical-replication-timeout-during-large.patch


Re: Logical replication timeout problem

2022-04-06 Thread Amit Kapila
On Wed, Apr 6, 2022 at 6:30 PM wangw.f...@fujitsu.com
 wrote:
>
> On Wed, Apr 6, 2022 at 1:59 AM Amit Kapila  wrote:
> On Wed, Apr 6, 2022 at 4:32 AM Amit Kapila  wrote:
> >
> Thanks for your comments.
>
> >  typedef void (*LogicalOutputPluginWriterUpdateProgress) (struct
> > LogicalDecodingContext *lr,
> >   XLogRecPtr Ptr,
> >   TransactionId xid,
> > - bool skipped_xact
> > + bool skipped_xact,
> > + bool last_write
> >
> > In this approach, I don't think we need an additional parameter last_write. 
> > Let's
> > do the work related to keepalive without a parameter, do you see any problem
> > with that?
> I agree with you. Modify this point.
>
> > I think this patch doesn't take into account that we call
> > OutputPluginUpdateProgress() from APIs like pgoutput_commit_txn(). I
> > think we should always call the new function update_progress from
> > those existing call sites and arrange the function such that when
> > called from xact end APIs like pgoutput_commit_txn(), it always call
> > OutputPluginUpdateProgress and make changes_count as 0.
> Improve it.
> Add two new input to function update_progress.(skipped_xact and end_xact).
> Modify the function invoke from OutputPluginUpdateProgress to update_progress.
>
> > Also, let's try to evaluate how it impacts lag functionality for large 
> > transactions?
> I think this patch will not affect lag functionality. It will updates the lag
> field of view pg_stat_replication more frequently.
> IIUC, when invoking function WalSndUpdateProgress, it will store the lsn of
> change and invoking time in lag_tracker. Then when invoking function
> ProcessStandbyReplyMessage, it will calculate the lag field according to the
> message from subscriber and the information in lag_tracker. This patch does
> not modify this logic, but only increases the frequency of invoking.
> Please let me know if I understand wrong.
>

No, your understanding seems correct to me. But what I want to check
is whether calling the progress function more often has any impact on
the lag-related fields in pg_stat_replication. I think you need to check
the impact of replaying a large transaction.

One comment:
+static void
+update_progress(LogicalDecodingContext *ctx, bool skipped_xact, bool end_xact)
+{
+ static int changes_count = 0;
+
+ if (end_xact)
+ {
+ /* Update progress tracking at xact end. */
+ OutputPluginUpdateProgress(ctx, skipped_xact);
+ changes_count = 0;
+ }
+ /*
+ * After continuously processing CHANGES_THRESHOLD changes, update progress
+ * which will also try to send a keepalive message if required.

I think you can simply return after making changes_count = 0. There
should be an empty line before starting the next comment.

-- 
With Regards,
Amit Kapila.




RE: Logical replication timeout problem

2022-04-07 Thread wangw.f...@fujitsu.com
On Wed, Apr 7, 2022 at 1:34 PM Amit Kapila   wrote:
>
Thanks for your comments.

> One comment:
> +static void
> +update_progress(LogicalDecodingContext *ctx, bool skipped_xact, bool
> end_xact)
> +{
> + static int changes_count = 0;
> +
> + if (end_xact)
> + {
> + /* Update progress tracking at xact end. */
> + OutputPluginUpdateProgress(ctx, skipped_xact);
> + changes_count = 0;
> + }
> + /*
> + * After continuously processing CHANGES_THRESHOLD changes, update
> progress
> + * which will also try to send a keepalive message if required.
> 
> I think you can simply return after making changes_count = 0. There
> should be an empty line before starting the next comment.
Improved as suggested.
BTW, there is a conflict when applying v12 to the current HEAD because of
commit 2c7ea57, so I also rebased the patch.

Attached is the new patch.
1. Made some improvements to the new function update_progress. [Suggestion by
Amit-san.]
2. Rebased the patch because of commit 2c7ea57 in the current HEAD.

Regards,
Wang wei


v13-0001-Fix-the-logical-replication-timeout-during-large.patch
Description:  v13-0001-Fix-the-logical-replication-timeout-during-large.patch


RE: Logical replication timeout problem

2022-04-10 Thread wangw.f...@fujitsu.com
On Wed, Apr 7, 2022 at 1:34 PM Amit Kapila   wrote:
> On Wed, Apr 6, 2022 at 6:30 PM wangw.f...@fujitsu.com
>  wrote:
> >
> > On Wed, Apr 6, 2022 at 1:58 AM Amit Kapila  wrote:
> > On Wed, Apr 6, 2022 at 4:32 AM Amit Kapila  wrote:
> > > Also, let's try to evaluate how it impacts lag functionality for large
> transactions?
> > I think this patch will not affect lag functionality. It will updates the 
> > lag
> > field of view pg_stat_replication more frequently.
> > IIUC, when invoking function WalSndUpdateProgress, it will store the lsn of
> > change and invoking time in lag_tracker. Then when invoking function
> > ProcessStandbyReplyMessage, it will calculate the lag field according to the
> > message from subscriber and the information in lag_tracker. This patch does
> > not modify this logic, but only increases the frequency of invoking.
> > Please let me know if I understand wrong.
> >
> 
> No, your understanding seems correct to me. But what I want to check
> is if calling the progress function more often has any impact on
> lag-related fields in pg_stat_replication? I think you need to check
> the impact of large transaction replay.
Thanks for the explanation.

After doing some checks, I found that the v13 patch makes the lag calculations
inaccurate.

In short, the v13 patch lets us try to track lag more frequently and try to
send a keepalive message to subscribers. But in order to prevent flooding the
lag tracker, we cannot track lag more than once within
WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS (see function WalSndUpdateProgress).
This means we may lose information that needs to be tracked.
For example, suppose there is a large transaction with LSNs from lsn1 to lsn3.
In HEAD, when we calculate the lag time for lsn3, the lag time of lsn3 is
(now - lsn3.time).
But with the v13 patch, when we calculate the lag time for lsn3, there may be
no information for lsn3 in the lag_tracker, only information for lsn2, so the
lag time of lsn3 becomes (now - lsn2.time). (See function LagTrackerRead.)
Therefore, if we lose information that needs to be tracked, the reported lag
time becomes large and inaccurate.

So I skip tracking lag during a transaction, just like the current HEAD does.
Attached is the new patch.
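A minimal sketch of that idea inside WalSndUpdateProgress, assuming the
end_xact flag is passed down as in the v14/v15 API (simplified; not the exact
patch):

    static void
    WalSndUpdateProgress(LogicalDecodingContext *ctx, XLogRecPtr lsn,
                         TransactionId xid, bool skipped_xact, bool end_xact)
    {
        static TimestampTz sendTime = 0;
        TimestampTz now = GetCurrentTimestamp();

        /*
         * Track lag only at transaction end, and no more than once per
         * WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS, so that intermediate
         * keepalive-driven calls cannot displace the commit LSNs the
         * subscriber will actually confirm.
         */
        if (end_xact &&
            TimestampDifferenceExceeds(sendTime, now,
                                       WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS))
        {
            LagTrackerWrite(lsn, now);
            sendTime = now;
        }

        /* ... keepalive handling for the non-end-of-xact case follows ... */
    }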

Regards,
Wang wei


v14-0001-Fix-the-logical-replication-timeout-during-large.patch
Description:  v14-0001-Fix-the-logical-replication-timeout-during-large.patch


RE: Logical replication timeout problem

2022-04-11 Thread wangw.f...@fujitsu.com
On Mon, Apr 11, 2022 at 2:39 PM I wrote:
> Attach the new patch.
I also share the test results and details here.

To check that the LSN information used for the calculation is what we expected,
I collected some information by adding logs to the function LagTrackerRead.

Summary of test results:
- On current HEAD, and on current HEAD with the v14 patch, we could find in the
  lag_tracker the same LSN as the one received from the subscriber side.
- On current HEAD with the v13 patch, we could hardly ever find the same LSN in
  the lag_tracker.

Attach the details:
[The log by HEAD]
the lsn we received from subscriber  |  the lsn whose time we used to calculate in lag_tracker
382826584                            |  382826584
743884840                            |  743884840
1104943232                           |  1104943232
1468949424                           |  1468949424
1469521216                           |  1469521216

[The log by HEAD with v14 patch]
the lsn we received from subscriber  |  the lsn whose time we used to calculate in lag_tracker
382826584                            |  382826584
743890672                            |  743890672
1105074264                           |  1105074264
1469127040                           |  1469127040
1830591240                           |  1830591240

[The log by HEAD with v13 patch]
the lsn we received from subscriber  |  the lsn whose time we used to calculate in lag_tracker
382826584                            |  359848728
743884840                            |  713808560
1105010640                           |  1073978544
1468517536                           |  1447850160
1469516328                           |  1469516328

Regards,
Wang wei


Re: Logical replication timeout problem

2022-04-13 Thread Amit Kapila
On Mon, Apr 11, 2022 at 12:09 PM wangw.f...@fujitsu.com
 wrote:
>
> So I skip tracking lag during a transaction just like the current HEAD.
> Attach the new patch.
>

Thanks, please find the updated patch where I have slightly modified
the comments.

Sawada-San, Euler, do you have any opinion on this approach? I
personally still prefer the approach implemented in v10 [1] especially
due to the latest finding by Wang-San that we can't update the
lag-tracker apart from when it is invoked at the transaction end.
However, I am fine if we like this approach more.

[1] - 
https://www.postgresql.org/message-id/OS3PR01MB6275E0C2B4D9E488AD7CBA209E1F9%40OS3PR01MB6275.jpnprd01.prod.outlook.com
-- 
With Regards,
Amit Kapila.


v15-0001-Fix-the-logical-replication-timeout-during-large.patch
Description: Binary data


Re: Logical replication timeout problem

2022-04-14 Thread Masahiko Sawada
On Wed, Apr 13, 2022 at 7:45 PM Amit Kapila  wrote:
>
> On Mon, Apr 11, 2022 at 12:09 PM wangw.f...@fujitsu.com
>  wrote:
> >
> > So I skip tracking lag during a transaction just like the current HEAD.
> > Attach the new patch.
> >
>
> Thanks, please find the updated patch where I have slightly modified
> the comments.
>
> Sawada-San, Euler, do you have any opinion on this approach? I
> personally still prefer the approach implemented in v10 [1] especially
> due to the latest finding by Wang-San that we can't update the
> lag-tracker apart from when it is invoked at the transaction end.
> However, I am fine if we like this approach more.

Thank you for updating the patch.

The current patch looks much better than v10, which requires calling
update_progress() on every path.

Regarding the v15 patch, I'm a bit concerned that the new function name,
update_progress(), is too generic. How about
update_replation_progress() or some more specific name?

---
+if (end_xact)
+{
+    /* Update progress tracking at xact end. */
+    OutputPluginUpdateProgress(ctx, skipped_xact, end_xact);
+    changes_count = 0;
+    return;
+}
+
+/*
+ * After continuously processing CHANGES_THRESHOLD changes, we try to send
+ * a keepalive message if required.
+ *
+ * We don't want to try sending a keepalive message after processing each
+ * change as that can have overhead. Testing reveals that there is no
+ * noticeable overhead in doing it after continuously processing 100 or so
+ * changes.
+ */
+#define CHANGES_THRESHOLD 100
+if (++changes_count >= CHANGES_THRESHOLD)
+{
+    OutputPluginUpdateProgress(ctx, skipped_xact, end_xact);
+    changes_count = 0;
+}

Can we merge the two if branches since we do the same things? Or did you
separate them for better readability?

Regards,


--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Logical replication timeout problem

2022-04-14 Thread Euler Taveira
On Wed, Apr 13, 2022, at 7:45 AM, Amit Kapila wrote:
> On Mon, Apr 11, 2022 at 12:09 PM wangw.f...@fujitsu.com
>  wrote:
> >
> > So I skip tracking lag during a transaction just like the current HEAD.
> > Attach the new patch.
> >
> 
> Thanks, please find the updated patch where I have slightly modified
> the comments.
> 
> Sawada-San, Euler, do you have any opinion on this approach? I
> personally still prefer the approach implemented in v10 [1] especially
> due to the latest finding by Wang-San that we can't update the
> lag-tracker apart from when it is invoked at the transaction end.
> However, I am fine if we like this approach more.
It seems v15 is simpler and less error prone than v10. v10 has a mix of
OutputPluginUpdateProgress() and the new function update_progress(). v10
also calls update_progress() for every change action in pgoutput_change(). That
is not a good approach for maintainability -- new changes like sequences need
extra calls. However, as you mentioned, it still has to handle the lag-tracking
case.

Both patches change OutputPluginUpdateProgress(), so they cannot be
backpatched. Are you planning to backpatch this? If so, the boolean variable
(last_write or end_xact, depending on which version you are considering) could
be added to LogicalDecodingContext. (You should probably consider this approach
for skipped_xact too.)

+ * For a large transaction, if we don't send any change to the downstream for a
+ * long time then it can timeout. This can happen when all or most of the
+ * changes are either not published or got filtered out.

We should probably mention that the "long time" is the wal_receiver_timeout on
the subscriber.

+* change as that can have overhead. Testing reveals that there is no
+* noticeable overhead in doing it after continuously processing 100 or so
+* changes.

Tests revealed that ...

+* We don't have a mechanism to get the ack for any LSN other than end xact
+* lsn from the downstream. So, we track lag only for end xact lsn's.

s/lsn/LSN/ and s/lsn's/LSNs/

I would say "end of transaction LSN".

+ * If too many changes are processed then try to send a keepalive message to
+ * receiver to avoid timeouts.

In logical replication, if too many changes are processed then try to send a
keepalive message. It might avoid a timeout in the subscriber.

Does this same issue occur for long transactions? I mean keep a long
transaction open and execute thousands of transactions.

BEGIN;
INSERT INTO foo (a) VALUES(1);
-- wait a few hours while executing 10^x transactions
INSERT INTO foo (a) VALUES(2);
COMMIT;


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Logical replication timeout problem

2022-04-17 Thread Amit Kapila
On Thu, Apr 14, 2022 at 5:52 PM Euler Taveira  wrote:
>
> On Wed, Apr 13, 2022, at 7:45 AM, Amit Kapila wrote:
>
> Sawada-San, Euler, do you have any opinion on this approach? I
> personally still prefer the approach implemented in v10 [1] especially
> due to the latest finding by Wang-San that we can't update the
> lag-tracker apart from when it is invoked at the transaction end.
> However, I am fine if we like this approach more.
>
> It seems v15 is simpler and less error prone than v10. v10 has a mix of
> OutputPluginUpdateProgress() and the new function update_progress(). The v10
> also calls update_progress() for every change action in pgoutput_change(). It
> is not a good approach for maintainability -- new changes like sequences need
> extra calls.
>

Okay, let's use the v15 approach as Sawada-San also seems to have a
preference for that.

> However, as you mentioned there should handle the track lag case.
>
> Both patches change the OutputPluginUpdateProgress() so it cannot be
> backpatched. Are you planning to backpatch it? If so, the boolean variable
> (last_write or end_xacts depending of which version you are considering) could
> be added to LogicalDecodingContext.
>

If we add it to LogicalDecodingContext then I think we have to always
reset the variable after its use which will make it look ugly and
error-prone. I was not thinking to backpatch it because of the API
change but I guess if we want to backpatch then we can add it to
LogicalDecodingContext for back-branches. I am not sure if that will
look committable but surely we can try.

> (You should probably consider this approach
> for skipped_xact too)
>

As mentioned, I think it will be more error-prone, and we already have
other xact-related parameters in that and similar APIs. So, I am not
sure why you would prefer that.

>
> Does this same issue occur for long transactions? I mean keep a long
> transaction open and execute thousands of transactions.
>

No, this problem won't happen for such cases because we will only try
to send it at the commit time. Note that this problem happens only
when we don't send anything to the subscriber till a timeout happens.

-- 
With Regards,
Amit Kapila.




Re: Logical replication timeout problem

2022-04-17 Thread Amit Kapila
On Thu, Apr 14, 2022 at 5:50 PM Masahiko Sawada  wrote:
>
> On Wed, Apr 13, 2022 at 7:45 PM Amit Kapila  wrote:
> >
> > On Mon, Apr 11, 2022 at 12:09 PM wangw.f...@fujitsu.com
> >  wrote:
> > >
> > > So I skip tracking lag during a transaction just like the current HEAD.
> > > Attach the new patch.
> > >
> >
> > Thanks, please find the updated patch where I have slightly modified
> > the comments.
> >
> > Sawada-San, Euler, do you have any opinion on this approach? I
> > personally still prefer the approach implemented in v10 [1] especially
> > due to the latest finding by Wang-San that we can't update the
> > lag-tracker apart from when it is invoked at the transaction end.
> > However, I am fine if we like this approach more.
>
> Thank you for updating the patch.
>
> The current patch looks much better than v10 which requires to call to
> update_progress() every path.
>
> Regarding v15 patch, I'm concerned a bit that the new function name,
> update_progress(), is too generic. How about
> update_replation_progress() or something more specific name?
>

Do you intend to say update_replication_progress()? The word
'replation' doesn't make sense to me. I am fine with this suggestion.

>
> ---
> +if (end_xact)
> +{
> +    /* Update progress tracking at xact end. */
> +    OutputPluginUpdateProgress(ctx, skipped_xact, end_xact);
> +    changes_count = 0;
> +    return;
> +}
> +
> +/*
> + * After continuously processing CHANGES_THRESHOLD changes, we try to send
> + * a keepalive message if required.
> + *
> + * We don't want to try sending a keepalive message after processing each
> + * change as that can have overhead. Testing reveals that there is no
> + * noticeable overhead in doing it after continuously processing 100 or so
> + * changes.
> + */
> +#define CHANGES_THRESHOLD 100
> +if (++changes_count >= CHANGES_THRESHOLD)
> +{
> +    OutputPluginUpdateProgress(ctx, skipped_xact, end_xact);
> +    changes_count = 0;
> +}
>
> Can we merge two if branches since we do the same things? Or did you
> separate them for better readability?
>

I think it is fine to merge the two checks.
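Putting the rename and the merge together, the function could look roughly like
this sketch (names and signature follow the v15 patch plus the suggested
rename; the committed version may differ):

    static void
    update_replication_progress(LogicalDecodingContext *ctx, bool skipped_xact,
                                bool end_xact)
    {
        static int  changes_count = 0;

        /*
         * We don't want to try sending a keepalive message after processing
         * each change as that can have overhead. Tests revealed no noticeable
         * overhead when doing it after every 100 or so changes.
         */
    #define CHANGES_THRESHOLD 100

        /*
         * Update progress at transaction end, or after CHANGES_THRESHOLD
         * consecutive changes; both paths reset the counter.
         */
        if (end_xact || ++changes_count >= CHANGES_THRESHOLD)
        {
            OutputPluginUpdateProgress(ctx, skipped_xact, end_xact);
            changes_count = 0;
        }
    }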

-- 
With Regards,
Amit Kapila.




Re: Logical replication timeout problem

2022-04-17 Thread Masahiko Sawada
On Mon, Apr 18, 2022 at 1:01 PM Amit Kapila  wrote:
>
> On Thu, Apr 14, 2022 at 5:50 PM Masahiko Sawada  wrote:
> >
> > On Wed, Apr 13, 2022 at 7:45 PM Amit Kapila  wrote:
> > >
> > > On Mon, Apr 11, 2022 at 12:09 PM wangw.f...@fujitsu.com
> > >  wrote:
> > > >
> > > > So I skip tracking lag during a transaction just like the current HEAD.
> > > > Attach the new patch.
> > > >
> > >
> > > Thanks, please find the updated patch where I have slightly modified
> > > the comments.
> > >
> > > Sawada-San, Euler, do you have any opinion on this approach? I
> > > personally still prefer the approach implemented in v10 [1] especially
> > > due to the latest finding by Wang-San that we can't update the
> > > lag-tracker apart from when it is invoked at the transaction end.
> > > However, I am fine if we like this approach more.
> >
> > Thank you for updating the patch.
> >
> > The current patch looks much better than v10 which requires to call to
> > update_progress() every path.
> >
> > Regarding v15 patch, I'm concerned a bit that the new function name,
> > update_progress(), is too generic. How about
> > update_replation_progress() or something more specific name?
> >
>
> Do you intend to say update_replication_progress()? The word
> 'replation' doesn't make sense to me. I am fine with this suggestion.

Yeah, that was a typo. I meant update_replication_progress().

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Logical replication timeout problem

2022-04-17 Thread Amit Kapila
On Mon, Apr 18, 2022 at 9:29 AM Amit Kapila  wrote:
>
> On Thu, Apr 14, 2022 at 5:52 PM Euler Taveira  wrote:
> >
> > On Wed, Apr 13, 2022, at 7:45 AM, Amit Kapila wrote:
> >
> > Sawada-San, Euler, do you have any opinion on this approach? I
> > personally still prefer the approach implemented in v10 [1] especially
> > due to the latest finding by Wang-San that we can't update the
> > lag-tracker apart from when it is invoked at the transaction end.
> > However, I am fine if we like this approach more.
> >
> > It seems v15 is simpler and less error prone than v10. v10 has a mix of
> > OutputPluginUpdateProgress() and the new function update_progress(). The v10
> > also calls update_progress() for every change action in pgoutput_change(). 
> > It
> > is not a good approach for maintainability -- new changes like sequences 
> > need
> > extra calls.
> >
>
> Okay, let's use the v15 approach as Sawada-San also seems to have a
> preference for that.
>
> > However, as you mentioned there should handle the track lag case.
> >
> > Both patches change the OutputPluginUpdateProgress() so it cannot be
> > backpatched. Are you planning to backpatch it? If so, the boolean variable
> > (last_write or end_xacts depending of which version you are considering) 
> > could
> > be added to LogicalDecodingContext.
> >
>
> If we add it to LogicalDecodingContext then I think we have to always
> reset the variable after its use which will make it look ugly and
> error-prone. I was not thinking to backpatch it because of the API
> change but I guess if we want to backpatch then we can add it to
> LogicalDecodingContext for back-branches. I am not sure if that will
> look committable but surely we can try.
>

Even if we want to add the variable to the struct in back-branches,
we need to ensure not to change the size of the struct as it is
exposed, see email [1] for a similar mistake we made in another case.

[1] - https://www.postgresql.org/message-id/2358496.1649168259%40sss.pgh.pa.us

-- 
With Regards,
Amit Kapila.




RE: Logical replication timeout problem

2022-04-17 Thread wangw.f...@fujitsu.com
On Mon, Apr 18, 2022 at 00:35 PM Masahiko Sawada  wrote:
> On Mon, Apr 18, 2022 at 1:01 PM Amit Kapila  wrote:
> >
> > On Thu, Apr 14, 2022 at 5:50 PM Masahiko Sawada 
> wrote:
> > >
> > > On Wed, Apr 13, 2022 at 7:45 PM Amit Kapila 
> wrote:
> > > >
> > > > On Mon, Apr 11, 2022 at 12:09 PM wangw.f...@fujitsu.com
> > > >  wrote:
> > > > >
> > > > > So I skip tracking lag during a transaction just like the current 
> > > > > HEAD.
> > > > > Attach the new patch.
> > > > >
> > > >
> > > > Thanks, please find the updated patch where I have slightly
> > > > modified the comments.
> > > >
> > > > Sawada-San, Euler, do you have any opinion on this approach? I
> > > > personally still prefer the approach implemented in v10 [1]
> > > > especially due to the latest finding by Wang-San that we can't
> > > > update the lag-tracker apart from when it is invoked at the transaction 
> > > > end.
> > > > However, I am fine if we like this approach more.
> > >
> > > Thank you for updating the patch.
> > >
> > > The current patch looks much better than v10 which requires to call
> > > to
> > > update_progress() every path.
> > >
> > > Regarding v15 patch, I'm concerned a bit that the new function name,
> > > update_progress(), is too generic. How about
> > > update_replation_progress() or something more specific name?
> > >
> >
> > Do you intend to say update_replication_progress()? The word
> > 'replation' doesn't make sense to me. I am fine with this suggestion.
> 
> Yeah, that was a typo. I meant update_replication_progress().
Thanks for your comments.

> > > Regarding v15 patch, I'm concerned a bit that the new function name,
> > > update_progress(), is too generic. How about
> > > update_replation_progress() or something more specific name?
Improve as suggested. Change the name from update_progress to
update_replication_progress.

> > > ---
> > > +if (end_xact)
> > > +{
> > > +/* Update progress tracking at xact end. */
> > > +OutputPluginUpdateProgress(ctx, skipped_xact, end_xact);
> > > +changes_count = 0;
> > > +return;
> > > +}
> > > +
> > > +/*
> > > + * After continuously processing CHANGES_THRESHOLD changes,
> > > we try to send
> > > + * a keepalive message if required.
> > > + *
> > > + * We don't want to try sending a keepalive message after
> > > processing each
> > > + * change as that can have overhead. Testing reveals that there 
> > > is no
> > > + * noticeable overhead in doing it after continuously
> > > processing 100 or so
> > > + * changes.
> > > + */
> > > +#define CHANGES_THRESHOLD 100
> > > +if (++changes_count >= CHANGES_THRESHOLD)
> > > +{
> > > +OutputPluginUpdateProgress(ctx, skipped_xact, end_xact);
> > > +changes_count = 0;
> > > +}
> > > 
> > > Can we merge two if branches since we do the same things? Or did you
> > > separate them for better readability?
Improve as suggested. Merge two if-branches.

Attach the new patch.
1. Rename the new function(update_progress) to update_replication_progress. 
[suggestion by Sawada-San]
2. Merge two if-branches in new function update_replication_progress. 
[suggestion by Sawada-San.]
3. Improve comments to make them clear. [suggestions by Euler-San.]

Regards,
Wang wei


v16-0001-Fix-the-logical-replication-timeout-during-large.patch
Description:  v16-0001-Fix-the-logical-replication-timeout-during-large.patch


RE: Logical replication timeout problem

2022-04-17 Thread wangw.f...@fujitsu.com
On Thur, Apr 14, 2022 at 8:21 PM Euler Taveira  wrote:
>
Thanks for your comments.

> + * For a large transaction, if we don't send any change to the downstream 
> for a
> + * long time then it can timeout. This can happen when all or most of the
> + * changes are either not published or got filtered out.
> 
> We should probable mention that "long time" is wal_receiver_timeout on
> subscriber.
Improve as suggested.
Add "(exceeds the wal_receiver_timeout of standby)" to explain what "long time"
means.

> +* change as that can have overhead. Testing reveals that there is no
> +* noticeable overhead in doing it after continuously processing 100 or so
> +* changes.
> 
> Tests revealed that ...
Improve as suggested.

> +* We don't have a mechanism to get the ack for any LSN other than end 
> xact
> +* lsn from the downstream. So, we track lag only for end xact lsn's.
> 
> s/lsn/LSN/ and s/lsn's/LSNs/
> 
> I would say "end of transaction LSN".
Improve as suggested.

> + * If too many changes are processed then try to send a keepalive message to
> + * receiver to avoid timeouts.
> 
> In logical replication, if too many changes are processed then try to send a
> keepalive message. It might avoid a timeout in the subscriber.
Improve as suggested.

Kindly have a look at the new patch shared in [1].

[1] - 
https://www.postgresql.org/message-id/OS3PR01MB627561344A2C7ECF68E41D7E9EF39%40OS3PR01MB6275.jpnprd01.prod.outlook.com

Regards,
Wang wei




Re: Logical replication timeout problem

2022-04-18 Thread Masahiko Sawada
On Mon, Apr 18, 2022 at 3:16 PM wangw.f...@fujitsu.com
 wrote:
>
> On Mon, Apr 18, 2022 at 00:35 PM Masahiko Sawada  
> wrote:
> > On Mon, Apr 18, 2022 at 1:01 PM Amit Kapila  wrote:
> > >
> > > On Thu, Apr 14, 2022 at 5:50 PM Masahiko Sawada 
> > wrote:
> > > >
> > > > On Wed, Apr 13, 2022 at 7:45 PM Amit Kapila 
> > wrote:
> > > > >
> > > > > On Mon, Apr 11, 2022 at 12:09 PM wangw.f...@fujitsu.com
> > > > >  wrote:
> > > > > >
> > > > > > So I skip tracking lag during a transaction just like the current 
> > > > > > HEAD.
> > > > > > Attach the new patch.
> > > > > >
> > > > >
> > > > > Thanks, please find the updated patch where I have slightly
> > > > > modified the comments.
> > > > >
> > > > > Sawada-San, Euler, do you have any opinion on this approach? I
> > > > > personally still prefer the approach implemented in v10 [1]
> > > > > especially due to the latest finding by Wang-San that we can't
> > > > > update the lag-tracker apart from when it is invoked at the 
> > > > > transaction end.
> > > > > However, I am fine if we like this approach more.
> > > >
> > > > Thank you for updating the patch.
> > > >
> > > > The current patch looks much better than v10 which requires to call
> > > > to
> > > > update_progress() every path.
> > > >
> > > > Regarding v15 patch, I'm concerned a bit that the new function name,
> > > > update_progress(), is too generic. How about
> > > > update_replation_progress() or something more specific name?
> > > >
> > >
> > > Do you intend to say update_replication_progress()? The word
> > > 'replation' doesn't make sense to me. I am fine with this suggestion.
> >
> > Yeah, that was a typo. I meant update_replication_progress().
> Thanks for your comments.
>
> > > > Regarding v15 patch, I'm concerned a bit that the new function name,
> > > > update_progress(), is too generic. How about
> > > > update_replation_progress() or something more specific name?
> Improve as suggested. Change the name from update_progress to
> update_replication_progress.
>
> > > > ---
> > > > +if (end_xact)
> > > > +{
> > > > +/* Update progress tracking at xact end. */
> > > > +OutputPluginUpdateProgress(ctx, skipped_xact, 
> > > > end_xact);
> > > > +changes_count = 0;
> > > > +return;
> > > > +}
> > > > +
> > > > +/*
> > > > + * After continuously processing CHANGES_THRESHOLD changes,
> > > > we try to send
> > > > + * a keepalive message if required.
> > > > + *
> > > > + * We don't want to try sending a keepalive message after
> > > > processing each
> > > > + * change as that can have overhead. Testing reveals that 
> > > > there is no
> > > > + * noticeable overhead in doing it after continuously
> > > > processing 100 or so
> > > > + * changes.
> > > > + */
> > > > +#define CHANGES_THRESHOLD 100
> > > > +if (++changes_count >= CHANGES_THRESHOLD)
> > > > +{
> > > > +OutputPluginUpdateProgress(ctx, skipped_xact, 
> > > > end_xact);
> > > > +changes_count = 0;
> > > > +}
> > > >
> > > > Can we merge two if branches since we do the same things? Or did you
> > > > separate them for better readability?
> Improve as suggested. Merge two if-branches.
>
> Attach the new patch.
> 1. Rename the new function(update_progress) to update_replication_progress. 
> [suggestion by Sawada-San]
> 2. Merge two if-branches in new function update_replication_progress. 
> [suggestion by Sawada-San.]
> 3. Improve comments to make them clear. [suggestions by Euler-San.]

Thank you for updating the patch.

+ * For a large transaction, if we don't send any change to the downstream for a
+ * long time(exceeds the wal_receiver_timeout of standby) then it can timeout.
+ * This can happen when all or most of the changes are either not published or
+ * got filtered out.

+ */
+ if(end_xact || ++changes_count >= CHANGES_THRESHOLD)
+ {

We need a whitespace before '(' at the above two places. The rest looks good to me.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




RE: Logical replication timeout problem

2022-04-18 Thread wangw.f...@fujitsu.com
On Mon, Apr 19, 2022 at 9:32 AM Masahiko Sawada  wrote:
> Thank you for updating the patch.
Thanks for your comments.

> + * For a large transaction, if we don't send any change to the
> + downstream for a
> + * long time(exceeds the wal_receiver_timeout of standby) then it can
> timeout.
> + * This can happen when all or most of the changes are either not
> + published or
> + * got filtered out.
> 
> + */
> + if(end_xact || ++changes_count >= CHANGES_THRESHOLD) {
> 
> We need a whitespace before '(' at above two places. The rest looks good to 
> me.
Fix these.

Attach the new patch.
1. Fix wrong formatting. [suggestion by Sawada-San]

Regards,
Wang wei


v17-0001-Fix-the-logical-replication-timeout-during-large.patch
Description:  v17-0001-Fix-the-logical-replication-timeout-during-large.patch


Re: Logical replication timeout problem

2023-01-05 Thread Ashutosh Bapat
Hi Wang,
Thanks for working on this. One of our customers faced a similar
situation when running BDR with PostgreSQL.

I tested your patch and it solves the problem.

Please find some review comments below

On Tue, Nov 8, 2022 at 8:34 AM wangw.f...@fujitsu.com
 wrote:
>
>
> Attach the patch.
>

+/*
+ * Helper function for ReorderBufferProcessTXN for updating progress.
+ */
+static inline void
+ReorderBufferUpdateProgress(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ReorderBufferChange *change)
+{
+LogicalDecodingContext *ctx = rb->private_data;
+static intchanges_count = 0;

It's not easy to know that a variable is static when reading the code which
uses it, so it's easy to misread the code. I would probably track it
through the logical decoding context itself OR through a global variable, as in other
places where we track the last timestamps. But there's more below on this.

+
+if (!ctx->update_progress)
+return;
+
+Assert(!ctx->fast_forward);
+
+/* set output state */
+ctx->accept_writes = false;
+ctx->write_xid = txn->xid;
+ctx->write_location = change->lsn;
+ctx->end_xact = false;

This patch reverts many of the changes of the previous commit which tried to
fix this issue i.e. 8df2374. end_xact was introduced by the same commit but
without much explanation of that in the commit message. Its only user,
WalSndUpdateProgress(), is probably making a wrong assumption as well.

 * We don't have a mechanism to get the ack for any LSN other than end
 * xact LSN from the downstream. So, we track lag only for end of
 * transaction LSN.

IIUC, WAL sender tracks the LSN of the last WAL record read in sentPtr which is
sent downstream through a keep alive message. Downstream may acknowledge this
LSN. So we do get ack for any LSN, not just commit LSN.

So I propose removing end_xact as well.

+
+/*
+ * We don't want to try sending a keepalive message after processing each
+ * change as that can have overhead. Tests revealed that there is no
+ * noticeable overhead in doing it after continuously processing 100 or so
+ * changes.
+ */
+#define CHANGES_THRESHOLD 100

I think a time-based threshold makes more sense. What if the timeout was
nearing and those 100 changes just took a little more time, causing a timeout? We
already have a time-based threshold in WalSndKeepaliveIfNecessary(). And that
function is invoked after reading every WAL record in WalSndLoop(). So it does
not look like it's an expensive function. If it is expensive we might want to
worry about WalSndLoop as well. Does it make more sense to remove this
threshold?
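
A rough sketch of the time-based idea, purely for illustration (last_progress,
PROGRESS_INTERVAL_MS and the interval value are made up here and are not taken
from any of the patches; TimestampDifferenceExceeds() and GetCurrentTimestamp()
are the existing timestamp helpers):

#define PROGRESS_INTERVAL_MS 1000       /* illustrative value only */
    static TimestampTz last_progress = 0;

    /* Send progress/keepalive based on elapsed time instead of a change count */
    if (TimestampDifferenceExceeds(last_progress, GetCurrentTimestamp(),
                                   PROGRESS_INTERVAL_MS))
    {
        ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false);
        last_progress = GetCurrentTimestamp();
    }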

+
+/*
+ * After continuously processing CHANGES_THRESHOLD changes, we
+ * try to send a keepalive message if required.
+ */
+if (++changes_count >= CHANGES_THRESHOLD)
+{
+ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false);
+changes_count = 0;
+}
+}
+

On the other thread, I mentioned that we don't have a TAP test for it.
I agree with
Amit's opinion there that it's hard to create a test which will timeout
everywhere. I think what we need is a way to control the time required for
decoding a transaction.

A rough idea is to induce a small sleep after decoding every change. The amount
of sleep * number of changes will help us estimate and control the amount of
time taken to decode a transaction. Then we create a transaction which will
take longer than the timeout threshold to decode. But that's a significant
amount of code. I don't think PostgreSQL has a facility to induce a delay at a
particular place in the code.

-- 
Best Wishes,
Ashutosh Bapat




Re: Logical replication timeout problem

2023-01-08 Thread Amit Kapila
On Fri, Jan 6, 2023 at 12:35 PM Ashutosh Bapat
 wrote:
>
> +
> +/*
> + * We don't want to try sending a keepalive message after processing each
> + * change as that can have overhead. Tests revealed that there is no
> + * noticeable overhead in doing it after continuously processing 100 or 
> so
> + * changes.
> + */
> +#define CHANGES_THRESHOLD 100
>
> I think a time based threashold makes more sense. What if the timeout was
> nearing and those 100 changes just took little more time causing a timeout? We
> already have a time based threashold in WalSndKeepaliveIfNecessary(). And that
> function is invoked after reading every WAL record in WalSndLoop(). So it does
> not look like it's an expensive function. If it is expensive we might want to
> worry about WalSndLoop as well. Does it make more sense to remove this
> threashold?
>

We have previously tried this for every change [1] and it brings
noticeable overhead. In fact, even doing it for every 10 changes
had some overhead, which is why we arrived at this threshold number. I
don't think it can lead to a timeout due to skipping changes, but sure, if
we see any such report we can further fine-tune this setting or
try to make it time-based; for now I feel it is safe to use
this threshold.

> +
> +/*
> + * After continuously processing CHANGES_THRESHOLD changes, we
> + * try to send a keepalive message if required.
> + */
> +if (++changes_count >= CHANGES_THRESHOLD)
> +{
> +ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, 
> false);
> +changes_count = 0;
> +}
> +}
> +
>
> On the other thread, I mentioned that we don't have a TAP test for it.
> I agree with
> Amit's opinion there that it's hard to create a test which will timeout
> everywhere. I think what we need is a way to control the time required for
> decoding a transaction.
>
> A rough idea is to induce a small sleep after decoding every change. The 
> amount
> of sleep * number of changes will help us estimate and control the amount of
> time taken to decode a transaction. Then we create a transaction which will
> take longer than the timeout threashold to decode. But that's a
> significant code. I
> don't think PostgreSQL has a facility to induce a delay at a particular place
> in the code.
>

Yeah, I don't know how to induce such a delay while decoding changes.

One more thing, I think it would be better to expose a new callback
API via reorder buffer as suggested previously [2] similar to other
reorder buffer APIs instead of directly using reorderbuffer API to
invoke plugin API.


[1] - 
https://www.postgresql.org/message-id/OS3PR01MB6275DFFDAC7A59FA148931529E209%40OS3PR01MB6275.jpnprd01.prod.outlook.com
[2] - 
https://www.postgresql.org/message-id/CAA4eK1%2BfQjndoBOFUn9Wy0hhm3MLyUWEpcT9O7iuCELktfdBiQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




RE: Logical replication timeout problem

2023-01-09 Thread wangw.f...@fujitsu.com
On Fri, Jan 6, 2023 at 15:06 PM Ashutosh Bapat  
wrote:
> Hi Wang,
> Thanks for working on this. One of our customer faced a similar
> situation when running BDR with PostgreSQL.
> 
> I tested your patch and it solves the problem.
> 
> Please find some review comments below

Thanks for your testing and comments.

> +/*
> + * Helper function for ReorderBufferProcessTXN for updating progress.
> + */
> +static inline void
> +ReorderBufferUpdateProgress(ReorderBuffer *rb, ReorderBufferTXN *txn,
> +ReorderBufferChange *change)
> +{
> +LogicalDecodingContext *ctx = rb->private_data;
> +static intchanges_count = 0;
> 
> It's not easy to know that a variable is static when reading the code which
> uses it. So it's easy to interpret code wrong. I would probably track it
> through logical decoding context itself OR through a global variable like 
> other
> places where we track the last timestamps. But there's more below on this.

I'm not sure we need to add global variables or member variables for a
cumulative count that is only used here. How would you feel if I added some
comments when declaring this static variable?

> +
> +if (!ctx->update_progress)
> +return;
> +
> +Assert(!ctx->fast_forward);
> +
> +/* set output state */
> +ctx->accept_writes = false;
> +ctx->write_xid = txn->xid;
> +ctx->write_location = change->lsn;
> +ctx->end_xact = false;
> 
> This patch reverts many of the changes of the previous commit which tried to
> fix this issue i.e. 8df2374. end_xact was introduced by the same commit 
> but
> without much explanation of that in the commit message. Its only user,
> WalSndUpdateProgress(), is probably making a wrong assumption as well.
> 
>  * We don't have a mechanism to get the ack for any LSN other than end
>  * xact LSN from the downstream. So, we track lag only for end of
>  * transaction LSN.
> 
> IIUC, WAL sender tracks the LSN of the last WAL record read in sentPtr which 
> is
> sent downstream through a keep alive message. Downstream may
> acknowledge this
> LSN. So we do get ack for any LSN, not just commit LSN.
> 
> So I propose removing end_xact as well.

We didn't track the lag during a transaction because it could make the
lag calculation inaccurate. If we tracked every LSN, we could
fail to record important LSN information because of
WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS (see function WalSndUpdateProgress).
Please see details in [1] and [2].

Regards,
Wang Wei

[1] - 
https://www.postgresql.org/message-id/OS3PR01MB62755D216245199554DDC8DB9EEA9%40OS3PR01MB6275.jpnprd01.prod.outlook.com
[2] - 
https://www.postgresql.org/message-id/OS3PR01MB627514AE0B3040D8F55A68B99EEA9%40OS3PR01MB6275.jpnprd01.prod.outlook.com


RE: Logical replication timeout problem

2023-01-11 Thread wangw.f...@fujitsu.com
On Mon, Jan 9, 2023 at 13:04 PM Amit Kapila  wrote:
>

Thanks for your comments.

> One more thing, I think it would be better to expose a new callback
> API via reorder buffer as suggested previously [2] similar to other
> reorder buffer APIs instead of directly using reorderbuffer API to
> invoke plugin API.

Yes, I agree. I think it would be better to add a new callback API on HEAD.
So, I improved the fix approach:
Introduce a new optional callback to update the progress. This callback function
is invoked at the end of the main loop of the function
ReorderBufferProcessTXN() for each change. In this way, similar timeout
problems should be avoided.
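
A minimal sketch of the shape this takes (names follow the attached patch and
the later discussion; treat the exact signature and placement as illustrative):

    /* New optional reorder buffer callback for updating progress */
    typedef void (*ReorderBufferUpdateProgressCB) (ReorderBuffer *rb,
                                                   ReorderBufferTXN *txn,
                                                   ReorderBufferChange *change);

    /* registered as a member of struct ReorderBuffer */
    ReorderBufferUpdateProgressCB update_progress;

    /* ... and invoked at the end of the per-change loop in ReorderBufferProcessTXN() */
    if (rb->update_progress)
        rb->update_progress(rb, txn, change);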

BTW, I did a performance test for this patch. When running the SQL that
reproduces the problem (refreshing the materialized view in synchronous logical
replication mode), the running time of the new function pgoutput_update_progress is
less than 0.1% of the total time. I think this result looks OK.

Attach the new patch.

Regards,
Wang Wei


v2-0001-Fix-the-logical-replication-timeout-during-proces.patch
Description:  v2-0001-Fix-the-logical-replication-timeout-during-proces.patch


Re: Logical replication timeout problem

2023-01-16 Thread Ashutosh Bapat
On Mon, Jan 9, 2023 at 4:08 PM wangw.f...@fujitsu.com
 wrote:
>
> On Fri, Jan 6, 2023 at 15:06 PM Ashutosh Bapat  
> wrote:
>
> I'm not sure if we need to add global variables or member variables for a
> cumulative count that is only used here. How would you feel if I add some
> comments when declaring this static variable?

I see WalSndUpdateProgress::sendTime is static already. So this seems
fine. A comment will certainly help.

>
> > +
> > +if (!ctx->update_progress)
> > +return;
> > +
> > +Assert(!ctx->fast_forward);
> > +
> > +/* set output state */
> > +ctx->accept_writes = false;
> > +ctx->write_xid = txn->xid;
> > +ctx->write_location = change->lsn;
> > +ctx->end_xact = false;
> >
> > This patch reverts many of the changes of the previous commit which tried to
> > fix this issue i.e. 8df2374. end_xact was introduced by the same commit 
> > but
> > without much explanation of that in the commit message. Its only user,
> > WalSndUpdateProgress(), is probably making a wrong assumption as well.
> >
> >  * We don't have a mechanism to get the ack for any LSN other than end
> >  * xact LSN from the downstream. So, we track lag only for end of
> >  * transaction LSN.
> >
> > IIUC, WAL sender tracks the LSN of the last WAL record read in sentPtr 
> > which is
> > sent downstream through a keep alive message. Downstream may
> > acknowledge this
> > LSN. So we do get ack for any LSN, not just commit LSN.
> >
> > So I propose removing end_xact as well.
>
> We didn't track the lag during a transaction because it could make the
> calculations of lag functionality inaccurate. If we track every lsn, it could
> fail to record important lsn information because of
> WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS (see function WalSndUpdateProgress).
> Please see details in [1] and [2].

LagTrackerRead() interpolates to reduce the inaccuracy. I don't
understand why we need to track the end LSN only. But I don't think
that affects this fix. So I am fine if we want to leave end_xact
there.

-- 
Best Wishes,
Ashutosh Bapat




Re: Logical replication timeout problem

2023-01-16 Thread Ashutosh Bapat
On Wed, Jan 11, 2023 at 4:11 PM wangw.f...@fujitsu.com
 wrote:
>
> On Mon, Jan 9, 2023 at 13:04 PM Amit Kapila  wrote:
> >
>
> Thanks for your comments.
>
> > One more thing, I think it would be better to expose a new callback
> > API via reorder buffer as suggested previously [2] similar to other
> > reorder buffer APIs instead of directly using reorderbuffer API to
> > invoke plugin API.
>
> Yes, I agree. I think it would be better to add a new callback API on the 
> HEAD.
> So, I improved the fix approach:
> Introduce a new optional callback to update the process. This callback 
> function
> is invoked at the end inside the main loop of the function
> ReorderBufferProcessTXN() for each change. In this way, I think it seems that
> similar timeout problems could be avoided.

I am a bit worried about the indirections that the wrappers and hooks
create. Output plugins call OutputPluginUpdateProgress() in callbacks
but I don't see why  ReorderBufferProcessTXN() needs a callback to
call OutputPluginUpdateProgress. I don't think output plugins are
going to do anything special with that callback than just call
OutputPluginUpdateProgress. Every output plugin will need to implement
it and if they do not they will face the timeout problem. That would
be unnecessary. Instead ReorderBufferUpdateProgress() in your first
patch was more direct and readable. That way the fix works for any
output plugin. In fact, I am wondering whether we could have a call in
ReorderBufferProcessTxn() at the end of transaction
(commit/prepare/commit prepared/abort prepared) instead of the
corresponding output plugin callbacks calling
OutputPluginUpdateProgress().


-- 
Best Wishes,
Ashutosh Bapat




Re: Logical replication timeout problem

2023-01-17 Thread Amit Kapila
On Mon, Jan 16, 2023 at 10:06 PM Ashutosh Bapat
 wrote:
>
> On Wed, Jan 11, 2023 at 4:11 PM wangw.f...@fujitsu.com
>  wrote:
> >
> > On Mon, Jan 9, 2023 at 13:04 PM Amit Kapila  wrote:
> > >
> >
> > Thanks for your comments.
> >
> > > One more thing, I think it would be better to expose a new callback
> > > API via reorder buffer as suggested previously [2] similar to other
> > > reorder buffer APIs instead of directly using reorderbuffer API to
> > > invoke plugin API.
> >
> > Yes, I agree. I think it would be better to add a new callback API on the 
> > HEAD.
> > So, I improved the fix approach:
> > Introduce a new optional callback to update the process. This callback 
> > function
> > is invoked at the end inside the main loop of the function
> > ReorderBufferProcessTXN() for each change. In this way, I think it seems 
> > that
> > similar timeout problems could be avoided.
>
> I am a bit worried about the indirections that the wrappers and hooks
> create. Output plugins call OutputPluginUpdateProgress() in callbacks
> but I don't see why  ReorderBufferProcessTXN() needs a callback to
> call OutputPluginUpdateProgress.
>

Yeah, I think we can do it as we are doing the previous approach but
we need an additional wrapper (update_progress_cb_wrapper()) as the
current patch has so that we can add error context information. This
is similar to why we have a wrapper for all other callbacks like
change_cb_wrapper.

-- 
With Regards,
Amit Kapila.




Re: Logical replication timeout problem

2023-01-17 Thread Ashutosh Bapat
On Tue, Jan 17, 2023 at 3:34 PM Amit Kapila  wrote:

> >
> > I am a bit worried about the indirections that the wrappers and hooks
> > create. Output plugins call OutputPluginUpdateProgress() in callbacks
> > but I don't see why  ReorderBufferProcessTXN() needs a callback to
> > call OutputPluginUpdateProgress.
> >
>
> Yeah, I think we can do it as we are doing the previous approach but
> we need an additional wrapper (update_progress_cb_wrapper()) as the
> current patch has so that we can add error context information. This
> is similar to why we have a wrapper for all other callbacks like
> change_cb_wrapper.
>

Ultimately OutputPluginUpdateProgress() will be called - which in turn
will call ctx->update_progress. I don't see wrappers around
OutputPluginWrite or OutputPluginPrepareWrite. But I see that those
two are called always from output plugin, so indirectly those are
called through a wrapper. I also see that update_progress_cb_wrapper()
is similar, as far as wrapper is concerned, to
ReorderBufferUpdateProgress() in the earlier patch.
ReorderBufferUpdateProgress() looks more readable than the wrapper.

If we want to keep the wrapper, at least we should use a different
variable name. update_progress is also there in LogicalDecodingContext
and will be indirectly called from ReorderBuffer::update_progress.
Somebody might think that there's some recursion involved there.
That's mighty confusing.

-- 
Best Wishes,
Ashutosh Bapat




Re: Logical replication timeout problem

2023-01-17 Thread Amit Kapila
On Tue, Jan 17, 2023 at 6:41 PM Ashutosh Bapat
 wrote:
>
> On Tue, Jan 17, 2023 at 3:34 PM Amit Kapila  wrote:
>
> > >
> > > I am a bit worried about the indirections that the wrappers and hooks
> > > create. Output plugins call OutputPluginUpdateProgress() in callbacks
> > > but I don't see why  ReorderBufferProcessTXN() needs a callback to
> > > call OutputPluginUpdateProgress.
> > >
> >
> > Yeah, I think we can do it as we are doing the previous approach but
> > we need an additional wrapper (update_progress_cb_wrapper()) as the
> > current patch has so that we can add error context information. This
> > is similar to why we have a wrapper for all other callbacks like
> > change_cb_wrapper.
> >
>
> Ultimately OutputPluginUpdateProgress() will be called - which in turn
> will call ctx->update_progress.
>

No, update_progress_cb_wrapper() should directly call
ctx->update_progress(). The key reason to have an
update_progress_cb_wrapper() is that it allows us to add error context
information (see the usage of output_plugin_error_callback).
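
For reference, the existing wrappers in logical.c (e.g. change_cb_wrapper)
follow roughly the pattern below; this is a sketch of the same pattern applied
to the progress update (details are illustrative, the exact code is what is in
the patch):

static void
update_progress_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
                           ReorderBufferChange *change)
{
    LogicalDecodingContext *ctx = cache->private_data;
    LogicalErrorCallbackState state;
    ErrorContextCallback errcallback;

    /* Push callback + info on the error context stack */
    state.ctx = ctx;
    state.callback_name = "update_progress";
    state.report_location = change->lsn;
    errcallback.callback = output_plugin_error_callback;
    errcallback.arg = (void *) &state;
    errcallback.previous = error_context_stack;
    error_context_stack = &errcallback;

    /* set output state */
    ctx->accept_writes = false;
    ctx->write_xid = txn->xid;
    ctx->write_location = change->lsn;
    ctx->end_xact = false;

    /* do the work: invoke the callback directly */
    ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false);

    /* Pop the error context stack */
    error_context_stack = errcallback.previous;
}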

-- 
With Regards,
Amit Kapila.




RE: Logical replication timeout problem

2023-01-18 Thread wangw.f...@fujitsu.com
On Wed, Jan 18, 2023 at 13:29 PM Amit Kapila  wrote:
> On Tue, Jan 17, 2023 at 6:41 PM Ashutosh Bapat
>  wrote:
> >
> > On Tue, Jan 17, 2023 at 3:34 PM Amit Kapila  wrote:
> >
> > > >
> > > > I am a bit worried about the indirections that the wrappers and hooks
> > > > create. Output plugins call OutputPluginUpdateProgress() in callbacks
> > > > but I don't see why  ReorderBufferProcessTXN() needs a callback to
> > > > call OutputPluginUpdateProgress.
> > > >
> > >
> > > Yeah, I think we can do it as we are doing the previous approach but
> > > we need an additional wrapper (update_progress_cb_wrapper()) as the
> > > current patch has so that we can add error context information. This
> > > is similar to why we have a wrapper for all other callbacks like
> > > change_cb_wrapper.
> > >
> >
> > Ultimately OutputPluginUpdateProgress() will be called - which in turn
> > will call ctx->update_progress.
> >
> 
> No, update_progress_cb_wrapper() should directly call
> ctx->update_progress(). The key reason to have a
> update_progress_cb_wrapper() is that it allows us to add error context
> information (see the usage of output_plugin_error_callback).

I think it makes sense. This also avoids the need for every output plugin to
implement the callback. So I tried to improve the patch based on this approach.

And I tried to add some comments for this new callback to distinguish it from
ctx->update_progress.

Attach the new patch.

Regards,
Wang Wei


v3-0001-Fix-the-logical-replication-timeout-during-proces.patch
Description:  v3-0001-Fix-the-logical-replication-timeout-during-proces.patch


Re: Logical replication timeout problem

2023-01-18 Thread Ashutosh Bapat
On Wed, Jan 18, 2023 at 1:49 PM wangw.f...@fujitsu.com
 wrote:
>
> On Wed, Jan 18, 2023 at 13:29 PM Amit Kapila  wrote:
> > On Tue, Jan 17, 2023 at 6:41 PM Ashutosh Bapat
> >  wrote:
> > >
> > > On Tue, Jan 17, 2023 at 3:34 PM Amit Kapila  
> > > wrote:
> > >
> > > > >
> > > > > I am a bit worried about the indirections that the wrappers and hooks
> > > > > create. Output plugins call OutputPluginUpdateProgress() in callbacks
> > > > > but I don't see why  ReorderBufferProcessTXN() needs a callback to
> > > > > call OutputPluginUpdateProgress.
> > > > >
> > > >
> > > > Yeah, I think we can do it as we are doing the previous approach but
> > > > we need an additional wrapper (update_progress_cb_wrapper()) as the
> > > > current patch has so that we can add error context information. This
> > > > is similar to why we have a wrapper for all other callbacks like
> > > > change_cb_wrapper.
> > > >
> > >
> > > Ultimately OutputPluginUpdateProgress() will be called - which in turn
> > > will call ctx->update_progress.
> > >
> >
> > No, update_progress_cb_wrapper() should directly call
> > ctx->update_progress(). The key reason to have a
> > update_progress_cb_wrapper() is that it allows us to add error context
> > information (see the usage of output_plugin_error_callback).
>
> I think it makes sense. This also avoids the need for every output plugin to
> implement the callback. So I tried to improve the patch based on this 
> approach.
>
> And I tried to add some comments for this new callback to distinguish it from
> ctx->update_progress.

Comments don't help when using cscope or some such code browsing tool.
Better to use a different variable name.

-- 
Best Wishes,
Ashutosh Bapat




Re: Logical replication timeout problem

2023-01-18 Thread Amit Kapila
On Wed, Jan 18, 2023 at 5:37 PM Ashutosh Bapat
 wrote:
>
> On Wed, Jan 18, 2023 at 1:49 PM wangw.f...@fujitsu.com
>  wrote:
> >
> > On Wed, Jan 18, 2023 at 13:29 PM Amit Kapila  
> > wrote:
> > > On Tue, Jan 17, 2023 at 6:41 PM Ashutosh Bapat
> > >  wrote:
> > > >
> > > > On Tue, Jan 17, 2023 at 3:34 PM Amit Kapila  
> > > > wrote:
> > > >
> > > > > >
> > > > > > I am a bit worried about the indirections that the wrappers and 
> > > > > > hooks
> > > > > > create. Output plugins call OutputPluginUpdateProgress() in 
> > > > > > callbacks
> > > > > > but I don't see why  ReorderBufferProcessTXN() needs a callback to
> > > > > > call OutputPluginUpdateProgress.
> > > > > >
> > > > >
> > > > > Yeah, I think we can do it as we are doing the previous approach but
> > > > > we need an additional wrapper (update_progress_cb_wrapper()) as the
> > > > > current patch has so that we can add error context information. This
> > > > > is similar to why we have a wrapper for all other callbacks like
> > > > > change_cb_wrapper.
> > > > >
> > > >
> > > > Ultimately OutputPluginUpdateProgress() will be called - which in turn
> > > > will call ctx->update_progress.
> > > >
> > >
> > > No, update_progress_cb_wrapper() should directly call
> > > ctx->update_progress(). The key reason to have a
> > > update_progress_cb_wrapper() is that it allows us to add error context
> > > information (see the usage of output_plugin_error_callback).
> >
> > I think it makes sense. This also avoids the need for every output plugin to
> > implement the callback. So I tried to improve the patch based on this 
> > approach.
> >
> > And I tried to add some comments for this new callback to distinguish it 
> > from
> > ctx->update_progress.
>
> Comments don't help when using cscope or some such code browsing tool.
> Better to use a different variable name.
>

+ /*
+ * Callback to be called when updating progress during sending data of a
+ * transaction (and its subtransactions) to the output plugin.
+ */
+ ReorderBufferUpdateProgressCB update_progress;

Are you suggesting changing the name of the above variable? If so, how
about apply_progress, progress, or updateprogress? If you don't like
any of these then feel free to suggest something else. If we change
the variable name then accordingly, we need to update
ReorderBufferUpdateProgressCB as well.

-- 
With Regards,
Amit Kapila.




Re: Logical replication timeout problem

2023-01-19 Thread Ashutosh Bapat
On Wed, Jan 18, 2023 at 6:00 PM Amit Kapila  wrote:

> + */
> + ReorderBufferUpdateProgressCB update_progress;
>
> Are you suggesting changing the name of the above variable? If so, how
> about apply_progress, progress, or updateprogress? If you don't like
> any of these then feel free to suggest something else. If we change
> the variable name then accordingly, we need to update
> ReorderBufferUpdateProgressCB as well.
>

I would have liked all the callback names to be renamed with the prefix
"rbcb_xxx" so that they have less chance of conflicting with
similar names in the code base. But it's probably too late to do that :).

How about update_txn_progress, since the CB is supposed to be used only
within a transaction? Or update_progress_txn?
update_progress_cb_wrapper needs a change of name as well.

-- 
Best Wishes,
Ashutosh Bapat




Re: Logical replication timeout problem

2023-01-19 Thread Amit Kapila
On Thu, Jan 19, 2023 at 4:13 PM Ashutosh Bapat
 wrote:
>
> On Wed, Jan 18, 2023 at 6:00 PM Amit Kapila  wrote:
>
> > + */
> > + ReorderBufferUpdateProgressCB update_progress;
> >
> > Are you suggesting changing the name of the above variable? If so, how
> > about apply_progress, progress, or updateprogress? If you don't like
> > any of these then feel free to suggest something else. If we change
> > the variable name then accordingly, we need to update
> > ReorderBufferUpdateProgressCB as well.
> >
>
> I would liked to have all the callback names renamed with prefix
> "rbcb_xxx" so that they have very less chances of conflicting with
> similar names in the code base. But it's probably late to do that :).
>
> How are update_txn_progress since the CB is supposed to be used only
> within a transaction? or update_progress_txn?
>

Personally, I would prefer 'apply_progress' as it would be similar to
a few other callbacks like apply_change and apply_truncate, or
update_progress as proposed by the patch, because that is similar to
existing callbacks like commit_prepared. If you and others don't like
any of those then we can go for 'update_progress_txn' as well. Does anybody
else have an opinion on this?

-- 
With Regards,
Amit Kapila.




Re: Logical replication timeout problem

2023-01-19 Thread Peter Smith
Here are some review comments for patch v3-0001.

==
Commit message

1.
The problem is when there is a DDL in a transaction that generates lots of
temporary data due to rewrite rules, these temporary data will not be processed
by the pgoutput - plugin. Therefore, the previous fix (f95d53e) for DML had no
impact on this case.

~

1a.
IMO this comment needs to give a bit of background about the original
problem here, rather than just starting with "The problem is" which is
describing the flaws of the previous fix.

~

1b.
"pgoutput - plugin" -> "pgoutput plugin" ??

~~~

2.

To fix this, we introduced a new ReorderBuffer callback -
'ReorderBufferUpdateProgressCB'. This callback is called to try to update the
process after each change has been processed during sending data of a
transaction (and its subtransactions) to the output plugin.

IIUC it's not really "after each change" - shouldn't this comment
mention something about the CHANGES_THRESHOLD 100?

==
src/backend/replication/logical/logical.c

3. forward declaration

+/* update progress callback */
+static void update_progress_cb_wrapper(ReorderBuffer *cache,
+ReorderBufferTXN *txn,
+ReorderBufferChange *change);

I felt this function wrapper name was a bit misleading... AFAIK every
other wrapper really does just wrap their respective functions. But
this one seems a bit different because it calls the wrapped function
ONLY if some threshold is exceeded. IMO maybe this function could have
some name that conveys this better:

e.g. update_progress_cb_wrapper_with_threshold

~~~

4. update_progress_cb_wrapper

+/*
+ * Update progress callback
+ *
+ * Try to update progress and send a keepalive message if too many changes were
+ * processed when processing txn.
+ *
+ * For a large transaction, if we don't send any change to the downstream for a
+ * long time (exceeds the wal_receiver_timeout of standby) then it can timeout.
+ * This can happen when all or most of the changes are either not published or
+ * got filtered out.
+ */

SUGGESTION (instead of the "Try to update" sentence)
Send a keepalive message whenever more than 
changes are encountered while processing a transaction.

~~~

5.

+static void
+update_progress_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ReorderBufferChange *change)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+ static int changes_count = 0; /* Static variable used to accumulate
+ * the number of changes while
+ * processing txn. */
+

IMO this may be more readable if the static 'changes_count' local var
was declared first and separated from the other vars by a blank line.

~~~

6.

+ /*
+ * We don't want to try sending a keepalive message after processing each
+ * change as that can have overhead. Tests revealed that there is no
+ * noticeable overhead in doing it after continuously processing 100 or so
+ * changes.
+ */
+#define CHANGES_THRESHOLD 100

6a.
I think it might be better to define this right at the top of the
function adjacent to the 'changes_count' variable (e.g. a bit like the
original HEAD code looked)

~

6b.
SUGGESTION (for the comment)
Sending keepalive messages after every change has some overhead, but
testing showed there is no noticeable overhead if keepalive is only
sent after every ~100 changes.

~~~

7.

+
+ /*
+ * After continuously processing CHANGES_THRESHOLD changes, we
+ * try to send a keepalive message if required.
+ */
+ if (++changes_count >= CHANGES_THRESHOLD)
+ {
+ ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false);
+ changes_count = 0;
+ }
+

7a.
SUGGESTION (for comment)
Send a keepalive message after every CHANGES_THRESHOLD changes.

~

7b.
Would it be neater to just call OutputPluginUpdateProgress here instead?

e.g.
BEFORE
ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false);
AFTER
OutputPluginUpdateProgress(ctx, false);

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Logical replication timeout problem

2023-01-19 Thread Amit Kapila
On Fri, Jan 20, 2023 at 7:40 AM Peter Smith  wrote:
>
> Here are some review comments for patch v3-0001.
>
> ==
> src/backend/replication/logical/logical.c
>
> 3. forward declaration
>
> +/* update progress callback */
> +static void update_progress_cb_wrapper(ReorderBuffer *cache,
> +ReorderBufferTXN *txn,
> +ReorderBufferChange *change);
>
> I felt this function wrapper name was a bit misleading... AFAIK every
> other wrapper really does just wrap their respective functions. But
> this one seems a bit different because it calls the wrapped function
> ONLY if some threshold is exceeded. IMO maybe this function could have
> some name that conveys this better:
>
> e.g. update_progress_cb_wrapper_with_threshold
>

I am wondering whether it would be better to move the threshold logic
to the caller. Previously this logic was inside the function because
it was being invoked from multiple places, but now that won't be the
case. That would also address your concern about the name.

>
> ~
>
> 7b.
> Would it be neater to just call OutputPluginUpdateProgress here instead?
>
> e.g.
> BEFORE
> ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false);
> AFTER
> OutputPluginUpdateProgress(ctx, false);
>

We already check whether ctx->update_progress is defined or not, which
is the only extra job done by OutputPluginUpdateProgress, but we can
probably consolidate the checks and directly invoke
OutputPluginUpdateProgress.
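
Roughly how this ends up looking in the v4 patch further down the thread: the
threshold logic sits in the caller, and the wrapper just sets up the error
context and invokes OutputPluginUpdateProgress directly (sketch only; member
and macro names are as used in v4):

    /* In the per-change loop of ReorderBufferProcessTXN() */
    if (++changes_count >= CHANGES_THRESHOLD)
    {
        rb->update_progress_txn(rb, txn, change);
        changes_count = 0;
    }

    /* ... while update_progress_txn_cb_wrapper() in logical.c essentially does */
    OutputPluginUpdateProgress(ctx, false);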

-- 
With Regards,
Amit Kapila.




Re: Logical replication timeout problem

2023-01-19 Thread Peter Smith
On Fri, Jan 20, 2023 at 3:35 PM Amit Kapila  wrote:
>
> On Fri, Jan 20, 2023 at 7:40 AM Peter Smith  wrote:
> >
> > Here are some review comments for patch v3-0001.
> >
> > ==
> > src/backend/replication/logical/logical.c
> >
> > 3. forward declaration
> >
> > +/* update progress callback */
> > +static void update_progress_cb_wrapper(ReorderBuffer *cache,
> > +ReorderBufferTXN *txn,
> > +ReorderBufferChange *change);
> >
> > I felt this function wrapper name was a bit misleading... AFAIK every
> > other wrapper really does just wrap their respective functions. But
> > this one seems a bit different because it calls the wrapped function
> > ONLY if some threshold is exceeded. IMO maybe this function could have
> > some name that conveys this better:
> >
> > e.g. update_progress_cb_wrapper_with_threshold
> >
>
> I am wondering whether it would be better to move the threshold logic
> to the caller. Previously this logic was inside the function because
> it was being invoked from multiple places but now that won't be the
> case. Also, then your concern about the name would also be addressed.
>
> >
> > ~
> >
> > 7b.
> > Would it be neater to just call OutputPluginUpdateProgress here instead?
> >
> > e.g.
> > BEFORE
> > ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false);
> > AFTER
> > OutputPluginUpdateProgress(ctx, false);
> >
>
> We already check whether ctx->update_progress is defined or not which
> is the only extra job done by OutputPluginUpdateProgress but probably
> we can consolidate the checks and directly invoke
> OutputPluginUpdateProgress.
>

Yes, I saw that, but I thought it was better to keep the early exit
from update_progress_cb_wrapper, so incurring just one additional
boolean check for every 100 changes was not anything to worry about.

--
Kind Regards,
Peter Smith.
Fujitsu Australia.




RE: Logical replication timeout problem

2023-01-19 Thread wangw.f...@fujitsu.com
On Thu, Jan 19, 2023 at 19:37 PM Amit Kapila  wrote:
> On Thu, Jan 19, 2023 at 4:13 PM Ashutosh Bapat
>  wrote:
> >
> > On Wed, Jan 18, 2023 at 6:00 PM Amit Kapila 
> wrote:
> >
> > > + */
> > > + ReorderBufferUpdateProgressCB update_progress;
> > >
> > > Are you suggesting changing the name of the above variable? If so, how
> > > about apply_progress, progress, or updateprogress? If you don't like
> > > any of these then feel free to suggest something else. If we change
> > > the variable name then accordingly, we need to update
> > > ReorderBufferUpdateProgressCB as well.
> > >
> >
> > I would liked to have all the callback names renamed with prefix
> > "rbcb_xxx" so that they have very less chances of conflicting with
> > similar names in the code base. But it's probably late to do that :).
> >
> > How are update_txn_progress since the CB is supposed to be used only
> > within a transaction? or update_progress_txn?
> >
> 
> Personally, I would prefer 'apply_progress' as it would be similar to
> a few other callbacks like apply_change, apply_truncate, or as is
> proposed by patch update_progress again because it is similar to
> existing callbacks like commit_prepared. If you and others don't like
> any of those then we can go for 'update_progress_txn' as well. Anybody
> else has an opinion on this?

I think 'update_progress_txn' might be better, because this name makes it
easier to see that the callback is used to update progress while
processing a txn. So, I renamed it to 'update_progress_txn'.

I have addressed all the comments and here is the new version of the patch.

Regards,
Wang Wei


v4-0001-Fix-the-logical-replication-timeout-during-proces.patch
Description:  v4-0001-Fix-the-logical-replication-timeout-during-proces.patch


RE: Logical replication timeout problem

2023-01-19 Thread wangw.f...@fujitsu.com
On Fri, Jan 20, 2023 at 12:35 PM Amit Kapila  wrote:
> On Fri, Jan 20, 2023 at 7:40 AM Peter Smith  wrote:
> >
> > Here are some review comments for patch v3-0001.
> >
> > ==
> > src/backend/replication/logical/logical.c
> >
> > 3. forward declaration
> >
> > +/* update progress callback */
> > +static void update_progress_cb_wrapper(ReorderBuffer *cache,
> > +ReorderBufferTXN *txn,
> > +ReorderBufferChange *change);
> >
> > I felt this function wrapper name was a bit misleading... AFAIK every
> > other wrapper really does just wrap their respective functions. But
> > this one seems a bit different because it calls the wrapped function
> > ONLY if some threshold is exceeded. IMO maybe this function could have
> > some name that conveys this better:
> >
> > e.g. update_progress_cb_wrapper_with_threshold
> >
> 
> I am wondering whether it would be better to move the threshold logic
> to the caller. Previously this logic was inside the function because
> it was being invoked from multiple places but now that won't be the
> case. Also, then your concern about the name would also be addressed.

Agree. Moved the threshold logic to the function ReorderBufferProcessTXN.

> >
> > ~
> >
> > 7b.
> > Would it be neater to just call OutputPluginUpdateProgress here instead?
> >
> > e.g.
> > BEFORE
> > ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false);
> > AFTER
> > OutputPluginUpdateProgress(ctx, false);
> >
> 
> We already check whether ctx->update_progress is defined or not which
> is the only extra job done by OutputPluginUpdateProgress but probably
> we can consolidate the checks and directly invoke
> OutputPluginUpdateProgress.

Changed. Invoke the function OutputPluginUpdateProgress directly in the new
callback.

Regards,
Wang Wei


RE: Logical replication timeout problem

2023-01-19 Thread wangw.f...@fujitsu.com
On Fri, Jan 20, 2023 at 10:10 AM Peter Smith  wrote:
> Here are some review comments for patch v3-0001.

Thanks for your comments.

> ==
> Commit message
> 
> 1.
> The problem is when there is a DDL in a transaction that generates lots of
> temporary data due to rewrite rules, these temporary data will not be
> processed
> by the pgoutput - plugin. Therefore, the previous fix (f95d53e) for DML had no
> impact on this case.
> 
> ~
> 
> 1a.
> IMO this comment needs to give a bit of background about the original
> problem here, rather than just starting with "The problem is" which is
> describing the flaws of the previous fix.

Added some related background to the commit message.

> ~
> 
> 1b.
> "pgoutput - plugin" -> "pgoutput plugin" ??

Changed.

> ~~~
> 
> 2.
> 
> To fix this, we introduced a new ReorderBuffer callback -
> 'ReorderBufferUpdateProgressCB'. This callback is called to try to update the
> process after each change has been processed during sending data of a
> transaction (and its subtransactions) to the output plugin.
> 
> IIUC it's not really "after each change" - shouldn't this comment
> mention something about the CHANGES_THRESHOLD 100?

Changed.

> ~~~
> 
> 4. update_progress_cb_wrapper
> 
> +/*
> + * Update progress callback
> + *
> + * Try to update progress and send a keepalive message if too many changes
> were
> + * processed when processing txn.
> + *
> + * For a large transaction, if we don't send any change to the downstream 
> for a
> + * long time (exceeds the wal_receiver_timeout of standby) then it can
> timeout.
> + * This can happen when all or most of the changes are either not published 
> or
> + * got filtered out.
> + */
> 
> SUGGESTION (instead of the "Try to update" sentence)
> Send a keepalive message whenever more than 
> changes are encountered while processing a transaction.

Since it's possible that keep-alive messages won't be sent even if the
threshold is reached (see function WalSndKeepaliveIfNecessary), I thought it
might be better to keep the "try to" wording.
I also rewrote the comments here because the threshold logic has been moved to the
function ReorderBufferProcessTXN.

> ~~~
> 
> 5.
> 
> +static void
> +update_progress_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
> +ReorderBufferChange *change)
> +{
> + LogicalDecodingContext *ctx = cache->private_data;
> + LogicalErrorCallbackState state;
> + ErrorContextCallback errcallback;
> + static int changes_count = 0; /* Static variable used to accumulate
> + * the number of changes while
> + * processing txn. */
> +
> 
> IMO this may be more readable if the static 'changes_count' local var
> was declared first and separated from the other vars by a blank line.

Changed.

> ~~~
> 
> 6.
> 
> + /*
> + * We don't want to try sending a keepalive message after processing each
> + * change as that can have overhead. Tests revealed that there is no
> + * noticeable overhead in doing it after continuously processing 100 or so
> + * changes.
> + */
> +#define CHANGES_THRESHOLD 100
> 
> 6a.
> I think it might be better to define this right at the top of the
> function adjacent to the 'changes_count' variable (e.g. a bit like the
> original HEAD code looked)

Changed.

> ~
> 
> 6b.
> SUGGESTION (for the comment)
> Sending keepalive messages after every change has some overhead, but
> testing showed there is no noticeable overhead if keepalive is only
> sent after every ~100 changes.

Changed.

> ~~~
> 
> 7.
> 
> +
> + /*
> + * After continuously processing CHANGES_THRESHOLD changes, we
> + * try to send a keepalive message if required.
> + */
> + if (++changes_count >= CHANGES_THRESHOLD)
> + {
> + ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false);
> + changes_count = 0;
> + }
> +
> 
> 7a.
> SUGGESTION (for comment)
> Send a keepalive message after every CHANGES_THRESHOLD changes.

Changed.

Regards,
Wang Wei


Re: Logical replication timeout problem

2023-01-22 Thread Peter Smith
Here are my review comments for patch v4-0001

==
General

1.

It makes no real difference, but I was wondering about:
"update txn progress" versus "update progress txn"

I thought that the first way sounds more natural. YMMV.

If you change this then there is an impact on the typedef, function
names, comments, and member names:

ReorderBufferUpdateTxnProgressCB -->  ReorderBufferUpdateProgressTxnCB

“/* update progress txn callback */” --> “/* update txn progress callback */”

update_progress_txn_cb_wrapper -->  update_txn_progress_cb_wrapper

updated_progress_txn --> update_txn_progress

==
Commit message

2.

The problem is when there is a DDL in a transaction that generates lots of
temporary data due to rewrite rules, these temporary data will not be processed
by the pgoutput plugin. The previous commit (f95d53e) only fixed timeouts
caused by filtering out changes in pgoutput. Therefore, the previous fix for
DML had no impact on this case.

~

IMO this still needs some rewording to say up-front what the actual
problem is -- i.e. an avoidable timeout occurring.

SUGGESTION (or something like this...)

When there is a DDL in a transaction that generates lots of temporary
data due to rewrite rules, this temporary data will not be processed
by the pgoutput plugin. This means it is possible for a timeout to
occur if a sufficiently long time elapses since the last pgoutput
message. A previous commit (f95d53e) fixed a similar scenario in this
area, but that only fixed timeouts for DML going through pgoutput, so
it did not address this DDL timeout case.

==
src/backend/replication/logical/logical.c

3. update_progress_txn_cb_wrapper

+/*
+ * Update progress callback while processing a transaction.
+ *
+ * Try to update progress and send a keepalive message during sending data of a
+ * transaction (and its subtransactions) to the output plugin.
+ *
+ * For a large transaction, if we don't send any change to the downstream for a
+ * long time (exceeds the wal_receiver_timeout of standby) then it can timeout.
+ * This can happen when all or most of the changes are either not published or
+ * got filtered out.
+ */
+static void
+update_progress_txn_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ReorderBufferChange *change)

Simplify the "Try to..." paragraph. And other part should also mention
about DDL.

SUGGESTION

Try to send a keepalive message during transaction processing.

This is done because if we don't send any change to the downstream for
a long time (exceeds the wal_receiver_timeout of standby), then it can
timeout. This can happen for large DDL, or for large transactions when
all or most of the changes are either not published or got filtered
out.

==
.../replication/logical/reorderbuffer.c

4. ReorderBufferProcessTXN

@@ -2105,6 +2105,19 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,

  PG_TRY();
  {
+ /*
+ * Static variable used to accumulate the number of changes while
+ * processing txn.
+ */
+ static int changes_count = 0;
+
+ /*
+ * Sending keepalive messages after every change has some overhead, but
+ * testing showed there is no noticeable overhead if keepalive is only
+ * sent after every ~100 changes.
+ */
+#define CHANGES_THRESHOLD 100
+

IMO these can be relocated to be declared/defined inside the "while"
loop -- i.e. closer to where they are being used.

~~~

5.

+ if (++changes_count >= CHANGES_THRESHOLD)
+ {
+ rb->update_progress_txn(rb, txn, change);
+ changes_count = 0;
+ }

When there is no update_progress function this code still incurs some
small additional overhead for incrementing and testing the THRESHOLD
every time, and also needlessly calls the wrapper every 100 changes.
This overhead could be avoided with a simpler up-front check like the
one shown below. OTOH, maybe the overhead is insignificant enough that
just leaving the current code is neater?

LogicalDecodingContext *ctx = rb->private_data;
...
if (ctx->update_progress_txn && (++changes_count >= CHANGES_THRESHOLD))
{
rb->update_progress_txn(rb, txn, change);
changes_count = 0;
}

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Logical replication timeout problem

2023-01-22 Thread Amit Kapila
On Mon, Jan 23, 2023 at 6:21 AM Peter Smith  wrote:
>
> 1.
>
> It makes no real difference, but I was wondering about:
> "update txn progress" versus "update progress txn"
>

Yeah, I think we can go either way but I still prefer "update progress
txn" as that is closer to the LogicalOutputPluginWriterUpdateProgress
callback name.

>
> 5.
>
> + if (++changes_count >= CHANGES_THRESHOLD)
> + {
> + rb->update_progress_txn(rb, txn, change);
> + changes_count = 0;
> + }
>
> When there is no update_progress function this code still incurs some
> small additional overhead for incrementing and testing the THRESHOLD
> every time, and also needlessly calls the wrapper every 100 changes.
> This overhead could be avoided with a simpler up-front check like the
> one shown below. OTOH, maybe the overhead is insignificant enough
> that just leaving the current code is neater?
>

As far as built-in logical replication is concerned, it will be
defined, and I doubt the overhead will be significant in that case.
Also, one can say that for the cases where it is defined, we are
adding this check multiple times (it is already checked inside
OutputPluginUpdateProgress). So, I would prefer the neater code here.
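
For reference, the existing check being referred to is roughly of the following
shape (simplified from logical.c as of this thread, shown only to illustrate why
an extra callback guard in reorderbuffer.c would be redundant; it is not part of
the patch):

void
OutputPluginUpdateProgress(struct LogicalDecodingContext *ctx, bool skipped_xact)
{
	/* Nothing to do if the caller did not supply an update_progress callback. */
	if (!ctx->update_progress)
		return;

	ctx->update_progress(ctx, ctx->write_location, ctx->write_xid,
						 skipped_xact);
}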

-- 
With Regards,
Amit Kapila.




RE: Logical replication timeout problem

2023-01-23 Thread houzj.f...@fujitsu.com
On Monday, January 23, 2023 8:51 AM Peter Smith  wrote:
> 
> Here are my review comments for patch v4-0001
> ==
> Commit message
> 
> 2.
> 
> The problem is when there is a DDL in a transaction that generates lots of
> temporary data due to rewrite rules, these temporary data will not be processed
> by the pgoutput plugin. The previous commit (f95d53e) only fixed timeouts
> caused by filtering out changes in pgoutput. Therefore, the previous fix for DML
> had no impact on this case.
> 
> ~
> 
> IMO this still needs some rewording to say up-front what the actual problem
> is -- i.e. an avoidable timeout occurring.
> 
> SUGGESTION (or something like this...)
> 
> When there is a DDL in a transaction that generates lots of temporary data due
> to rewrite rules, this temporary data will not be processed by the pgoutput
> plugin. This means it is possible for a timeout to occur if a sufficiently
> long time elapses since the last pgoutput message. A previous commit (f95d53e)
> fixed a similar scenario in this area, but that only fixed timeouts for DML
> going through pgoutput, so it did not address this DDL timeout case.

Thanks, I changed the commit message as suggested.

> ==
> src/backend/replication/logical/logical.c
> 
> 3. update_progress_txn_cb_wrapper
> 
> +/*
> + * Update progress callback while processing a transaction.
> + *
> + * Try to update progress and send a keepalive message during sending data of a
> + * transaction (and its subtransactions) to the output plugin.
> + *
> + * For a large transaction, if we don't send any change to the downstream for a
> + * long time (exceeds the wal_receiver_timeout of standby) then it can timeout.
> + * This can happen when all or most of the changes are either not published or
> + * got filtered out.
> + */
> +static void
> +update_progress_txn_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN
> *txn,
> +ReorderBufferChange *change)
> 
> Simplify the "Try to..." paragraph. And other part should also mention about 
> DDL.
> 
> SUGGESTION
> 
> Try to send a keepalive message during transaction processing.
> 
> This is done because if we don't send any change to the downstream for a long
> time (exceeds the wal_receiver_timeout of standby), then it can timeout. This
> can happen for large DDL, or for large transactions when all or most of the
> changes are either not published or got filtered out.

Changed.

> ==
> .../replication/logical/reorderbuffer.c
> 
> 4. ReorderBufferProcessTXN
> 
> @@ -2105,6 +2105,19 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
> ReorderBufferTXN *txn,
> 
>   PG_TRY();
>   {
> + /*
> + * Static variable used to accumulate the number of changes while
> + * processing txn.
> + */
> + static int changes_count = 0;
> +
> + /*
> + * Sending keepalive messages after every change has some overhead, but
> + * testing showed there is no noticeable overhead if keepalive is only
> + * sent after every ~100 changes.
> + */
> +#define CHANGES_THRESHOLD 100
> +
> 
> IMO these can be relocated to be declared/defined inside the "while"
> loop -- i.e. closer to where they are being used.

Moved into the while loop.

Attach the new version patch which addressed the above comments.
Also attached is a simple script which uses "refresh matview" to reproduce
this timeout problem, in case someone wants to try to reproduce it.
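
For readers who only want the idea without running the script, a minimal sketch
of that kind of reproducer looks roughly like this (object names and row counts
are illustrative and not taken from the attached script; it assumes a
subscription for some other table already exists and wal_receiver_timeout is set
low on the subscriber):

-- on the publisher
CREATE TABLE mv_src (id int);
INSERT INTO mv_src SELECT generate_series(1, 10000000);
CREATE MATERIALIZED VIEW big_mv AS SELECT * FROM mv_src;

-- The refresh rewrites the matview and generates a large volume of changes
-- that are decoded but all filtered out by pgoutput (matviews are not
-- published), so without the fix no message reaches the subscriber for a
-- long time and the worker can time out.
REFRESH MATERIALIZED VIEW big_mv;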

Best regards,
Hou zj


test.sh
Description: test.sh


v5-0001-Fix-the-logical-replication-timeout-during-proces.patch
Description:  v5-0001-Fix-the-logical-replication-timeout-during-proces.patch


Re: Logical replication timeout problem

2023-01-23 Thread Peter Smith
Hi Hou-san, Here are my review comments for v5-0001.

==
src/backend/replication/logical/reorderbuffer.c

1.
@@ -2446,6 +2452,23 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
  elog(ERROR, "tuplecid value in changequeue");
  break;
  }
+
+ /*
+ * Sending keepalive messages after every change has some overhead, but
+ * testing showed there is no noticeable overhead if keepalive is only
+ * sent after every ~100 changes.
+ */
+#define CHANGES_THRESHOLD 100
+
+ /*
+ * Try to send a keepalive message after every CHANGES_THRESHOLD
+ * changes.
+ */
+ if (++changes_count >= CHANGES_THRESHOLD)
+ {
+ rb->update_progress_txn(rb, txn, change);
+ changes_count = 0;
+ }

I noticed you put the #define adjacent to the only usage of it,
instead of with the other variable declaration like it was before.
Probably it is better how you have done it, but:

1a.
The comment indentation is incorrect.

~

1b.
Since the #define is adjacent to its only usage IMO now the 2nd
comment is redundant. So the code can just say

   /*
    * Sending keepalive messages after every change has some overhead, but
    * testing showed there is no noticeable overhead if keepalive is only
    * sent after every ~100 changes.
    */
#define CHANGES_THRESHOLD 100
   if (++changes_count >= CHANGES_THRESHOLD)
   {
       rb->update_progress_txn(rb, txn, change);
       changes_count = 0;
   }

--
Kind Regards,
Peter Smith.
Fujitsu Australia




RE: Logical replication timeout problem

2023-01-23 Thread wangw.f...@fujitsu.com
On Tues, Jan 24, 2023 at 8:28 AM Peter Smith  wrote:
> Hi Hou-san, Here are my review comments for v5-0001.

Thanks for your comments.

> ==
> src/backend/replication/logical/reorderbuffer.c
> 
> 1.
> @@ -2446,6 +2452,23 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
> ReorderBufferTXN *txn,
>   elog(ERROR, "tuplecid value in changequeue");
>   break;
>   }
> +
> + /*
> + * Sending keepalive messages after every change has some overhead, but
> + * testing showed there is no noticeable overhead if keepalive is only
> + * sent after every ~100 changes.
> + */
> +#define CHANGES_THRESHOLD 100
> +
> + /*
> + * Try to send a keepalive message after every CHANGES_THRESHOLD
> + * changes.
> + */
> + if (++changes_count >= CHANGES_THRESHOLD)
> + {
> + rb->update_progress_txn(rb, txn, change);
> + changes_count = 0;
> + }
> 
> I noticed you put the #define adjacent to the only usage of it,
> instead of with the other variable declaration like it was before.
> Probably it is better how you have done it, but:
> 
> 1a.
> The comment indentation is incorrect.
> 
> ~
> 
> 1b.
> Since the #define is adjacent to its only usage IMO now the 2nd
> comment is redundant. So the code can just say
> 
>    /*
>     * Sending keepalive messages after every change has some overhead, but
>     * testing showed there is no noticeable overhead if keepalive is only
>     * sent after every ~100 changes.
>     */
> #define CHANGES_THRESHOLD 100
>    if (++changes_count >= CHANGES_THRESHOLD)
>    {
>        rb->update_progress_txn(rb, txn, change);
>        changes_count = 0;
>    }

Changed as suggested.

Attach the new patch.

Regards,
Wang Wei


v6-0001-Fix-the-logical-replication-timeout-during-proces.patch
Description:  v6-0001-Fix-the-logical-replication-timeout-during-proces.patch


Re: Logical replication timeout problem

2023-01-23 Thread Peter Smith
On Tue, Jan 24, 2023 at 1:45 PM wangw.f...@fujitsu.com
 wrote:
>
> On Tues, Jan 24, 2023 at 8:28 AM Peter Smith  wrote:
> > Hi Hou-san, Here are my review comments for v5-0001.
>
> Thanks for your comments.
...
>
> Changed as suggested.
>
> Attach the new patch.

Thanks! Patch v6 LGTM.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Logical replication timeout problem

2023-01-25 Thread Amit Kapila
On Tue, Jan 24, 2023 at 8:15 AM wangw.f...@fujitsu.com
 wrote:
>
> Attach the new patch.
>

I think the patch missed handling the case of non-transactional
messages, which was previously handled. I have tried to address that
in the attached. Is there a reason it shouldn't be handled?
Apart from that, I changed a few comments. If my understanding is
correct, then we need to change the callback update_progress_txn name
as well because now it needs to handle both transactional and
non-transactional changes. How about update_progress_write? We
accordingly need to change the comments for the callback.

Additionally, I think we should have a test case to show we don't time
out because of not processing non-transactional messages. See
pgoutput_message for cases where it doesn't process the message.
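
As a sketch of the kind of workload such a test could exercise (illustrative
only, not a ready-made TAP test): emit a long run of non-transactional messages
that the built-in subscriber never requests, so pgoutput_message skips every one
of them, and then verify that the apply worker does not time out:

-- on the publisher, while a subscription is active and otherwise idle
SELECT pg_logical_emit_message(false, 'unconsumed_prefix', repeat('x', 8192))
FROM generate_series(1, 100000);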

-- 
With Regards,
Amit Kapila.


v7-0001-Fix-the-logical-replication-timeout-during-proces.patch
Description: Binary data


RE: Logical replication timeout problem

2023-01-27 Thread houzj.f...@fujitsu.com
On Wednesday, January 25, 2023 7:26 PM Amit Kapila 
> 
> On Tue, Jan 24, 2023 at 8:15 AM wangw.f...@fujitsu.com
>  wrote:
> >
> > Attach the new patch.
> >
> 
> I think the patch missed to handle the case of non-transactional messages which
> was previously getting handled. I have tried to address that in the attached. Is
> there a reason that shouldn't be handled?

Thanks for updating the patch!

I thought about the non-transactional message case. I think it seems fine if we
don't handle it for the timeout, because such a message is decoded via:

WalSndLoop
-XLogSendLogical
--LogicalDecodingProcessRecord
---logicalmsg_decode
----ReorderBufferQueueMessage
-----rb->message() -- maybe send the message or do nothing here.

After invoking rb->message(), we will directly return to the main
loop(WalSndLoop) where we will get a chance to call
WalSndKeepaliveIfNecessary() to avoid the timeout.

This is a bit different from transactional changes: those are buffered and then
every buffered change is sent one by one (via ReorderBufferProcessTXN) without
going back to the WalSndLoop, so we don't get a chance to send a keepalive
message when necessary, which is more likely to cause the timeout problem.
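
As a concrete illustration of that difference (a sketch only, the table name is
illustrative): a single large transaction whose changes are all filtered out is
buffered in full and replayed change by change inside ReorderBufferProcessTXN at
commit time, with no intervening return to WalSndLoop:

-- a table that is not part of any publication
CREATE TABLE not_published_tbl (id int);

BEGIN;
-- every change below is decoded and then filtered out by pgoutput
INSERT INTO not_published_tbl SELECT generate_series(1, 10000000);
COMMIT;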

I will also test the non-transactional message for timeout in case I missed 
something.

Best Regards,
Hou zj


Re: Logical replication timeout problem

2023-01-27 Thread Amit Kapila
On Fri, Jan 27, 2023 at 5:18 PM houzj.f...@fujitsu.com
 wrote:
>
> On Wednesday, January 25, 2023 7:26 PM Amit Kapila 
> >
> > On Tue, Jan 24, 2023 at 8:15 AM wangw.f...@fujitsu.com
> >  wrote:
> > >
> > > Attach the new patch.
> > >
> >
> > I think the patch missed to handle the case of non-transactional messages
> > which was previously getting handled. I have tried to address that in the
> > attached. Is there a reason that shouldn't be handled?
>
> Thanks for updating the patch!
>
> I thought about the non-transactional message. I think it seems fine if we
> don’t handle it for timeout because such message is decoded via:
>
> WalSndLoop
> -XLogSendLogical
> --LogicalDecodingProcessRecord
> ---logicalmsg_decode
> ----ReorderBufferQueueMessage
> -----rb->message() -- maybe send the message or do nothing here.
>
> After invoking rb->message(), we will directly return to the main
> loop(WalSndLoop) where we will get a chance to call
> WalSndKeepaliveIfNecessary() to avoid the timeout.
>

Valid point. But this means the previous handling of non-transactional
messages was also redundant.

> This is a bit different from transactional changes, because for transactional
> changes, we will buffer them and then send every buffered change one by one (via
> ReorderBufferProcessTXN) without going back to the WalSndLoop, so we don't get
> a chance to send keepalive message if necessary, which is more likely to cause
> the timeout problem.
>
> I will also test the non-transactional message for timeout in case I missed
> something.
>

Okay, thanks. Please see if we can test a mix of transactional and
non-transactional messages as well.

-- 
With Regards,
Amit Kapila.




RE: Logical replication timeout problem

2023-01-28 Thread wangw.f...@fujitsu.com
On Fri, Jan 27, 2023 at 19:55 PM Amit Kapila  wrote:
> On Fri, Jan 27, 2023 at 5:18 PM houzj.f...@fujitsu.com
>  wrote:
> >
> > On Wednesday, January 25, 2023 7:26 PM Amit Kapila
> 
> > >
> > > On Tue, Jan 24, 2023 at 8:15 AM wangw.f...@fujitsu.com
> > >  wrote:
> > > >
> > > > Attach the new patch.
> > > >
> > >
> > > I think the patch missed to handle the case of non-transactional messages
> > > which was previously getting handled. I have tried to address that in the
> > > attached. Is there a reason that shouldn't be handled?
> >
> > Thanks for updating the patch!
> >
> > I thought about the non-transactional message. I think it seems fine if we
> > don’t handle it for timeout because such message is decoded via:
> >
> > WalSndLoop
> > -XLogSendLogical
> > --LogicalDecodingProcessRecord
> > ---logicalmsg_decode
> > ----ReorderBufferQueueMessage
> > -----rb->message() -- maybe send the message or do nothing here.
> >
> > After invoking rb->message(), we will directly return to the main
> > loop(WalSndLoop) where we will get a chance to call
> > WalSndKeepaliveIfNecessary() to avoid the timeout.
> >
> 
> Valid point. But this means the previous handling of non-transactional
> messages was also redundant.

Thanks for the analysis, I think it makes sense. So I removed the handling of
non-transactional messages.

> > This is a bit different from transactional changes, because for transactional
> > changes, we will buffer them and then send every buffered change one by one (via
> > ReorderBufferProcessTXN) without going back to the WalSndLoop, so we don't get
> > a chance to send keepalive message if necessary, which is more likely to cause
> > the timeout problem.
> >
> > I will also test the non-transactional message for timeout in case I missed
> > something.
> >
> 
> Okay, thanks. Please see if we can test a mix of transactional and
> non-transactional messages as well.

I tested a mixed transaction of transactional and non-transactional messages on
the current HEAD and reproduced the timeout problem. I think this result is
expected: when decoding a transaction, non-transactional changes are processed
directly and the function WalSndKeepaliveIfNecessary is called, while
transactional changes are cached and processed after decoding. After decoding,
only transactional changes will be processed (in the function
ReorderBufferProcessTXN), so the timeout problem will still be reproduced.

After applying the v8 patch, the test mentioned above didn't reproduce the
timeout problem (the test script 'test_with_nontransactional.sh' is attached).
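
For reference, a transaction along these lines is enough to mix both kinds of
messages (names are illustrative; the attached test_with_nontransactional.sh is
the authoritative script):

-- assumes not_published_tbl exists and is not part of the publication
BEGIN;
-- non-transactional: handled immediately during decoding, so the walsender
-- returns to WalSndLoop and gets a keepalive chance here
SELECT pg_logical_emit_message(false, 'test_prefix', 'payload');
-- transactional changes that pgoutput filters out: buffered and replayed only
-- at commit, inside ReorderBufferProcessTXN
INSERT INTO not_published_tbl SELECT generate_series(1, 10000000);
COMMIT;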

Attach the new patch.

Regards,
Wang Wei


v8-0001-Fix-the-logical-replication-timeout-during-proces.patch
Description:  v8-0001-Fix-the-logical-replication-timeout-during-proces.patch


test_with_nontransactional.sh
Description: test_with_nontransactional.sh


RE: Logical replication timeout problem

2023-01-29 Thread shiy.f...@fujitsu.com
On Sun, Jan 29, 2023 3:41 PM wangw.f...@fujitsu.com  
wrote:
> 
> I tested a mix transaction of transactional and non-transactional messages on
> the current HEAD and reproduced the timeout problem. I think this result is OK.
> Because when decoding a transaction, non-transactional changes are processed
> directly and the function WalSndKeepaliveIfNecessary is called, while
> transactional changes are cached and processed after decoding. After decoding,
> only transactional changes will be processed (in the function
> ReorderBufferProcessTXN), so the timeout problem will still be reproduced.
> 
> After applying the v8 patch, the test mentioned above didn't reproduce the
> timeout problem (Attach this test script 'test_with_nontransactional.sh').
> 
> Attach the new patch.
> 

Thanks for updating the patch. Here is a comment.

In update_progress_txn_cb_wrapper(), it looks like we need to reset
changes_count to 0 after calling OutputPluginUpdateProgress(); otherwise, once
the count reaches 100, OutputPluginUpdateProgress() will be called for every
subsequent change.
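
In other words, something along these lines inside the wrapper (only a sketch of
the intended fix, not the exact v8 code):

	if (++changes_count >= CHANGES_THRESHOLD)
	{
		OutputPluginUpdateProgress(ctx, false);
		/* Start counting the next CHANGES_THRESHOLD changes from zero. */
		changes_count = 0;
	}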

Regards,
Shi yu


