Re: Logical replication keepalive flood

Greg Nancarrow Fri, 17 Sep 2021 02:33:23 -0700

On Thu, Sep 16, 2021 at 10:36 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> I think here the reason is that the first_lsn of a transaction is
> always equal to end_lsn of the previous transaction (See comments
> above first_lsn and end_lsn fields of ReorderBufferTXN).


That may be the case, but those comments certainly don't make this clear.

>I have not
> debugged but I think in StreamLogicalLog() the cur_record_lsn after
> receiving 'w' message, in this case, will be equal to endpos whereas
> we expect to be greater than endpos to exit. Before the patch, it will
> always get the 'k' message where we expect the received lsn to be
> equal to endpos to conclude that we can exit. Do let me know if your
> analysis differs?
>

Yes, pg_recvlogical seems to be relying on receiving a keepalive for
its "--endpos" logic to work (and the 006 test is relying on '' record
output from pg_recvlogical in this case).
But is it correct to be relying on a keepalive for this?
As I already pointed out, there's also code which seems to be relying
on replies from sending keepalives, to update flush and write
locations related to LSN.
The original problem reporter measured 500 keepalives per second being
sent by walsender (which I also reproduced, for pg_recvlogical and
pub/sub cases).
None of these cases appear to be traditional uses of "keepalive" type
messages to me.
Am I missing something? Documentation?


Regards,
Greg Nancarrow
Fujitsu Australia

Re: Logical replication keepalive flood

Reply via email to