On Thu, Sep 16, 2021 at 10:36 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> I think here the reason is that the first_lsn of a transaction is
> always equal to end_lsn of the previous transaction (See comments
> above first_lsn and end_lsn fields of ReorderBufferTXN).

That may be the case, but those comments certainly don't make this clear.

> I have not
> debugged but I think in StreamLogicalLog() the cur_record_lsn after
> receiving 'w' message, in this case, will be equal to endpos whereas
> we expect to be greater than endpos to exit. Before the patch, it will
> always get the 'k' message where we expect the received lsn to be
> equal to endpos to conclude that we can exit. Do let me know if your
> analysis differs?
>

Yes, pg_recvlogical seems to be relying on receiving a keepalive for
its "--endpos" logic to work (and the 006 test is relying on '' record
output from pg_recvlogical in this case).
But is it correct to be relying on a keepalive for this?

As I already pointed out, there's also code which seems to be relying
on replies from sending keepalives, to update flush and write
locations related to LSN.
The original problem reporter measured 500 keepalives per second being
sent by walsender (which I also reproduced, for pg_recvlogical and
pub/sub cases).
None of these cases appear to be traditional uses of "keepalive" type
messages to me.
Am I missing something? Documentation?

Regards,
Greg Nancarrow
Fujitsu Australia