I keep your patch 0001 and I add these two calls in function WalSndUpdateProgress without modifying WalSndKeepaliveIfNecessary, it works too. What do your think of this patch?
static void WalSndUpdateProgress(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid) { static TimestampTz sendTime = 0; TimestampTz now = GetCurrentTimestamp(); ProcessRepliesIfAny(); WalSndKeepaliveIfNecessary(); /* * Track lag no more than once per WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS to * avoid flooding the lag tracker when we commit frequently. */ ... Regards Fabrice On Fri, Jan 21, 2022 at 2:17 PM Fabrice Chapuis <fabrice636...@gmail.com> wrote: > Thanks for your patch, it also works well when executing our use case, the > timeout no longer appears in the logs. Is it necessary now to refine this > patch and make as few changes as possible in order for it to be released? > > On Fri, Jan 21, 2022 at 10:51 AM wangw.f...@fujitsu.com < > wangw.f...@fujitsu.com> wrote: > >> On Thu, Jan 20, 2022 at 9:18 PM Amit Kapila <amit.kapil...@gmail.com> >> wrote: >> > It might be not reaching the actual send_keep_alive logic in >> > WalSndKeepaliveIfNecessary because of below code: >> > { >> > ... >> > /* >> > * Don't send keepalive messages if timeouts are globally disabled or >> > * we're doing something not partaking in timeouts. >> > */ >> > if (wal_sender_timeout <= 0 || last_reply_timestamp <= 0) return; .. >> > } >> > >> > I think you can add elog before the above return and before updating >> progress >> > in the below code: >> > case REORDER_BUFFER_CHANGE_INSERT: >> > if (!relentry->pubactions.pubinsert) >> > + { >> > + OutputPluginUpdateProgress(ctx); >> > return; >> > >> > This will help us to rule out one possibility. >> >> Thanks for your advices! >> >> According to your advices, I applied 0001,0002 and 0003 to run the test >> script. >> When subscriber timeout, I filter publisher-side log: >> $ grep "before invoking update progress" pub.log | wc -l >> 60373557 >> $ grep "return because wal_sender_timeout or last_reply_timestamp" >> pub.log | wc -l >> 0 >> $ grep "return because waiting_for_ping_response" pub.log | wc -l >> 0 >> >> Based on this result, I think function WalSndKeepaliveIfNecessary was >> invoked, >> but function WalSndKeepalive was not invoked because (last_processing >= >> ping_time) is false. >> So I tried to see changes about last_processing and last_reply_timestamp >> (because ping_time is based on last_reply_timestamp). >> >> I found last_processing and last_reply_timestamp is set in function >> ProcessRepliesIfAny. >> last_processing is set to the time when function ProcessRepliesIfAny is >> invoked. >> Only when publisher receive a response from subscriber, >> last_reply_timestamp is >> set to last_processing and the flag waiting_for_ping_response is reset to >> false. >> >> When we are during the loop to skip all the changes of transaction, IIUC, >> we do >> not invoke function ProcessRepliesIfAny. So I think last_processing and >> last_reply_timestamp will not be changed in this loop. >> Therefore I think about our use case, we should modify the condition of >> invoking WalSndKeepalive.(please refer to >> 0004-Simple-modification-of-timing.patch, and note that this is only a >> patch >> for testing). >> At the same time I modify the input of WalSndKeepalive from true to >> false. This >> is because when input is true, waiting_for_ping_response is set to true in >> WalSndKeepalive. As mentioned above, ProcessRepliesIfAny is not invoked >> in the >> loop, so I think waiting_for_ping_response will not be reset to false and >> keepalive messages will not be sent. >> >> I tested after applying patches(0001 and 0004), I found the timeout was >> not >> printed in subscriber-side log. And the added messages "begin load >> changes" and >> "commit the log" were printed in publisher-side log: >> $ grep -ir "begin load changes" pub.log >> 2022-01-21 11:17:06.934 CST [2577699] LOG: begin load changes >> $ grep -ir "commit the log" pub.log >> 2022-01-21 11:21:15.564 CST [2577699] LOG: commit the log >> >> Attach the patches and test script mentioned above, in case someone wants >> to >> try. >> >> Regards, >> Wang wei >> >