On Tue, Mar 8, 2022 at 3:52 PM Masahiko Sawada <sawada.m...@gmail.com> wrote: > I've looked at the patch and have a question: Thanks for your review and comments.
> +void > +SendKeepaliveIfNecessary(LogicalDecodingContext *ctx, bool skipped) { > + static int skipped_changes_count = 0; > + > + /* > + * skipped_changes_count is reset when processing changes that do not > + * need to be skipped. > + */ > + if (!skipped) > + { > + skipped_changes_count = 0; > + return; > + } > + > + /* > + * After continuously skipping SKIPPED_CHANGES_THRESHOLD > changes, try to send a > + * keepalive message. > + */ > + #define SKIPPED_CHANGES_THRESHOLD 10000 > + > + if (++skipped_changes_count >= SKIPPED_CHANGES_THRESHOLD) > + { > + /* Try to send a keepalive message. */ > + OutputPluginUpdateProgress(ctx, true); > + > + /* After trying to send a keepalive message, reset the flag. > */ > + skipped_changes_count = 0; > + } > +} > > Since we send a keepalive after continuously skipping 10000 changes, the > originally reported issue can still occur if skipping 10000 changes took more > than > the timeout and the walsender didn't send any change while that, is that > right? Yes, theoretically so. But after testing, I think this value should be conservative enough not to reproduce this bug. After the previous discussion[1], it is currently considered that it is better to directly set a conservative threshold than to calculate the threshold based on wal_sender_timeout. [1] - https://www.postgresql.org/message-id/OS3PR01MB6275FEB9F83081F1C87539B99E019%40OS3PR01MB6275.jpnprd01.prod.outlook.com Regards, Wang wei