On Tuesday, December 6, 2022 5:00 PM Peter Smith <smithpb2...@gmail.com> wrote: > Here are some review comments for patch v9-0001: Hi, thank you for your reviews !
> > ====== > > GENERAL > > 1. min_ prefix? > > What's the significance of the "min_" prefix for this parameter? I'm guessing > the > background is that at one time it was considered to be a GUC so took a name > similar to GUC recovery_min_apply_delay (??) > > But in practice, I think it is meaningless and/or misleading. For example, > suppose the user wants to defer replication by 1hr. IMO it is more natural to > just say "defer replication by 1 hr" (aka > apply_delay='1hr') Clearly it means replication will take place about > 1 hr into the future. OTHO saying "defer replication by a MINIMUM of 1 hr" > (aka > min_apply_delay='1hr') is quite vague because then it is equally valid if the > replication gets delayed by 1 hr or 2 hrs or 5 days or 3 weeks since all of > those > satisfy the minimum delay. The implementation could hardwire a delay of > INT_MAX ms but clearly, that's not really what the user would expect. > > ~ > > So, I think this parameter should be renamed just as 'apply_delay'. > > But, if you still decide to keep it as 'min_apply_delay' then there is a lot > of other > code that ought to be changed to be consistent with that name. > e.g. > - subapplydelay in catalogs.sgml --> subminapplydelay > - subapplydelay in system_views.sql --> subminapplydelay > - subapplydelay in pg_subscription.h --> subminapplydelay > - subapplydelay in dump.h --> subminapplydelay > - i_subapplydelay in pg_dump.c --> i_subminapplydelay > - applydelay member name of Form_pg_subscription --> minapplydelay > - "Apply Delay" for the column name displayed by describe.c --> "Min apply > delay" I followed the suggestion to keep the "min_" prefix in [1]. Fixed. > - more... > > (IMO the fact that so much code does not currently say 'min' at all is just > evidence that the 'min' prefix really didn't really mean much in the first > place) > > > ====== > > doc/src/sgml/catalogs.sgml > > 2. Section 31.2 Subscription > > + <para> > + Time delayed replica of subscription is available by indicating > + <literal>min_apply_delay</literal>. See > + <xref linkend="sql-createsubscription"/> for details. > + </para> > > How about saying like: > > SUGGESTION > The subscriber replication can be instructed to lag behind the publisher side > changes by specifying the <literal>min_apply_delay</literal> subscription > parameter. See XXX for details. Fixed. > ====== > > doc/src/sgml/ref/create_subscription.sgml > > 3. min_apply_delay > > + <para> > + By default, subscriber applies changes as soon as possible. As with > + the physical replication feature > + (<xref linkend="guc-recovery-min-apply-delay"/>), it can be useful > to > + have a time-delayed logical replica. This parameter allows you to > + delay the application of changes by a specified amount of time. If > + this value is specified without units, it is taken as milliseconds. > + The default is zero, adding no delay. > + </para> > > "subscriber applies" -> "the subscriber applies" > > "allows you" -> "lets the user" > > "The default is zero, adding no delay." -> "The default is zero (no delay)." Fixed. > ~ > > 4. > > + larger than the time deviations between servers. Note that > + in the case when this parameter is set to a long value, the > + replication may not continue if the replication slot falls behind > the > + current LSN by more than > <literal>max_slot_wal_keep_size</literal>. > + See more details in <xref linkend="guc-max-slot-wal-keep-size"/>. > + </para> > > 4a. > SUGGESTION > Note that if this parameter is set to a long delay, the replication will stop > if the > replication slot falls behind the current LSN by more than > <literal>max_slot_wal_keep_size</literal>. Fixed. > ~ > > 4b. > When it is rendered (like below) it looks a bit repetitive: > ... if the replication slot falls behind the current LSN by more than > max_slot_wal_keep_size. See more details in max_slot_wal_keep_size. Thanks! Fixed the redundancy. > ~ > > IMO the previous sentence should include the link. > > SUGGESTION > if the replication slot falls behind the current LSN by more than <link > linkend = > "guc-max-slot-wal-keep-size"><literal>max_slot_wal_keep_size</literal></lin > k>. Fixed. > ~ > > 5. > > + <para> > + Synchronous replication is affected by this setting when > + <varname>synchronous_commit</varname> is set to > + <literal>remote_write</literal>; every <literal>COMMIT</literal> > + will need to wait to be applied. > + </para> > > Yes, this deserves a big warning -- but I am just not quite sure of the > details. I > think this impacts more than just "remote_rewrite" -- e.g. the same problem > would happen if "synchronous_standby_names" is non-empty. > > I think this warning needs to be more generic to cover everything. > Maybe something like below > > SUGGESTION: > Delaying the replication can mean there is a much longer time between making > a change on the publisher, and that change being committed on the subscriber. > This can have a big impact on synchronous replication. > See > https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-SYN > CHRONOUS-COMMIT Fixed. > > ====== > > src/backend/commands/subscriptioncmds.c > > 6. parse_subscription_options > > + ms = interval_to_ms(interval); > + if (ms < 0 || ms > INT_MAX) > + ereport(ERROR, > + errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), > + errmsg("%lld ms is outside the valid range for option \"%s\"", > + (long long) ms, "min_apply_delay")); > > "for option" -> "for parameter" Fixed. > ====== > > src/backend/replication/logical/worker.c > > 7. apply_delay > > +static void > +apply_delay(TimestampTz ts) > > IMO having a delay is not the usual case. So, would a better name for this > function be 'maybe_delay'? Makes sense. I follow some other functions such as maybe_reread_subscription and maybe_start_skipping_changes. > ~ > > 8. > > + * high value for the delay. This design is different from the physical > + * replication (that applies the delay at commit time) mainly because > + write > + * operations may allow some issues (such as bloat and locks) that can > + be > + * minimized if it does not keep the transaction open for such a long time. > > Something seems not quite right with this wording -- is there a better way of > describing this? I reworded the entire paragraph. Could you please check ? > ~ > > 9. > > + /* > + * Delay apply until all tablesync workers have reached READY state. If > + we > + * allow the delay during the catchup phase, once we reach the limit of > + * tablesync workers, it will impose a delay for each subsequent worker. > + * It means it will take a long time to finish the initial table > + * synchronization. > + */ > + if (!AllTablesyncsReady()) > + return; > > "Delay apply until..." -> "The min_apply_delay parameter is ignored until..." Fixed. > ~ > > 10. > > + /* > + * The worker may be waken because of the ALTER SUBSCRIPTION ... > + * DISABLE, so the catalog pg_subscription should be read again. > + */ > + if (!in_remote_transaction && !in_streamed_transaction) { > + AcceptInvalidationMessages(); maybe_reread_subscription(); } } > > "waken" -> "woken" I have removed this sentence for a new change to recalculate the diffms for any updates of the "min_apply_delay" parameter. Please have a look at maybe_delay_apply(). > ====== > > src/bin/psql/describe.c > > 11. describeSubscriptions > > + /* Origin and min_apply_delay are only supported in v16 and higher */ > if (pset.sversion >= 160000) > appendPQExpBuffer(&buf, > - ", suborigin AS \"%s\"\n", > - gettext_noop("Origin")); > + ", suborigin AS \"%s\"\n" > + ", subapplydelay AS \"%s\"\n", > + gettext_noop("Origin"), > + gettext_noop("Apply delay")); > > IIUC the psql command is supposed to display useful information to the user, > so > I wondered if it is worthwhile to put the units in this column header -- > "Apply > delay (ms)" instead of just "Apply delay" > because that would make it far easier to understand the meaning without > having to check the documentation to discover the units. Fixed. > ====== > > src/include/utils/timestamp.h > > 12. > > +extern int64 interval_to_ms(const Interval *interval); > + > > For consistency with the other interval conversion functions exposed here > maybe this one should have been called 'interval2ms' Fixed. > ====== > > src/test/subscription/t/032_apply_delay.pl > > 13. > > IIUC this test is checking if a delay has occurred by inspecting the debug > logs to > see if a certain code path including "logical replication apply delay" is > logged. I > guess that is OK, but another way might be to compare the actual timing values > of the published and replicated rows. > > The publisher table can have a column with default now() and the subscriber > side table can have an *additional* column also with default now(). After > replication, those two timestamp values can be compared to check if the > difference exceeds the min_time_delay parameter specified. Added this check. This patch now depends on a patch posted in another thread in [2] for TAP test of "min_apply_delay" feature. Without this patch, if one backend process executes ALTER SUBSCRIPTION SET min_apply_delay, while the apply worker gets another message for apply_dispatch, the apply worker doesn't notice the reset and utilizes the old value for that incoming transaction. To fix this, I posted the patch together. (During the patch creation, I don't any change any code logs of the wakeup patch, but for my env, I adjusted the line feed.) Kindly have a look at the updated patch. [1] - https://www.postgresql.org/message-id/CAA4eK1J9HEL-U32FwkHXLOGXPV_Fu%2Bnb%2B1KpV7hTbnqbBNnDUQ%40mail.gmail.com [2] - https://www.postgresql.org/message-id/20221122004119.GA132961@nathanxps13 Best Regards, Takamichi Osumi
v10-0001-wake-up-logical-workers-as-needed-instead-of-rel.patch
Description: v10-0001-wake-up-logical-workers-as-needed-instead-of-rel.patch
v10-0002-Time-delayed-logical-replication-subscriber.patch
Description: v10-0002-Time-delayed-logical-replication-subscriber.patch