typedef struct LogicalDecodingContext
Hi, During a recent code review, I noticed a lot of 'struct LogicalDecodingContext' usage. There are many function prototypes where the params are (for no apparent reason to me) a mixture of structs and typedef structs. AFAICT just by pre-declaring the typedef struct LogicalDecodingContext, all of those 'struct LogicalDecodingContext' can be culled, resulting in cleaner and more consistent function signatures. The PG Docs were similarly modified. PSA patch for this. It passes make check-world. (I recognize this is potentially the tip of an iceberg. If this patch is deemed OK, I can hunt down similar underuse of typedefs for other structs) Thoughts? -- Kind Regards, Peter Smith. Fujitsu Australia v1-0001-Use-typedef-struct-LogicalDecodingContext.patch Description: Binary data
Re: Time delayed LR (WAS Re: logical replication restrictions)
Here are some review comments for v9-0001, but these are only very trivial.

== Commit Message

1. Nitpick. The new text is jagged-looking. It should wrap at ~80 chars.

~~~

2.

2. Another reason is for that parallel streaming, the transaction will be opened immediately by the parallel apply worker. Therefore, if the walsender is delayed in sending the final record of the transaction, the parallel apply worker must wait to receive it with an open transaction. This would result in the locks acquired during the transaction not being released until the min_send_delay has elapsed.

~

The text already said there are "two reasons", and already this is numbered as reason 2. So it doesn't need to keep saying "Another reason" here.

"Another reason is for that parallel streaming" --> "For parallel streaming..."

== src/backend/replication/walsender.c

3. WalSndDelay

+ /* die if timeout was reached */
+ WalSndCheckTimeOut();

Other nearby comments start uppercase, so this should too.

== src/include/replication/walreceiver.h

4. WalRcvStreamOptions

@@ -187,6 +187,7 @@ typedef struct
 * prepare time */
 char *origin; /* Only publish data originating from the
 * specified origin */
+ int32 min_send_delay; /* The minimum send delay */
 } logical;
 } proto;
} WalRcvStreamOptions;

~

Should that comment mention the units are "(ms)"?

-- Kind Regards, Peter Smith. Fujitsu Australia
Re: Rework LogicalOutputPluginWriterUpdateProgress
(!ctx->accept_writes)
- elog(ERROR, "writes are only accepted in commit, begin and change callbacks");
+ elog(ERROR, "writes are only accepted in callbacks in the OutputPluginCallbacks structure (except startup, shutdown, filter_by_origin and filter_prepare callbacks)");

This error message seems confusing. Can it be worded better?

Also, I noticed this flag is never used except in this one place where it throws an error, so would an "Assert" be more appropriate here?

~~~

8. rollback_prepared_cb_wrapper

/*
 * If the plugin support two-phase commits then rollback prepared callback
 * is mandatory
+ *
+ * FIXME: This should have been caught much earlier.
 */
if (ctx->callbacks.rollback_prepared_cb == NULL)

~

Is this FIXME related to the current patch, or should this be an entirely different topic?

~~~

9. is_skip_threshold_change

The current usage for this function is like:

if (is_skip_threshold_change(ctx))
+ update_progress_and_keepalive(ctx, false);

~

IMO a better name for this function might be like 'is_change_threshold_exceeded()' (or 'is_keepalive_threshold_exceeded()' etc) because it seems more readable to say

if (is_change_threshold_exceeded())
    do_something();

~~~

10. is_skip_threshold_change

static bool
is_skip_threshold_change(struct LogicalDecodingContext *ctx)
{
    static int changes_count = 0; /* used to accumulate the number of
                                   * changes */

    /* If the change was published, reset the counter and return false */
    if (ctx->did_write)
    {
        changes_count = 0;
        return false;
    }

    /*
     * It is possible that the data is not sent to downstream for a long time
     * either because the output plugin filtered it or there is a DDL that
     * generates a lot of data that is not processed by the plugin. So, in
     * such cases, the downstream can timeout. To avoid that we try to send a
     * keepalive message if required. Trying to send a keepalive message
     * after every change has some overhead, but testing showed there is no
     * noticeable overhead if we do it after every ~100 changes.
     */
#define CHANGES_THRESHOLD 100

    if (!ctx->did_write && ++changes_count >= CHANGES_THRESHOLD)
    {
        changes_count = 0;
        return true;
    }

    return false;
}

~

That 2nd condition checking

if (!ctx->did_write && ++changes_count >= CHANGES_THRESHOLD)

does not seem right. There is no need to check the ctx->did_write; it must be false because it was checked earlier in the function:

BEFORE
if (!ctx->did_write && ++changes_count >= CHANGES_THRESHOLD)

SUGGESTION1
Assert(!ctx->did_write);
if (++changes_count >= CHANGES_THRESHOLD)

SUGGESTION2
if (++changes_count >= CHANGES_THRESHOLD)

~~~

11. update_progress_and_keepalive

/*
 * Update progress tracking and send keep alive (if required).
 */
static void
update_progress_and_keepalive(struct LogicalDecodingContext *ctx,
                              bool finished_xact)
{
    if (!ctx->update_progress_and_keepalive)
        return;

    ctx->update_progress_and_keepalive(ctx, ctx->write_location,
                                       ctx->write_xid, ctx->did_write,
                                       finished_xact);
}

~

Maybe it's simpler to code this without the return. e.g.

if (ctx->update_progress_and_keepalive)
{
    ctx->update_progress_and_keepalive(ctx, ctx->write_location,
                                       ctx->write_xid, ctx->did_write,
                                       finished_xact);
}

(it is just generic suggested code for example -- I made some other review comments overlapping this)

== .../replication/logical/reorderbuffer.c

12. ReorderBufferAbort

+ UpdateDecodingProgressAndKeepalive((LogicalDecodingContext *)rb->private_data,
+xid, lsn, !TransactionIdIsValid(txn->toplevel_xid));
+

I didn't really recognise how the "!TransactionIdIsValid(txn->toplevel_xid)" maps to the boolean 'finished_xact' param. Can this call have an explanatory comment about how it works?

== src/backend/replication/walsender.c

13.
WalSndUpdateProgressAndKeepalive

- if (pending_writes || (!end_xact &&
+ if (pending_writes || (!finished_xact &&
 wal_sender_timeout > 0 &&
 now >= TimestampTzPlusMilliseconds(last_reply_timestamp,
 wal_sender_timeout / 2)))
- ProcessPendingWrites();
+ WalSndSendPending();

Is this new function name OK to be WalSndSendPending? From this code, we can see it can also be called in other scenarios even when there is nothing "pending" at all.

-- Kind Regards, Peter Smith. Fujitsu Australia
Re: PGDOCS - sgml linkend using single-quotes
On Mon, Feb 27, 2023 at 7:04 PM Heikki Linnakangas wrote: > ... > > There were also a few "id" attributes using single-quotes. Fixed those > too, and pushed. Thanks! > Thank you for pushing. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Allow logical replication to copy tables in binary format
d to enhance the "binary" mode so it can be made to fall back to text mode if it needs to in the same way that binary replication does. If such an enhanced COPY format mode worked, then most of the patch is redundant - there is no need for any new option - tablesync COPY could then *always* use this "binary-with-fallback" mode

--

[1] Melih initially wanted a unified binary mode - https://www.postgresql.org/message-id/CAGPVpCQYi9AYQSS%3DRmGgVNjz5ZEnLB8mACwd9aioVhLmbgiAMA%40mail.gmail.com
[2] Bharath doesn't like the binary/copy_format inter-dependency - https://www.postgresql.org/message-id/CALj2ACVPt-BaLMm3Ezy1-rfUzH9qStxePcyGrHPamPESEZSBFA%40mail.gmail.com
[3] Shi-san found binary mode has the ability to fall back to text sometimes - https://www.postgresql.org/message-id/OSZPR01MB6310B58F069FF8E148B247FDFDA49%40OSZPR01MB6310.jpnprd01.prod.outlook.com
[4] Jelte's idea to enhance the copy_data option - https://www.postgresql.org/message-id/CAGECzQS393xiME%2BEySLU7ceO4xOB81kPjPqNBjaeW3sLgfLhNw%40mail.gmail.com
[5] Kuroda-san etc. expecting the copy_data=false/copy_format=binary combination is not allowed - https://www.postgresql.org/message-id/TYAPR01MB5866DDF02B3CEE59DA024CC3F5AB9%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[6] Jelte says it is the operator's responsibility to use the correct options - https://www.postgresql.org/message-id/CAGECzQSStdb%2Bx1BxzCktZd1uSjvd6eYq1wcHV3vPCykrGqxYKQ%40mail.gmail.com

Kind Regards, Peter Smith. Fujitsu Australia
Re: [PATCH] Add pretty-printed XML output option
The patch v19 LGTM. - v19 applies cleanly for me - Full clean build OK - HTML docs build and render OK - The 'make check' tests all pass for me - Also cfbot reports latest patch has no errors -- http://cfbot.cputube.org/ So, I marked it a "Ready for Committer" in the CF -- https://commitfest.postgresql.org/42/4162/ -- Kind Regards, Peter Smith. Fujitsu Australia
Re: "out of relcache_callback_list slots" after multiple calls to pg_logical_slot_get_binary_changes
On Thu, Feb 23, 2023 at 11:28 AM Michael Paquier wrote: > > On Wed, Feb 22, 2023 at 10:21:51AM +, shiy.f...@fujitsu.com wrote: > > Thanks for your reply. Using two flags makes sense to me. > > Attach the updated patch. > > Fine by me as far as it goes. Any thoughts from others? > -- Should the 'relation_callback_registered' variable name be plural? Otherwise, LGTM. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: [PATCH] Add pretty-printed XML output option
Here are my review comments for patch v17-0001. == src/test/regress/sql/xml.sql The blank line(s) which previously separated the xmlserialize tests from the xml IS [NOT] DOCUMENT tests are now missing... e.g. -- indent different encoding (returns UTF-8) SELECT xmlserialize(DOCUMENT '' AS text INDENT); SELECT xmlserialize(CONTENT '' AS text INDENT); -- 'no indent' = not using 'no indent' SELECT xmlserialize(DOCUMENT '42' AS text) = xmlserialize(DOCUMENT '42' AS text NO INDENT); SELECT xmlserialize(CONTENT '42' AS text) = xmlserialize(CONTENT '42' AS text NO INDENT); SELECT xml 'bar' IS DOCUMENT; SELECT xml 'barfoo' IS DOCUMENT; SELECT xml '' IS NOT DOCUMENT; SELECT xml 'abc' IS NOT DOCUMENT; SELECT '<>' IS NOT DOCUMENT; ~~ Apart from that, patch v17 LGTM. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Time delayed LR (WAS Re: logical replication restrictions)
Patch v6 LGTM. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: [PATCH] Add pretty-printed XML output option
Here are some review comments for patch v16-0001. == > src/backend/executor/execExprInterp.c > > 2. ExecEvalXmlExpr > > @@ -3829,7 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op) >{ >Datum*argvalue = op->d.xmlexpr.argvalue; >bool*argnull = op->d.xmlexpr.argnull; > - > + boolindent = op->d.xmlexpr.xexpr->indent; > + text*data; >/* argument type is known to be xml */ >Assert(list_length(xexpr->args) == 1); > Missing whitespace after the variable declarations Whitespace added. ~ Oh, I meant something different to that fix. I meant there is a missing blank line after the last ('data') variable declaration. == Test code. I wondered if there ought to be a test that demonstrates explicitly saying NO INDENT will give the identical result to just omitting it. For example: test=# -- no indent is default test=# SELECT xmlserialize(DOCUMENT '42' AS text) = xmlserialize(DOCUMENT '42' AS text NO INDENT); ?column? -- t (1 row) test=# SELECT xmlserialize(CONTENT '42' AS text) = xmlserialize(CONTENT '42' AS text NO INDENT); ?column? -- t (1 row) -- Kind Regards, Peter Smith. Fujitsu Australia
Re: [PATCH] Add pretty-printed XML output option
Here are some review comments for patch v15-0001 FYI, the patch applies clean and tests OK for me. == doc/src/sgml/datatype.sgml 1. XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type [ { NO INDENT | INDENT } ] ) ~ Another/shorter way to write that syntax is like below. For me, it is easier to read. YMMV. XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type [ [NO] INDENT ] ) == src/backend/executor/execExprInterp.c 2. ExecEvalXmlExpr @@ -3829,7 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op) { Datum*argvalue = op->d.xmlexpr.argvalue; bool*argnull = op->d.xmlexpr.argnull; - + boolindent = op->d.xmlexpr.xexpr->indent; + text*data; /* argument type is known to be xml */ Assert(list_length(xexpr->args) == 1); Missing whitespace after the variable declarations ~~~ 3. + + data = xmltotext_with_xmloption(DatumGetXmlP(value), + xexpr->xmloption); + if(indent) + *op->resvalue = PointerGetDatum(xmlformat(data)); + else + *op->resvalue = PointerGetDatum(data); + } Unnecessary blank line at the end. == src/backend/utils/adt/xml. 4. xmlformat +#else + NO_XML_SUPPORT(); +return 0; +#endif Wrong indentation (return 0) in the indentation function? ;-) ------ Kind Regards, Peter Smith. Fujitsu Australia
Re: "out of relcache_callback_list slots" after multiple calls to pg_logical_slot_get_binary_changes
On Wed, Feb 22, 2023 at 12:03 PM Kyotaro Horiguchi wrote: > > At Tue, 21 Feb 2023 10:31:29 +, "shiy.f...@fujitsu.com" > wrote in > > Thanks for your reply. I agree that's expensive. Attach a new patch which > > adds a > > static boolean to avoid duplicate registration. > > Thank you for the patch. It is exactly what I had in my mind. But now > that I've had a chance to mull it over, I came to think it might be > better to register the callbacks at one place. I'm thinking we could > create a new function called register_callbacks() or something and > move all the calls to CacheRegisterSyscacheCallback() into that. What > do you think about that refactoring? > > I guess you could say that that refactoring somewhat weakens the > connection or dependency between init_rel_sync_cache and > rel_sync_cache_relation_cb, but anyway the callback works even if > RelationSyncCache is not around. > If you are going to do that, then won't just copying the CacheRegisterSyscacheCallback(PUBLICATIONOID... into function init_rel_sync_cache() be effectively the same as doing that? Then almost nothing else to do...e.g. no need for a new extra static boolean if static RelationSyncCache is acting as the one-time guard anyway. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Time delayed LR (WAS Re: logical replication restrictions)
Here are some very minor review comments for the patch v4-0001 == Commit Message 1. The other possibility was to apply the delay at the end of the parallel apply transaction but that would cause issues related to resource bloat and locks being held for a long time. ~ The reply [1] for review comment #2 says that this was "slightly reworded", but AFAICT nothing is changed here. ~~~ 2. Eariler versions were written by Euler Taveira, Takamichi Osumi, and Kuroda Hayato Typo: "Eariler" == doc/src/sgml/ref/create_subscription.sgml 3. + + By default, the publisher sends changes as soon as possible. This + parameter allows the user to delay changes by given time period. If + the value is specified without units, it is taken as milliseconds. + The default is zero (no delay). See + for details on the available valid time units. + "by given time period" --> "by the given time period" == src/backend/replication/pgoutput/pgoutput.c 4. parse_output_parameters + else if (strcmp(defel->defname, "min_send_delay") == 0) + { + unsigned long parsed; + char*endptr; I think 'parsed' is a fairly meaningless variable name. How about calling this variable something useful like 'delay_val' or 'min_send_delay_value', or something like those? Yes, I recognize that you copied this from some existing code fragment, but IMO that doesn't make it good. == src/backend/replication/walsender.c 5. + /* Sleep until we get reply from worker or we time out */ + WalSndWait(WL_SOCKET_READABLE, +Min(timeout_sleeptime_ms, remaining_wait_time_ms), +WAIT_EVENT_WALSENDER_SEND_DELAY); In my previous review [2] comment #14, I questioned if this comment was correct. It looks like that was accidentally missed. == src/include/replication/logical.h 6. + /* + * The minimum delay, in milliseconds, by the publisher before sending + * COMMIT/PREPARE record + */ + int32 min_send_delay; The comment is missing a period. -- [1] Kuroda-san replied to my review v3-0001. 
https://www.postgresql.org/message-id/TYAPR01MB5866C6BCA4D9386D9C486033F5A59%40TYAPR01MB5866.jpnprd01.prod.outlook.com [2] My previous review v3-0001. https://www.postgresql.org/message-id/CAHut%2BPu6Y%2BBkYKg6MYGi2zGnx6imeK4QzxBVhpQoPMeDr1npnQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: [Proposal] Add foreign-server health checks infrastructure
Here are some review comments for v32-0001. == Commit message 1. PQconnCheck() function allows to check the status of the socket by polling the socket. This function is currently available only on systems that support the non-standard POLLRDHUP extension to the poll system call, including Linux. ~ (missed fix from previous review) "status of the socket" --> "status of the connection" doc/src/sgml/libpq.sgml 2. PQconnCheck + + This function check the health of the connection. Unlike , + this check is performed by polling the corresponding socket. This + function is currently available only on systems that support the + non-standard POLLRDHUP extension to the poll + system call, including Linux. + returns greater than zero if the remote peer seems to be closed, returns + 0 if the socket is valid, and returns -1 + if the connection has already been closed or an error has occurred. + "check the health" --> "checks the health" ~~~ 3. PQcanConnCheck + + Returns true (1) or false (0) to indicate if the PQconnCheck function + is supported on this platform. Should the reference to PQconnCheck be a link as it previously was? == src/interfaces/libpq/fe-misc.c 4. PQconnCheck +/* + * Check whether the socket peer closed connection or not. + * + * Returns >0 if remote peer seems to be closed, 0 if it is valid, + * -1 if the input connection is bad or an error occurred. + */ +int +PQconnCheck(PGconn *conn) +{ + return pqSocketCheck(conn, 0, 0, (time_t) 0); +} I'm confused. This comment says =0 means connection is valid. But the pqSocketCheck comment says =0 means it timed out. So those two function comments don't seem compatible ~~~ 5. PQconnCheckable +/* + * Check whether PQconnCheck() works well on this platform. + * + * Returns true (1) if this can use PQconnCheck(), otherwise false (0). + */ +int +PQconnCheckable(void) +{ +#if (defined(HAVE_POLL) && defined(POLLRDHUP)) + return true; +#else + return false; +#endif +} Why say "works well"? 
IMO it either works or doesn't work – there is no "well". SUGGESTION1 Check whether PQconnCheck() works on this platform. SUGGESTION2 Check whether PQconnCheck() can work on this platform. ~~~ 6. pqSocketCheck /* * Checks a socket, using poll or select, for data to be read, written, - * or both. Returns >0 if one or more conditions are met, 0 if it timed + * or both. Moreover, when neither forRead nor forWrite is requested and + * timeout is disabled, try to check the health of socket. + * + * Returns >0 if one or more conditions are met, 0 if it timed * out, -1 if an error occurred. * * If SSL is in use, the SSL buffer is checked prior to checking the socket ~ See review comment #4. (e.g. This says =0 if it timed out). ~~~ 7. pqSocketPoll + * When neither forRead nor forWrite are set and timeout is disabled, + * + * - If the timeout is disabled, try to check the health of the socket + * - Otherwise this immediately returns 0 + * + * Return >0 if condition is met, 0 if a timeout occurred, -1 if an error + * or interrupt occurred. Don't say "and timeout is disabled," because it clashes with the 1st bullet which also says "- If the timeout is disabled,". -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Time delayed LR (WAS Re: logical replication restrictions)
Here are some review comments for the v3-0001 test code. == src/test/regress/sql/subscription.sql 1. +-- fail - utilizing streaming = parallel with time-delayed replication is not supported +CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, streaming = parallel, min_send_delay = 123); "utilizing" --> "specifying" ~~~ 2. +-- success -- min_send_delay value without unit is take as milliseconds +CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexit' PUBLICATION testpub WITH (connect = false, min_send_delay = 123); +\dRs+ "without unit is take as" --> "without units is taken as" ~~~ 3. +-- success -- min_send_delay value with unit is converted into ms and stored as an integer +ALTER SUBSCRIPTION regress_testsub SET (min_send_delay = '1 d'); +\dRs+ "with unit is converted into ms" --> "with units other than ms is converted to ms" ~~~ 4. Missing tests? Why have the previous ALTER SUBSCRIPTION tests been removed? AFAIK, currently, there are no regression tests for error messages like: test_sub=# ALTER SUBSCRIPTION sub1 SET (min_send_delay = 123); ERROR: cannot set min_send_delay for subscription in parallel streaming mode == src/test/subscription/t/001_rep_changes.pl 5. +# This test is successful if and only if the LSN has been applied with at least +# the configured apply delay. +ok( time() - $publisher_insert_time >= $delay, + "subscriber applies WAL only after replication delay for non-streaming transaction" +); It's not strictly an "apply delay". Maybe this comment only needs to say like below: SUGGESTION # This test is successful only if at least the configured delay has elapsed. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Time delayed LR (WAS Re: logical replication restrictions)
errno = 0;
+ parsed = strtoul(strVal(defel->arg), &endptr, 10);
+ if (errno != 0 || *endptr != '\0')
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid min_send_delay")));
+
+ if (parsed > PG_INT32_MAX)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("min_send_delay \"%s\" out of range",
+ strVal(defel->arg))));

Should the validation also be checking/asserting no negative numbers, or actually should the min_send_delay be defined as a uint32 in the first place?

~~~

12. pgoutput_startup

@@ -501,6 +528,15 @@ pgoutput_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 else
 ctx->twophase_opt_given = true;

+ if (data->min_send_delay &&
+ data->protocol_version < LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("requested proto_version=%d does not support delay sending data, need %d or higher",
+ data->protocol_version, LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM)));
+ else
+ ctx->min_send_delay = data->min_send_delay;

IMO it doesn't make sense to compare this new feature with the unrelated LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM protocol version. I think we should define a new constant LOGICALREP_PROTO_MIN_SEND_DELAY_VERSION_NUM (even if it has the same value as the LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM).

== src/backend/replication/walsender.c

13. WalSndDelay

+ long diffms;
+ long timeout_interval_ms;

IMO some more informative names for these would make the code read better:

'diffms' --> 'remaining_wait_time_ms'
'timeout_interval_ms' --> 'timeout_sleeptime_ms'

~~~

14.

+ /* Sleep until we get reply from worker or we time out */
+ WalSndWait(WL_SOCKET_READABLE,
+Min(timeout_interval_ms, diffms),
+WAIT_EVENT_WALSENDER_SEND_DELAY);

Sorry, I didn't understand this comment "reply from worker"... AFAIK here we are just sleeping, not waiting for replies from anywhere (???)

== src/include/replication/logical.h

15.
@@ -64,6 +68,7 @@ typedef struct LogicalDecodingContext LogicalOutputPluginWriterPrepareWrite prepare_write; LogicalOutputPluginWriterWrite write; LogicalOutputPluginWriterUpdateProgress update_progress; + LogicalOutputPluginWriterDelay delay; ~ 15a. Question: Is there some advantage to introducing another callback, instead of just doing the delay inline? ~ 15b. Should this be a more informative member name like 'delay_send'? ~~~ 16. @@ -100,6 +105,8 @@ typedef struct LogicalDecodingContext */ bool twophase_opt_given; + int32 min_send_delay; + Missing comment for this new member. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: [Proposal] Add foreign-server health checks infrastructure
Here is a code review only for patch v31-0001.

== General Comment

1. PQcanConnCheck seemed like a strange API name. Maybe it can have the same prefix as the other? e.g.

- PQconnCheck() - PQconnCheckSupported()
or
- PQconnCheck() - PQconnCheckable()

== Commit Message

2. PqconnCheck() function allows to check the status of socket by polling the socket. This function is currently available only on systems that support the non-standard POLLRDHUP extension to the poll system call, including Linux. PqcanConnCheck() checks whether above function is available or not.

~

2a. "status of socket" --> "status of the connection"

~

2b. "above function" --> "the above function"

== doc/src/sgml/libpq.sgml

3. PQconnCheck

Returns the health of the socket. int PQconnCheck(PGconn *conn); Unlike PQstatus, this function checks socket health. This check is performed by polling the socket. This function is currently available only on systems that support the non-standard POLLRDHUP extension to the poll system call, including Linux. PQconnCheck returns greater than zero if the remote peer seems to be closed, returns 0 if the socket is valid, and returns -1 if the connection has been already invalid or an error is error occurred.

~

3a. Should these descriptions be referring to the health of the *connection* rather than the health of the socket?

~

3b. "has been already invalid" ?? wording

~~~

4. PQcanConnCheck

Returns whether PQconnCheck is available on this platform. PQcanConnCheck returns 1 if the function is supported, otherwise returns 0.

~

I thought this should be worded using "true" and "false" same as other boolean functions on this page.

SUGGESTION
Returns true (1) or false (0) to indicate if the PQconnCheck function is supported on this platform.

== src/interfaces/libpq/fe-misc.c

5.
-static int pqSocketCheck(PGconn *conn, int forRead, int forWrite, - time_t end_time); +static int pqSocketIsReadableOrWritableOrValid(PGconn *conn, int forRead, + int forWrite, time_t end_time); I was not 100% sure overloading this API is the right thing to do. Doesn't this introduce a subtle side-effect on some of the existing callers? e.g. Previously pqWaitTimed would ALWAYS return 0 if forRead/forWrite were both false. But now other return values like errors will be possible. Is that OK? ~~~ 6. pqSocketPoll /* * Check a file descriptor for read and/or write data, possibly waiting. * If neither forRead nor forWrite are set, immediately return a timeout * condition (without waiting). Return >0 if condition is met, 0 * if a timeout occurred, -1 if an error or interrupt occurred. * * Timeout is infinite if end_time is -1. Timeout is immediate (no blocking) * if end_time is 0 (or indeed, any time before now). * * Moreover, when neither forRead nor forWrite is requested and timeout is * disabled, try to check the health of socket. */ The new comment "Moreover..." is contrary to the earlier part of the same comment which already said, "If neither forRead nor forWrite are set, immediately return a timeout condition (without waiting)." There might be side-effects to previous/existing callers of this function (e.g. pqWaitTimed via pqSocketCheck) ~~~ 7. if (!forRead && !forWrite) - return 0; + { + /* Try to check the health if requested */ + if (end_time == 0) +#if defined(POLLRDHUP) + input_fd.events = POLLRDHUP | POLLHUP | POLLNVAL; +#else + return 0; +#endif /* defined(POLLRDHUP) */ + else + return 0; + } FYI - I think the new code can be simpler without needing #else by calling your other new function. 
SUGGESTION if (!forRead && !forWrite) { if (!PQcanConnCheck() || end_time != 0) return 0; /* Check the connection health when end_time is 0 */ Assert(PQcanConnCheck() && end_time == 0); #if defined(POLLRDHUP) input_fd.events = POLLRDHUP | POLLHUP | POLLNVAL; #endif } ~~~ 8. PQconnCheck +/* + * Check whether PQconnCheck() can work well on this platform. + * + * Returns 1 if this can use PQconnCheck(), otherwise 0. + */ +int +PQcanConnCheck(void) +{ +#if (defined(HAVE_POLL) && defined(POLLRDHUP)) + return true; +#else + return false; +#endif +} ~ 8a. "can work well" --> "works" ~ 8b. Maybe better to say "true (1)" and "otherwise false (0)" -- Kind Regards, Peter Smith. Fujitsu Australia
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
On Fri, Feb 17, 2023 at 5:57 PM Peter Smith wrote: > > FYI, I accidentally left this (v23) patch's TAP test > t/032_subscribe_use_index.pl still lurking even after removing all > other parts of this patch. > > In this scenario, the t/032 test gets stuck (build of latest HEAD) > > IIUC the patch is only meant to affect performance, so I expected this > 032 test to work regardless of whether the rest of the patch is > applied. > > Anyway, it hangs every time for me. I didn't dig looking for the > cause, but if it requires patched code for this new test to pass, I > thought it indicates something wrong either with the test or something > more sinister the new test has exposed. Maybe I am mistaken > Sorry, probably the above was a false alarm. After a long time (minutes) the stuck test did eventually timeout with: t/032_subscribe_use_index.pl ... # poll_query_until timed out executing this query: # select (idx_scan = 1) from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx'; # expecting this output: # t # last actual query output: # f # with stderr: t/032_subscribe_use_index.pl ... Dubious, test returned 29 (wstat 7424, 0x1d00) -- Kind Regards, Peter Smith. Fujitsu Australia
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
FYI, I accidentally left this (v23) patch's TAP test t/032_subscribe_use_index.pl still lurking even after removing all other parts of this patch. In this scenario, the t/032 test gets stuck (build of latest HEAD) IIUC the patch is only meant to affect performance, so I expected this 032 test to work regardless of whether the rest of the patch is applied. Anyway, it hangs every time for me. I didn't dig looking for the cause, but if it requires patched code for this new test to pass, I thought it indicates something wrong either with the test or something more sinister the new test has exposed. Maybe I am mistaken -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Perform streaming logical transactions by background workers and parallel apply
LGTM. My only comment is about the commit message. == Commit message d9d7fe6 reuse existing wait event when sending data in apply worker. But we should have invent a new wait state if we are waiting at a new place, so fix this. ~ SUGGESTION d9d7fe6 made use of an existing wait event when sending data from the apply worker, but we should have invented a new wait state since the code was waiting at a new place. This patch corrects the mistake by using a new wait state "LogicalApplySendData". -- Kind Regards, Peter Smith. Fujitsu Australia
Re: [PATCH] Add pretty-printed XML output option
On Thu, Feb 16, 2023 at 12:49 AM Jim Jones wrote: > > Accidentally left the VERBOSE settings out -- sorry! > > Now it matches the approach used in a xpath test in xml.sql, xml.out, > xml_1.out and xml_2.out > > -- Since different libxml versions emit slightly different > -- error messages, we suppress the DETAIL in this test. > \set VERBOSITY terse > SELECT xpath('/*', ''); > ERROR: could not parse XML document > \set VERBOSITY default > > v11 now correctly sets xml_2.out. > > Best, Jim Firstly, sorry, it seems I made a mistake and was premature in calling bingo above for v9. - today I repeated v9 'make check' and found it failing still. - the new xmlformat tests are OK, but some pre-existing xmlparse tests are broken. - see attached file pretty-v9-results OTOH, the absence of xml_2.out from this patch appears to be the correct explanation for why my results have been differing. Today I fetched and tried the latest v11. It is failing too, but only just. It seems to be due only to a whitespace EOF issue in the xml_2.out - see attached file pretty-v11-results @@ -1679,4 +1679,4 @@ -- XML format: empty string SELECT xmlformat(''); ERROR: invalid XML document -\set VERBOSITY default \ No newline at end of file +\set VERBOSITY default -- The attached patch update (v12-0002) fixes the xml_2.out for me. -- Kind Regards, Peter Smith. Fujitsu Australia v12-0001-Add-pretty-printed-XML-output-option.patch Description: Binary data v12-0002-PS-fix-EOF-for-xml_2.out.patch Description: Binary data pretty-v9-results Description: Binary data pretty-v11-results Description: Binary data
Re: [PATCH] Add pretty-printed XML output option
On Wed, Feb 15, 2023 at 6:10 PM Jim Jones wrote: > > On 15.02.23 02:09, Peter Smith wrote: > > With v8, in my environment, in psql I see something slightly different: > > > > > > test_pub=# SET XML OPTION CONTENT; > > SET > > test_pub=# SELECT xmlformat(''); > > ERROR: invalid XML document > > DETAIL: line 1: switching encoding : no input > > line 1: Document is empty > > test_pub=# SET XML OPTION DOCUMENT; > > SET > > test_pub=# SELECT xmlformat(''); > > ERROR: invalid XML document > > LINE 1: SELECT xmlformat(''); > > ^ > > DETAIL: line 1: switching encoding : no input > > line 1: Document is empty > > > > ~~ > > > > test_pub=# SET XML OPTION CONTENT; > > SET > > test_pub=# INSERT INTO xmltest VALUES (3, ' > ERROR: relation "xmltest" does not exist > > LINE 1: INSERT INTO xmltest VALUES (3, ' > ^ > > test_pub=# SET XML OPTION DOCUMENT; > > SET > > test_pub=# INSERT INTO xmltest VALUES (3, ' > ERROR: relation "xmltest" does not exist > > LINE 1: INSERT INTO xmltest VALUES (3, ' > ^ > > > > ~~ > > Yes... a cfbot also complained about the same thing. > > Setting the VERBOSITY to terse might solve this issue: > > postgres=# \set VERBOSITY terse > postgres=# SELECT xmlformat(''); > ERROR: invalid XML document > > postgres=# \set VERBOSITY default > postgres=# SELECT xmlformat(''); > ERROR: invalid XML document > DETAIL: line 1: switching encoding : no input > > ^ > line 1: Document is empty > > ^ > > v9 wraps the corner test cases with VERBOSITY terse to reduce the error > message output. > Bingo!! Your v9 patch now passes all 'make check' tests for me. But I'll leave it to a committer to decide if this VERBOSITY toggle is the best fix. (I don't understand, maybe someone can explain, how the patch managed to mess up the verbosity of the existing tests.) -- Kind Regards, Peter Smith. Fujitsu Australia.
Re: Support logical replication of DDLs
On Thu, Feb 9, 2023 at 8:55 PM Ajin Cherian wrote: > > On Fri, Feb 3, 2023 at 11:41 AM Peter Smith wrote: > > > > Here are some review comments for patch v63-0001. > > > > == > > src/backend/catalog/aclchk.c > > > > 3. ExecuteGrantStmt > > > > + /* Copy the grantor id needed for DDL deparsing of Grant */ > > + istmt.grantor_uid = grantor; > > + > > > > SUGGESTION (comment) > > Copy the grantor id to the parsetree, needed for DDL deparsing of Grant > > > > didn't change this, as Alvaro said this was not a parsetree. Perhaps there is more to do here? Sorry, I did not understand the details of Alvaro's post [1], but I did not recognize the difference between ExecuteGrantStmt and ExecSecLabelStmt so it was my impression either one or both of these places are either wrongly commented, or maybe are doing something that should not be done. > > == > > src/backend/utils/adt/regproc.c > > > > 9. > > + > > +/* > > + * Append the parenthesized arguments of the given pg_proc row into the > > output > > + * buffer. force_qualify indicates whether to schema-qualify type names > > + * regardless of visibility. > > + */ > > +static void > > +format_procedure_args_internal(Form_pg_proc procform, StringInfo buf, > > +bool force_qualify) > > +{ > > + int i; > > + char* (*func[2])(Oid) = {format_type_be, format_type_be_qualified}; > > + > > + appendStringInfoChar(buf, '('); > > + for (i = 0; i < procform->pronargs; i++) > > + { > > + Oid thisargtype = procform->proargtypes.values[i]; > > + char*argtype = NULL; > > + > > + if (i > 0) > > + appendStringInfoChar(buf, ','); > > + > > + argtype = func[force_qualify](thisargtype); > > + appendStringInfoString(buf, argtype); > > + pfree(argtype); > > + } > > + appendStringInfoChar(buf, ')'); > > +} > > > > 9a. > > Assign argtype = NULL looks redundant because it will always be > > overwritten anyhow. > > > > changed this. > > > ~ > > > > 9b. 
> > I understand why this function was put here beside the other static > > functions in "Support Routines" but IMO it really belongs nearby (i.e. > > directly above) the only caller (format_procedure_args). Keeping both > > those functional together will improve the readability of both, and > > will also remove the need to have the static forward declaration. > > There was no reply for 9b. Was it accidentally overlooked, or did you choose not to do it? -- [1] https://www.postgresql.org/message-id/20230213090752.27ftbb6byiw3qcbl%40alvherre.pgsql Kind Regards, Peter Smith. Fujitsu Australia
Re: Support logical replication of DDLs
On Sat, Feb 11, 2023 at 3:21 AM vignesh C wrote: > > On Thu, 9 Feb 2023 at 03:47, Peter Smith wrote: > > > > Hi Vignesh, thanks for addressing my v63-0002 review comments. > > > > I confirmed most of the changes. Below is a quick follow-up for the > > remaining ones. > > > > On Mon, Feb 6, 2023 at 10:32 PM vignesh C wrote: > > > > > > On Mon, 6 Feb 2023 at 06:47, Peter Smith wrote: > > > > > > ... > > > > > > > > 8. > > > > + value = findJsonbValueFromContainer(container, JB_FOBJECT, ); > > > > > > > > Should the code be checking or asserting value is not NULL? > > > > > > > > (IIRC I asked this a long time ago - sorry if it was already answered) > > > > > > > > > > Yes, this was already answered by Zheng, quoting as "The null checking > > > for value is done in the upcoming call of expand_one_jsonb_element()." > > > in [1] > > > > Thanks for the info. I saw that Zheng-san only wrote it is handled in > > the “upcoming call of expand_one_jsonb_element”, but I don’t know if > > that is sufficient. For example, if the execution heads down the other > > path (expand_jsonb_array) with a NULL jsonarr then it going to crash, > > isn't it? So I still think some change may be needed here. > > Added an Assert for this. > Was this a correct change to make here? IIUC this Assert is now going to intercept both cases including the expand_one_jsonb_element() which previously would have thrown a proper ERROR. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
ier in this function. + */ + idxoid = GetRelationIdentityOrPK(localrel); + if (OidIsValid(idxoid)) + return idxoid; + + /* If index scans are disabled, use a sequential scan */ + if (!enable_indexscan) + return InvalidOid; ~ IMO that "Note" really belongs with the if (!enable)indexscan) more like this: SUGGESTION /* * Simple case, we already have a primary key or a replica identity index. */ idxoid = GetRelationIdentityOrPK(localrel); if (OidIsValid(idxoid)) return idxoid; /* * If index scans are disabled, use a sequential scan. * * Note we hesitate to move this check to earlier in this function * because allowing primary key or replica identity even when index scan * is disabled is the legacy behaviour. */ if (!enable_indexscan) return InvalidOid; == src/backend/replication/logical/worker.c 10. get_usable_indexoid +/* + * Decide whether we can pick an index for the relinfo (e.g., the relation) + * we're actually deleting/updating from. If it is a child partition of + * edata->targetRelInfo, find the index on the partition. + * + * Note that if the corresponding relmapentry has invalid usableIndexOid, + * the function returns InvalidOid. + */ "(e.g., the relation)" --> "(i.e. the relation)" -- Kind Regards, Peter Smith. Fujitsu Australia
Re: [PATCH] Add pretty-printed XML output option
On Wed, Feb 15, 2023 at 11:05 AM Jim Jones wrote: > > On 14.02.23 23:45, Peter Smith wrote: > > Those results implied to me that this function code (in my environment > > anyway) is somehow introducing a side effect causing the *other* XML > > tests to fail. > > I believe I've found the issue. It is probably related to the XML OPTION > settings, as it seems to deliver different error messages when set to > DOCUMENT or CONTENT: > > postgres=# SET XML OPTION CONTENT; > SET > postgres=# SELECT xmlformat(''); > ERROR: invalid XML document > DETAIL: line 1: switching encoding : no input > > ^ > line 1: Document is empty > > ^ > postgres=# SET XML OPTION DOCUMENT; > SET > postgres=# SELECT xmlformat(''); > ERROR: invalid XML document > LINE 1: SELECT xmlformat(''); > ^ > DETAIL: line 1: switching encoding : no input > > ^ > line 1: Document is empty > > ^ > > v8 attached reintroduces the SELECT xmlformat('') test case and adds SET > XML OPTION DOCUMENT to the regression tests. > With v8, in my environment, in psql I see something slightly different: test_pub=# SET XML OPTION CONTENT; SET test_pub=# SELECT xmlformat(''); ERROR: invalid XML document DETAIL: line 1: switching encoding : no input line 1: Document is empty test_pub=# SET XML OPTION DOCUMENT; SET test_pub=# SELECT xmlformat(''); ERROR: invalid XML document LINE 1: SELECT xmlformat(''); ^ DETAIL: line 1: switching encoding : no input line 1: Document is empty ~~ test_pub=# SET XML OPTION CONTENT; SET test_pub=# INSERT INTO xmltest VALUES (3, '
Re: [PATCH] Add pretty-printed XML output option
On Wed, Feb 15, 2023 at 8:55 AM Jim Jones wrote: > > On 13.02.23 13:15, Jim Jones wrote: > > diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/xml.out > /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out > --- /tmp/cirrus-ci-build/src/test/regress/expected/xml.out 2023-02-12 > 09:02:57.077569000 + > +++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out > 2023-02-12 09:05:45.14810 + > @@ -1695,10 +1695,7 @@ > -- XML format: empty string > SELECT xmlformat(''); > ERROR: invalid XML document > -DETAIL: line 1: switching encoding : no input > - > -^ > -line 1: Document is empty > +DETAIL: line 1: Document is empty > > ^ > -- XML format: invalid string (whitespaces) > > I couldn't figure out why the error messages are different -- I'm wondering > if the issue is the test environment itself. I just removed the troubling > test case for now > > SELECT xmlformat(''); > > v7 attached. > > Thanks for reviewing this patch! > Yesterday I looked at those cfbot configs and noticed all those machines have different versions of libxml. 2.10.3 2.6.23 2.9.10 2.9.13 But I don't know whether the version numbers have any impact on the different error details or not. ~ The thing that puzzled me most is that in MY environment (CentOS7; libxml 20901; PG --with-libxml build) I get this behaviour. - Without your v6 patch 'make check' is all OK. - With your v6 patch other XML tests (not only yours) of 'make check' failed with different error messages. - Similarly, if I keep the v6 patch but just change (in xmlformat) the #ifdef USE_LIBXML to be #if 0, then only the new xmlformat tests fail, but the other XML tests are working OK again. Those results implied to me that this function code (in my environment anyway) is somehow introducing a side effect causing the *other* XML tests to fail. But so far I was unable to identify the reason. Sorry, I don't know this XML API well enough to help more. -- Kind Regards, Peter Smith. Fujitsu Australia.
Re: Perform streaming logical transactions by background workers and parallel apply
On Tue, Feb 14, 2023 at 5:04 PM Amit Kapila wrote: > > On Fri, Feb 10, 2023 at 8:56 AM Peter Smith wrote: > > > > On Fri, Feb 10, 2023 at 1:32 PM houzj.f...@fujitsu.com > > wrote: > > > > > > On Tuesday, February 7, 2023 11:17 AM Amit Kapila > > > wrote: > > > > > > > > On Mon, Feb 6, 2023 at 3:43 PM houzj.f...@fujitsu.com > > > > wrote: > > > > > > > > > > while reading the code, I noticed that in pa_send_data() we set wait > > > > > event to WAIT_EVENT_LOGICAL_PARALLEL_APPLY_STATE_CHANGE while > > > > sending > > > > > the message to the queue. Because this state is used in multiple > > > > > places, user might not be able to distinguish what they are waiting > > > > > for. So It seems we'd better to use WAIT_EVENT_MQ_SEND here which will > > > > > be eaier to distinguish and understand. Here is a tiny patch for that. > > > > > > > > > > > As discussed[1], we'd better invent a new state for this purpose, so here > > > is the patch > > > that does the same. > > > > > > [1] > > > https://www.postgresql.org/message-id/CAA4eK1LTud4FLRbS0QqdZ-pjSxwfFLHC1Dx%3D6Q7nyROCvvPSfw%40mail.gmail.com > > > > > > > My first impression was the > > WAIT_EVENT_LOGICAL_PARALLEL_APPLY_SEND_DATA name seemed misleading > > because that makes it sound like the parallel apply worker is doing > > the sending, but IIUC it's really the opposite. > > > > So, how about WAIT_EVENT_LOGICAL_APPLY_SEND_DATA? > Yes, IIUC all the LR events are named WAIT_EVENT_LOGICAL_xxx. So names like the below seem correct format: a) WAIT_EVENT_LOGICAL_APPLY_SEND_DATA b) WAIT_EVENT_LOGICAL_LEADER_SEND_DATA c) WAIT_EVENT_LOGICAL_LEADER_APPLY_SEND_DATA Of those, I prefer option c) because saying LEADER_APPLY_xxx matches the name format of the existing WAIT_EVENT_LOGICAL_PARALLEL_APPLY_STATE_CHANGE. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Support logical replication of DDLs
FYI - the latest patch cannot be applied. See cfbot [1] -- [1] http://cfbot.cputube.org/patch_42_3595.log Kind Regards, Peter Smith. Fujitsu Australia
Re: Adding "large" to PG_TEST_EXTRA
On Tue, Feb 14, 2023 at 10:44 AM Andres Freund wrote: > > Hi, > > On 2023-02-14 09:26:47 +1100, Peter Smith wrote: > > I've observed suggested test cases get rejected as being overkill, or > > because they would add precious seconds to the test execution. OTOH, I > > felt such tests would still help gain some additional percentages from > > the "code coverage" stats. The kind of tests I am thinking of don't > > necessarily need a huge disk/CPU - but they just take longer to run > > than anyone has wanted to burden the build-farm with. > > I'd say it depend on the test whether it's worth adding. Code coverage for its > own sake isn't that useful, they have to actually test something useful. And > tests have costs beyond runtime, e.g. new tests tend to fail in some edge > cases. > > E.g. just having tests hit more lines, without verifying that the behaviour is > actually correct, only provides limited additional assurance. It's also not > very useful to add a very expensive test that provides only a very small > additional amount of coverage. > > IOW, even if we add more test categories, it'll still be a tradeoff. > > > > Sorry for the thread interruption -- but I thought this might be the > > right place to ask: What is the recommended way to deal with such > > tests intended primarily for better code coverage? > > I don't think that exists today. > > Do you have an example of the kind of test you're thinking of? No, nothing specific in mind. But maybe like these: - tests for causing obscure errors that would never otherwise be reached without something deliberately designed to fail a certain way - tests for trivial user errors apparently deemed not worth bloating the regression tests with -- e.g. many errorConflictingDefElem not being called [1]. - timing-related or error tests where some long (multi-second) delay is a necessary part of the setup. -- [1] https://coverage.postgresql.org/src/backend/commands/subscriptioncmds.c.gcov.html Kind Regards, Peter Smith. 
Fujitsu Australia
Re: Adding "large" to PG_TEST_EXTRA
On Tue, Feb 14, 2023 at 5:42 AM Andres Freund wrote: > > Hi, > > I'm working on rebasing [1], my patch to make relation extension scale > better. > > As part of that I'd like to add tests for relation extension. To be able to > test the bulk write strategy path, we need to have a few backends concurrently > load > 16MB files. > > It seems pretty clear that doing that on all buildfarm machines wouldn't be > nice / welcome. And it also seems likely that this won't be the last case > where that'd be useful. > > So I'd like to add a 'large' class to PG_TEST_EXTRA, that we can use in tests > that we only want to execute on machines with sufficient resources. > Oh, I have been thinking about a similar topic recently, but I was unaware of PG_TEST_EXTRA [1]. I've observed suggested test cases get rejected as being overkill, or because they would add precious seconds to the test execution. OTOH, I felt such tests would still help gain some additional percentages from the "code coverage" stats. The kind of tests I am thinking of don't necessarily need a huge disk/CPU - but they just take longer to run than anyone has wanted to burden the build-farm with. ~ Sorry for the thread interruption -- but I thought this might be the right place to ask: What is the recommended way to deal with such tests intended primarily for better code coverage? I didn't see anything that looked pre-built for 'coverage'. Did I miss something, or is it a case of just invent-your-own extra tests/values for your own ad-hoc requirements? e.g. make check EXTRA_TESTS=extra_regress_for_coverage make check-world PG_TEST_EXTRA='extra_tap_tests_for_coverage' Thanks! -- [1] https://www.postgresql.org/docs/devel/regress-run.html Kind Regards, Peter Smith. Fujitsu Australia
Re: Time delayed LR (WAS Re: logical replication restrictions)
t; 0) { TimestampTz next_feedback_due_ts; long next_feedback_due_ms; /* * Find if the next feedback is due earlier than the remaining delay ms. */ next_feedback_due_ts = TimestampTzPlusMilliseconds(send_time, status_interval_ms); next_feedback_due_ms = TimestampDifferenceMilliseconds(now, next_feedback_due_ts); if (next_feedback_due_ms < remaining_delay_ms) { /* delay before feedback */ WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, next_feedback_due_ms, WAIT_EVENT_LOGICAL_APPLY_DELAY); send_feedback(last_received, true, false, true); continue; } } /* delay before apply */ WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, remaining_delay_ms, WAIT_EVENT_LOGICAL_APPLY_DELAY); } == src/include/utils/wait_event.h 3. @@ -149,7 +149,8 @@ typedef enum WAIT_EVENT_REGISTER_SYNC_REQUEST, WAIT_EVENT_SPIN_DELAY, WAIT_EVENT_VACUUM_DELAY, - WAIT_EVENT_VACUUM_TRUNCATE + WAIT_EVENT_VACUUM_TRUNCATE, + WAIT_EVENT_LOGICAL_APPLY_DELAY } WaitEventTimeout; FYI - The PGDOCS has a section with "Table 28.13. Wait Events of Type Timeout" so if you are going to add a new Timeout Event then you also need to document it (alphabetically) in that table. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Support logical replication of DDLs
On Sat, Feb 11, 2023 at 3:31 AM vignesh C wrote: > > On Fri, 10 Feb 2023 at 21:50, vignesh C wrote: > > The attached v68 version patch has the changes for the same. > > I was not sure if we should support ddl replication of > create/alter/drop subscription commands as there might be some data > inconsistency issues in the following cases: > #node1 who is running in port 5432 > create publication pub_node1 for all tables with ( PUBLISH = 'insert, > update, delete, truncate'); > > #node2 who is running in port 5433 > create publication pub_node2 for all tables with(PUBLISH = 'insert, > update, delete, truncate', ddl = 'all'); > create subscription sub_node2 connection 'dbname=postgres host=node1 > port=5432' publication pub_node1; > > #node3 > create subscription sub_node3 connection 'dbname=postgres host=node2 > port=5433 publication pub_node2; > > #node1 > create table t1(c1 int ); > > #node2 > create table t1(c1 int); > alter subscription sub_node2 refresh publication; > > # Additionally this command will be replicated to node3, creating a > subscription sub2_node2 in node3 which will subscribe data from node1 > create subscription sub2_node2 connection 'dbname=postgres host=node1 > port=5432' publication pub_node1; > > After this any insert into t1 from node1 will be replicated to node2 > and node3, additionally node2's replicated data(which was replicated > from node1) will also be sent to node3 causing inconsistency. If the > table has unique or primary key constraints, it will lead to an error. > > Another option would be to replicate the create subscription in > disabled state and not support few ddl replication of alter > subscription which will connect to publisher like: > 1) Alter subscription sub1 enable; > 2) Alter subscription sub1 refresh publication; > > But in this case also, we will be able to support few alter > subscription commands and not support few alter subscription commands. 
> I feel it is better that we do not need to support ddl replication of > create/alter/drop subscription command and let users handle the > subscription commands. > Thoughts? > So essentially, node3 is subscribed 2x from the same table in node1 node1 --> node2 || ddl |V +-> node3 I think the suggested options are: option #1. If you support CREATE SUBSCRIPTION DDL replication then you can end up with the conflict troubles you described above option #2. If you support CREATE SUBSCRIPTION DDL replication but only make sure it is disabled first then your above scenario might be OK but you will also need to *not* allow DDL replication of the ALTER SUBSCRIPTION which might cause it to become re-enabled. IMO adding tricky rules is just inviting more problems. option #3. Do nothing, don't support it. +1 but see below for a variation of this ~ Actually, I am sort of expecting lots of potential difficulties with DDL replication and this CREATE SUBSCRIPTION problem is just one of them. IMO it would be a mistake to try and make the first version of these patches try to do *everything*. E.g. Why invent tricky solutions to problems without yet knowing user expectations/requirements? Therefore, my first impression is to do a generic option #4: option #4. This is a variation of "do nothing". My suggestion is you can still replicate every DDL via logical replication messages but just don't actually *apply* anything on the subscriber side for the commands which are problematic (like this one is). Instead of applying, the subscriber can just log a NOTICE about each command that was skipped. This will make it easier for the user to know what didn’t happen, but they can just cut/paste that command from the NOTICE if they really want to. Also, this option #4 is deliberately generic, which means you can do the same for every kind of DDL that proves too difficult to automate in the first version (I doubt CREATE SUBSCRIPTION will be the only example). 
The option #4 result might look like this: test_sub=# create subscription sub1 connection 'dbname=test_pub' publication pub1; NOTICE: created replication slot "sub1" on publisher CREATE SUBSCRIPTION NOTICE: subscription "sub1" skipping DDL: create subscription sub_node2 connection 'dbname=postgres host=node1 port=5432' publication pub_node1; ... (And if it turns out users really do want this then it can be revisited in later patch versions) -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Exit walsender before confirming remote flush in logical replication
Here are some comments for patch v7-0002. == Commit Message 1. This commit extends START_REPLICATION to accept SHUTDOWN_MODE clause. It is currently implemented only for logical replication. ~ "to accept SHUTDOWN_MODE clause." --> "to accept a SHUTDOWN_MODE clause." == doc/src/sgml/protocol.sgml 2. START_REPLICATION SLOT slot_name LOGICAL XXX/XXX [ SHUTDOWN_MODE { 'wait_flush' | 'immediate' } ] [ ( option_name [ option_value ] [, ...] ) ] ~ IMO this should say shutdown_mode as it did before: START_REPLICATION SLOT slot_name LOGICAL XXX/XXX [ SHUTDOWN_MODE shutdown_mode ] [ ( option_name [ option_value ] [, ...] ) ] ~~~ 3. + +shutdown_mode + + + Determines the behavior of the walsender process at shutdown. If + shutdown_mode is 'wait_flush', the walsender waits + for all the sent WALs to be flushed on the subscriber side. This is + the default when SHUTDOWN_MODE is not specified. If shutdown_mode is + 'immediate', the walsender exits without + confirming the remote flush. + + + Is the font of the "shutdown_mode" correct? I expected it to be like the others (e.g. slot_name) == src/backend/replication/walsender.c 4. +static void +CheckWalSndOptions(const StartReplicationCmd *cmd) +{ + if (cmd->shutdownmode) + { + char*mode = cmd->shutdownmode; + + if (pg_strcasecmp(mode, "wait_flush") == 0) + shutdown_mode = WALSND_SHUTDOWN_MODE_WAIT_FLUSH; + else if (pg_strcasecmp(mode, "immediate") == 0) + shutdown_mode = WALSND_SHUTDOWN_MODE_IMMEDIATE; + else + ereport(ERROR, + errcode(ERRCODE_SYNTAX_ERROR), + errmsg("invalid value for shutdown mode: \"%s\"", mode), + errhint("Available values: wait_flush, immediate.")); + } + +} Unnecessary extra whitespace at end of the function. == src/include/nodes/replnodes. 5. 
@@ -83,6 +83,7 @@ typedef struct StartReplicationCmd char*slotname; TimeLineID timeline; XLogRecPtr startpoint; + char*shutdownmode; List*options; } StartReplicationCmd; IMO those last two members should have a comment something like: /* Only for logical replication */ because that will make it more clear why sometimes they are assigned and sometimes they are not. == src/include/replication/walreceiver.h 6. Should the protocol version be bumped (and documented) now that START_REPLICATION supports a new extended syntax? Or is that done only for messages sent by pgoutput? -- Kind Regards, Peter Smith. Fujitsu Australia.
Re: [PATCH] Add pretty-printed XML output option
Something is misbehaving. Using the latest HEAD, and before applying the v6 patch, 'make check' is working OK. But after applying the v6 patch, some XML regression tests are failing because the DETAIL part of the message indicating the wrong syntax position is not getting displayed. Not only for your new tests -- but for other XML tests too. My ./configure looks like this: ./configure --prefix=/usr/local/pg_oss --with-libxml --enable-debug --enable-cassert --enable-tap-tests CFLAGS="-ggdb -O0 -g3 -fno-omit-frame-pointer" resulting in: checking whether to build with XML support... yes checking for libxml-2.0 >= 2.6.23... yes ~ e.g.(these are a sample of errors) xml ... FAILED 2561 ms @@ -344,8 +326,6 @@ ^ line 1: Opening and ending tag mismatch: twoerrors line 1 and unbalanced - -^ SELECT xmlparse(document ''); xmlparse - @@ -1696,14 +1676,8 @@ SELECT xmlformat(''); ERROR: invalid XML document DETAIL: line 1: switching encoding : no input - -^ line 1: Document is empty - -^ -- XML format: invalid string (whitespaces) SELECT xmlformat(' '); ERROR: invalid XML document DETAIL: line 1: Start tag expected, '<' not found - - ^ ~~ Separately (but maybe it's related?), the CF-bot test also reported a failure [1] with strange error detail differences. diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/xml.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out --- /tmp/cirrus-ci-build/src/test/regress/expected/xml.out 2023-02-12 09:02:57.077569000 + +++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out 2023-02-12 09:05:45.14810 + @@ -1695,10 +1695,7 @@ -- XML format: empty string SELECT xmlformat(''); ERROR: invalid XML document -DETAIL: line 1: switching encoding : no input - -^ -line 1: Document is empty +DETAIL: line 1: Document is empty ^ -- XML format: invalid string (whitespaces) -- [1] https://api.cirrus-ci.com/v1/artifact/task/4802219812323328/testrun/build/testrun/regress/regress/regression.diffs Kind Regards, Peter Smith. 
Fujitsu Australia
Re: Exit walsender before confirming remote flush in logical replication
, then IMO maybe it's better to remove the "= 0" because the explicit assignment made me expect that it had special meaning, and then it was confusing when I could not find a reason. ~~~ 10. ProcessPendingWrites + /* + * In this function, there is a possibility that the walsender is + * stuck. It is caused when the opposite worker is stuck and then the + * send-buffer of the walsender becomes full. Therefore, we must add + * an additional path for shutdown for immediate shutdown mode. + */ + if (shutdown_mode == WALSND_SHUTDOWN_MODE_IMMIDEATE && + got_STOPPING) + WalSndDone(XLogSendLogical); 10a. Can this comment say something like "receiving worker" instead of "opposite worker"? SUGGESTION This can happen when the receiving worker is stuck, and then the send-buffer of the walsender... ~ 10b. IMO it makes more sense to check this around the other way. E.g. we don't care what is the shutdown_mode value unless got_STOPPING is true. SUGGESTION if (got_STOPPING && (shutdown_mode == WALSND_SHUTDOWN_MODE_IMMEDIATE)) ~~~ 11. WalSndDone + * If we are in the immediate shutdown mode, flush location and output + * buffer is not checked. This may break the consistency between nodes, + * but it may be useful for the system that has high-latency network to + * reduce the amount of time for shutdown. Add some quotes for the mode. SUGGESTION 'immediate' shutdown mode ~~~ 12. +/* + * Check options for walsender itself and set flags accordingly. + * + * Currently only one option is accepted. + */ +static void +CheckWalSndOptions(const StartReplicationCmd *cmd) +{ + if (cmd->shutdownmode) + ParseShutdownMode(cmd->shutdownmode); +} + +/* + * Parse given shutdown mode. 
+ * + * Currently two values are accepted - "wait_flush" and "immediate" + */ +static void +ParseShutdownMode(char *shutdownmode) +{ + if (pg_strcasecmp(shutdownmode, "wait_flush") == 0) + shutdown_mode = WALSND_SHUTDOWN_MODE_WAIT_FLUSH; + else if (pg_strcasecmp(shutdownmode, "immediate") == 0) + shutdown_mode = WALSND_SHUTDOWN_MODE_IMMIDEATE; + else + ereport(ERROR, + errcode(ERRCODE_SYNTAX_ERROR), + errmsg("invalid value for shutdown mode: \"%s\"", shutdownmode), + errhint("Available values: wait_flush, immediate.")); +} IMO the ParseShutdownMode function seems unnecessary because it's not really "parsing" anything and it is only called in one place. I suggest wrapping everything into the CheckWalSndOptions function. The end result is still only a simple function: SUGGESTION static void CheckWalSndOptions(const StartReplicationCmd *cmd) { if (cmd->shutdownmode) { char *mode = cmd->shutdownmode; if (pg_strcasecmp(mode, "wait_flush") == 0) shutdown_mode = WALSND_SHUTDOWN_MODE_WAIT_FLUSH; else if (pg_strcasecmp(mode, "immediate") == 0) shutdown_mode = WALSND_SHUTDOWN_MODE_IMMEDIATE; else ereport(ERROR, errcode(ERRCODE_SYNTAX_ERROR), errmsg("invalid value for shutdown mode: \"%s\"", mode), errhint("Available values: wait_flush, immediate.")); } } == src/include/replication/walreceiver.h 13. @@ -170,6 +170,7 @@ typedef struct * false if physical stream. */ char*slotname; /* Name of the replication slot or NULL. */ XLogRecPtr startpoint; /* LSN of starting point. */ + char*shutdown_mode; /* Name of specified shutdown name */ union { ~ 13a. Typo (shutdown name?) SUGGESTION /* The specified shutdown mode string, or NULL. */ ~ 13b. Because they have the same member names I kept confusing this option shutdown_mode with the other enum also called shutdown_mode. I wonder if is it possible to call this one something like 'shutdown_mode_str' to make reading the code easier? ~ 13c. Is this member in the right place? AFAIK this is not even implemented for physical replication. 
e.g. Why isn't this new member part of the 'logical' sub-structure in the union? == src/test/subscription/t/001_rep_changes.pl 14. -# Set min_apply_delay parameter to 3 seconds +# Check restart on changing min_apply_delay to 3 seconds my $delay = 3; $node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION tap_sub_renamed SET (min_apply_delay = '${delay}s')"); +$node_publisher->poll_query_until('postgres', + "SELECT pid != $oldpid FROM pg_stat_replication WHERE application_name = 'tap_sub_renamed' AND state = 'streaming';" + ) + or die + "Timed out while waiting for the walsender to restart after changing min_apply_delay to non-zero value"; IIUC this test is for verifying that a new walsender worker was started if the delay was changed from 0 to non-zero. I.e. I think it is testing your new logic in maybe_reread_subscription. Probably more complete testing also needs to check the other scenarios: * min_apply_delay from one non-zero value to another non-zero value --> verify a new worker is NOT started. * change min_apply_delay from non-zero to zero --> verify a new worker IS started -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Perform streaming logical transactions by background workers and parallel apply
On Fri, Feb 10, 2023 at 1:32 PM houzj.f...@fujitsu.com wrote: > > On Tuesday, February 7, 2023 11:17 AM Amit Kapila > wrote: > > > > On Mon, Feb 6, 2023 at 3:43 PM houzj.f...@fujitsu.com > > wrote: > > > > > > while reading the code, I noticed that in pa_send_data() we set wait > > > event to WAIT_EVENT_LOGICAL_PARALLEL_APPLY_STATE_CHANGE while > > sending > > > the message to the queue. Because this state is used in multiple > > > places, users might not be able to distinguish what they are waiting > > > for. So it seems we'd better use WAIT_EVENT_MQ_SEND here, which will > > > be easier to distinguish and understand. Here is a tiny patch for that. > > > > > As discussed[1], we'd better invent a new state for this purpose, so here is > the patch > that does the same. > > [1] > https://www.postgresql.org/message-id/CAA4eK1LTud4FLRbS0QqdZ-pjSxwfFLHC1Dx%3D6Q7nyROCvvPSfw%40mail.gmail.com > My first impression was the WAIT_EVENT_LOGICAL_PARALLEL_APPLY_SEND_DATA name seemed misleading because that makes it sound like the parallel apply worker is doing the sending, but IIUC it's really the opposite. And since WAIT_EVENT_LOGICAL_PARALLEL_APPLY_LEADER_SEND_DATA seems too verbose, how about shortening the prefix for both events? E.g. BEFORE WAIT_EVENT_LOGICAL_PARALLEL_APPLY_SEND_DATA, WAIT_EVENT_LOGICAL_PARALLEL_APPLY_STATE_CHANGE, AFTER WAIT_EVENT_LOGICAL_PA_LEADER_SEND_DATA, WAIT_EVENT_LOGICAL_PA_STATE_CHANGE, -- Kind Regards, Peter Smith. Fujitsu Australia
Re: [PATCH] Add pretty-printed XML output option
On Thu, Feb 9, 2023 at 7:17 PM Jim Jones wrote: > > On 09.02.23 08:23, Tom Lane wrote: > > Um ... why are you using PG_TRY here at all? It seems like > > you have full control of the actually likely error cases. > > The only plausible error out of the StringInfo calls is OOM, > > and you probably don't want to trap that at all. > > My intention was to catch any unexpected error from > xmlDocDumpFormatMemory and handle it properly. But I guess you're right, > I can control the likely error cases by checking doc and nbytes. > > You suggest something along these lines? > > xmlDocPtr doc; > xmlChar *xmlbuf = NULL; > text *arg = PG_GETARG_TEXT_PP(0); > StringInfoData buf; > int nbytes; > > doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, > GetDatabaseEncoding(), NULL); > > if (!doc) > elog(ERROR, "could not parse the given XML document"); > > xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1); > > xmlFreeDoc(doc); > > if (!nbytes) > elog(ERROR, "could not indent the given XML document"); > > initStringInfo(&buf); > appendStringInfoString(&buf, (const char *) xmlbuf); > > xmlFree(xmlbuf); > > PG_RETURN_XML_P(stringinfo_to_xmltype(&buf)); > > > Thanks! > Something like that LGTM, but here are some other minor comments. == 1. FYI, there are some whitespace warnings applying the v5 patch [postgres@CentOS7-x64 oss_postgres_misc]$ git apply ../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch ../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:26: trailing whitespace. ../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:29: trailing whitespace. ../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:33: trailing whitespace. ../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:37: trailing whitespace. ../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:41: trailing whitespace. warning: squelched 48 whitespace errors warning: 53 lines add whitespace errors. == src/test/regress/sql/xml.sql 2.
The v5 patch was already testing NULL, but it might be good to add more tests to verify the function is behaving how you want for edge cases. For example, +-- XML pretty print: NULL, empty string, spaces only... SELECT xmlpretty(NULL); SELECT xmlpretty(''); SELECT xmlpretty(' '); ~~ 3. The function is returning XML anyway, so is the '::xml' casting in these tests necessary? e.g. SELECT xmlpretty(NULL)::xml; --> SELECT xmlpretty(NULL); == src/include/catalog/pg_proc.dat 4. + { oid => '4642', descr => 'Indented text from xml', + proname => 'xmlpretty', prorettype => 'xml', + proargtypes => 'xml', prosrc => 'xmlpretty' }, Spurious leading space for this new entry. == doc/src/sgml/func.sgml 5. + + + 42 + + + +(1 row) + +]]> A spurious blank line in the example after the "(1 row)" ~~~ 6. Do these function docs belong in section 9.15.1 "Producing XML Content"? Or (since it is not really producing content) should they be moved to the 9.15.3 "Processing XML" section? -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Time delayed LR (WAS Re: logical replication restrictions)
> The comment adjustment suggested by Peter-san above > was also included in this v33. > Please have a look at the attached patch. Patch v33 LGTM. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: [PATCH] Add pretty-printed XML output option
On Thu, Feb 9, 2023 at 10:42 AM Jim Jones wrote: > > On 09.02.23 00:09, Peter Smith wrote: > > I noticed the xmlFreeDoc(doc) within the PG_CATCH is guarded but the > > other xmlFreeDoc(doc) is not. As the doc is assigned outside the > > PG_TRY shouldn't those both be the same? > > Hi Peter, > > My logic there was the following: if the program reached that part of the > code it means that the xml_parse() and xmlDocDumpFormatMemory() worked, > which consequently means that the variables doc and xmlbuf are != NULL, > therefore not needing to be checked. Am I missing something? > Thanks. I think I understand it better now -- I expect xmlDocDumpFormatMemory will cope OK when passed a NULL doc (see this source [1]): it will return nbytes of 0, and your code will still throw ERROR, meaning the guard for a NULL doc is necessary in the PG_CATCH. In that case, everything LGTM. ~ OTOH, if you are having to check for a NULL doc anyway, maybe it's just as easy to do that check only once, up-front. Then you could quick-exit the function without calling xmlDocDumpFormatMemory etc. in the first place. For example: doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL); if (!doc) return 0; -- Kind Regards, Peter Smith. Fujitsu Australia.
Re: [PATCH] Add pretty-printed XML output option
On Thu, Feb 9, 2023 at 7:31 AM Jim Jones wrote: > > while working on another item of the TODO list I realized that I should > be using a PG_TRY() block in the xmlDocDumpFormatMemory call. > > Fixed in v5. > I noticed the xmlFreeDoc(doc) within the PG_CATCH is guarded but the other xmlFreeDoc(doc) is not. As the doc is assigned outside the PG_TRY shouldn't those both be the same? -- Kind Regards, Peter Smith. Fujitsu Australia.
Re: Support logical replication of DDLs
Hi Vignesh, thanks for addressing my v63-0002 review comments. I confirmed most of the changes. Below is a quick follow-up for the remaining ones. On Mon, Feb 6, 2023 at 10:32 PM vignesh C wrote: > > On Mon, 6 Feb 2023 at 06:47, Peter Smith wrote: > > ... > > > > 8. > > + value = findJsonbValueFromContainer(container, JB_FOBJECT, ); > > > > Should the code be checking or asserting value is not NULL? > > > > (IIRC I asked this a long time ago - sorry if it was already answered) > > > > Yes, this was already answered by Zheng, quoting as "The null checking > for value is done in the upcoming call of expand_one_jsonb_element()." > in [1] Thanks for the info. I saw that Zheng-san only wrote it is handled in the "upcoming call of expand_one_jsonb_element", but I don't know if that is sufficient. For example, if the execution heads down the other path (expand_jsonb_array) with a NULL jsonarr then it is going to crash, isn't it? So I still think some change may be needed here. > > 11. > > +/* > > + * Expand a JSON value as an operator name. > > + */ > > +static void > > +expand_jsonval_operator(StringInfo buf, JsonbValue *jsonval) > > > > Should this function comment be more like the comment for > > expand_jsonval_dottedname by saying there can be an optional > > "schemaname"? > > Modified Is it really OK for the "objname" to be optional here? (Yes, I know the code is currently implemented as if it is OK, but I am doubtful.) That would mean everything can be optional and the buf result might be nothing. It could also mean if the "schemaname" is provided but the "objname" is not, then the buf will have a trailing ".". It doesn't sound quite right to me. > > > ~~~ > > > > 12.
> > +static bool > > +expand_jsonval_string(StringInfo buf, JsonbValue *jsonval) > > +{ > > + if (jsonval->type == jbvString) > > + { > > + appendBinaryStringInfo(buf, jsonval->val.string.val, > > +jsonval->val.string.len); > > + } > > + else if (jsonval->type == jbvBinary) > > + { > > + json_trivalue present; > > + > > + present = find_bool_in_jsonbcontainer(jsonval->val.binary.data, > > + "present"); > > + > > + /* > > + * If "present" is set to false, this element expands to empty; > > + * otherwise (either true or absent), fall through to expand "fmt". > > + */ > > + if (present == tv_false) > > + return false; > > + > > + expand_fmt_recursive(jsonval->val.binary.data, buf); > > + } > > + else > > + return false; > > + > > + return true; > > +} > > > > I felt this could be simpler if there is a new 'expanded' variable > > because then you can have just a single return point instead of 3 > > returns; > > > > If you choose to do this there is a minor tweak to the "fall through" > > comment. > > > > SUGGESTION > > expand_jsonval_string(StringInfo buf, JsonbValue *jsonval) > > { > > bool expanded = true; > > > > if (jsonval->type == jbvString) > > { > > appendBinaryStringInfo(buf, jsonval->val.string.val, > >jsonval->val.string.len); > > } > > else if (jsonval->type == jbvBinary) > > { > > json_trivalue present; > > > > present = find_bool_in_jsonbcontainer(jsonval->val.binary.data, > > "present"); > > > > /* > > * If "present" is set to false, this element expands to empty; > > * otherwise (either true or absent), expand "fmt". > > */ > > if (present == tv_false) > > expanded = false; > > else > > expand_fmt_recursive(jsonval->val.binary.data, buf); > > } > > > > return expanded; > > } > > I'm not sure if this change is required as this will introduce a new > variable and require it to be set, this variable should be set to > "expand = false" in else after else if also, instead I preferred the > existing code. 
I did not make any change for this unless you are > seeing some bigger optimization. > Sorry, I messed up the previous code suggestion. It should have said: SUGGESTION expand_jsonval_string(StringInfo buf, JsonbValue *jsonval) { bool expanded = false; if (jsonval->type == jbvString) { appendBinaryStringInfo(buf, jsonval->val.string.val, jsonval->val.strin
Re: Deadlock between logrep apply worker and tablesync worker
On Fri, Feb 3, 2023 at 6:58 PM houzj.f...@fujitsu.com wrote: > > On Thursday, February 2, 2023 7:21 PM Amit Kapila > wrote: > > > > On Thu, Feb 2, 2023 at 12:05 PM houzj.f...@fujitsu.com > > wrote: > > > > > > On Tuesday, January 31, 2023 1:07 AM vignesh C > > wrote: > > > > On Mon, 30 Jan 2023 at 17:30, vignesh C wrote: > > > > > > > > > > I also tried to test the time of "src/test/subscription/t/002_types.pl" > > > before and after the patch(change the lock level) and Tom's > > > patch(split > > > transaction) like what Vignesh has shared on -hackers. > > > > > > I run about 100 times for each case. Tom's and the lock level patch > > > behave similarly on my machines[1]. > > > > > > HEAD: 3426 ~ 6425 ms > > > HEAD + Tom: 3404 ~ 3462 ms > > > HEAD + Vignesh: 3419 ~ 3474 ms > > > HEAD + Tom + Vignesh: 3408 ~ 3454 ms > > > > > > Even apart from the testing time reduction, reducing the lock level > > > and lock the specific object can also help improve the lock contention > > > which user(that use the exposed function) , table sync worker and > > > apply worker can also benefit from it. So, I think pushing the patch to > > > change > > the lock level makes sense. > > > > > > And the patch looks good to me. > > > > > > > Thanks for the tests. I also see a reduction in test time variability with > > Vignesh's > > patch. I think we can release the locks in case the origin is concurrently > > dropped as in the attached patch. I am planning to commit this patch > > tomorrow unless there are more comments or objections. > > > > > While on it, after pushing the patch, I think there is another case > > > might also worth to be improved, that is the table sync and apply > > > worker try to drop the same origin which might cause some delay. This > > > is another case(different from the deadlock), so I feel we can try to > > > improve > > this in another patch. 
> > > > > > > Right, I think that case could be addressed by Tom's patch to some extent > > but > > I am thinking we should also try to analyze if we can completely avoid the > > need > > to remove origins from both processes. One idea could be to introduce > > another relstate something like PRE_SYNCDONE and set it in a separate > > transaction before we set the state as SYNCDONE and remove the slot and > > origin in tablesync worker. > > Now, if the tablesync worker errors out due to some reason during the second > > transaction, it can remove the slot and origin after restart by checking > > the state. > > However, it would add another relstate which may not be the best way to > > address this problem. Anyway, that can be accomplished as a separate patch. > > Here is an attempt to achieve the same. > Basically, the patch removes the code that drop the origin in apply worker. > And > add a new state PRE_SYNCDONE after synchronization finished in front of apply > (sublsn set), but before dropping the origin and other final cleanups. The > tablesync will restart and redo the cleanup if it failed after reaching the > new > state. Besides, since the changes can already be applied on the table in > PRE_SYNCDONE state, so I also modified the check in > should_apply_changes_for_rel(). And some other conditions for the origin drop > in subscription commands are were adjusted in this patch. > BTW, the tablesync.c has a large file header comment which describes all about the relstates including some examples. So this patch needs to include modifications to that comment. -- Kind Regards, Peter Smith. Fujitsu Australia.
Re: Time delayed LR (WAS Re: logical replication restrictions)
On Wed, Feb 8, 2023 at 8:03 PM Hayato Kuroda (Fujitsu) wrote: > ... > > == > > > > src/backend/replication/logical/worker.c > > > > 2. maybe_apply_delay > > > > + if (wal_receiver_status_interval > 0 && > > + diffms > wal_receiver_status_interval * 1000L) > > + { > > + WaitLatch(MyLatch, > > + WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, > > + wal_receiver_status_interval * 1000L, > > + WAIT_EVENT_RECOVERY_APPLY_DELAY); > > + send_feedback(last_received, true, false, true); > > + } > > > > I felt that introducing another variable like: > > > > long statusinterval_ms = wal_receiver_status_interval * 1000L; > > > > would help here by doing 2 things: > > 1) The condition would be easier to read because the ms units would be the > > same > > 2) Won't need * 1000L repeated in two places. > > > > Only, do take care to assign this variable in the right place in this > > loop in case the configuration is changed. > > Fixed. Calculations are done on two lines - the first one is at the entrance of the loop, > and the second one is after SIGHUP is detected. > TBH, I expected you would write this as just a *single* variable assignment before the condition like below: SUGGESTION (tweaked comment and put single assignment before condition) /* * Call send_feedback() to prevent the publisher from exiting by * timeout during the delay, when the status interval is greater than * zero. */ status_interval_ms = wal_receiver_status_interval * 1000L; if (status_interval_ms > 0 && diffms > status_interval_ms) { ... ~ I understand that, in theory, your code is more efficient, but in practice, I think the overhead of a single variable assignment every loop iteration (which is doing WaitLatch anyway) is of insignificant concern, whereas having one assignment is simpler than having two IMO. But, if you want to keep it the way you have then that is OK. Otherwise, this patch v32 LGTM. -- Kind Regards, Peter Smith. Fujitsu Australia.
Re: Deadlock between logrep apply worker and tablesync worker
On Tue, Feb 7, 2023 at 6:46 PM houzj.f...@fujitsu.com wrote: > > On Tuesday, February 7, 2023 12:12 PM Peter Smith > wrote: > > On Fri, Feb 3, 2023 at 6:58 PM houzj.f...@fujitsu.com > > > > wrote: > > > > > ... > > > > Right, I think that case could be addressed by Tom's patch to some > > > > extent but I am thinking we should also try to analyze if we can > > > > completely avoid the need to remove origins from both processes. One > > > > idea could be to introduce another relstate something like > > > > PRE_SYNCDONE and set it in a separate transaction before we set the > > > > state as SYNCDONE and remove the slot and origin in tablesync worker. > > > > Now, if the tablesync worker errors out due to some reason during > > > > the second transaction, it can remove the slot and origin after restart > > > > by > > checking the state. > > > > However, it would add another relstate which may not be the best way > > > > to address this problem. Anyway, that can be accomplished as a separate > > patch. > > > > > > Here is an attempt to achieve the same. > > > Basically, the patch removes the code that drop the origin in apply > > > worker. And add a new state PRE_SYNCDONE after synchronization > > > finished in front of apply (sublsn set), but before dropping the > > > origin and other final cleanups. The tablesync will restart and redo > > > the cleanup if it failed after reaching the new state. Besides, since > > > the changes can already be applied on the table in PRE_SYNCDONE state, > > > so I also modified the check in should_apply_changes_for_rel(). And > > > some other conditions for the origin drop in subscription commands are > > were adjusted in this patch. > > > > > > > Here are some review comments for the 0001 patch > > > > == > > General Comment > > > > 0. > > The idea of using the extra relstate for clean-up seems OK, but the > > implementation of the new state in this patch appears misordered and > > misnamed to me. 
> > > > The state name should indicate what it is doing (PRE_SYNCDONE is > > meaningless). The patch describes in several places that this state means > > "synchronized, but not yet cleaned up" therefore IMO it means the SYNCDONE > > state should be *before* this new state. And since this new state is for > > "cleanup" then let's call it something like that. > > > > To summarize, I don’t think the meaning of SYNCDONE should be touched. > > SYNCDONE means the synchronization is done, same as before. And your new > > "cleanup" state belongs directly *after* that. IMO it should be like this: > > > > 1. STATE_INIT > > 2. STATE_DATASYNC > > 3. STATE_FINISHEDCOPY > > 4. STATE_SYNCDONE > > 5. STATE_CLEANUP <-- new relstate > > 6. STATE_READY > > > > Of course, this is going to impact almost every aspect of the patch, but I > > think > > everything will be basically the same as you have it now > > -- only all the state names and comments need to be adjusted according to > > the > > above. > > Although I agree the CLEANUP is easier to understand, but I am a bit concerned > that the changes would be a bit invasive. > > If we add a CLEANUP state at the end as suggested, it will change the meaning > of the existing SYNCDONE state, before the change it means both data sync and > cleanup have been done, but after the change it only mean the data sync is > over. This also means all the current C codes that considered the SYNCDONE as > the final state of table sync will need to be changed. Moreover, it's common > for user to query the relation state from pg_subscription_rel to identify if > the table sync of a table is finished(e.g. check relstate IN ('r', 's')), but > if we add a new state(CLEANUP) as the final state, then all these SQLs would > need to be changed as they need to check like relstate IN ('r', 'x'(new > cleanup > state)). 
> IIUC, you are saying that we still want to keep the SYNCDONE state as the last state before READY mainly because otherwise there is too much impact on user/test SQL that is currently checking those ('s','r') states. OTOH, in the current 001 patch you had the SUBREL_STATE_PRE_SYNCDONE meaning "synchronized but not yet cleaned up" (that's verbatim from your PGDOCS). And there is C code where you are checking SUBREL_STATE_PRE_SYNCDONE and essentially giving the state before the SYNCDONE an equal status to the SYNCDONE (e.g. should_apply_changes_for_rel seemed to be doing th
Re: Time delayed LR (WAS Re: logical replication restrictions)
Here are my review comments for v31-0001 == doc/src/sgml/glossary.sgml 1. + + Replication setup that applies time-delayed copy of the data. + That sentence seemed a bit strange to me. SUGGESTION Replication setup that delays the application of changes by a specified minimum time-delay period. == src/backend/replication/logical/worker.c 2. maybe_apply_delay + if (wal_receiver_status_interval > 0 && + diffms > wal_receiver_status_interval * 1000L) + { + WaitLatch(MyLatch, + WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, + wal_receiver_status_interval * 1000L, + WAIT_EVENT_RECOVERY_APPLY_DELAY); + send_feedback(last_received, true, false, true); + } I felt that introducing another variable like: long statusinterval_ms = wal_receiver_status_interval * 1000L; would help here by doing 2 things: 1) The condition would be easier to read because the ms units would be the same 2) Won't need * 1000L repeated in two places. Only, do take care to assign this variable in the right place in this loop in case the configuration is changed. == src/test/subscription/t/001_rep_changes.pl 3. +# Test time-delayed logical replication +# +# If the subscription sets min_apply_delay parameter, the logical replication +# worker will delay the transaction apply for min_apply_delay milliseconds. We +# look the time duration between tuples are inserted on publisher and then +# changes are replicated on subscriber. This comment and the other one appearing later in this test are both explaining the same test strategy. I think both comments should be combined into one big one up-front, like this: SUGGESTION If the subscription sets min_apply_delay parameter, the logical replication worker will delay the transaction apply for min_apply_delay milliseconds. We verify this by looking at the time difference between a) when tuples are inserted on the publisher, and b) when those changes are replicated on the subscriber. Even on slow machines, this strategy will give predictable behavior. ~~ 4. 
+my $delay = 3; + +# Set min_apply_delay parameter to 3 seconds +$node_subscriber->safe_psql('postgres', + "ALTER SUBSCRIPTION tap_sub_renamed SET (min_apply_delay = '${delay}s')"); IMO that "my $delay = 3;" assignment should be *after* the comment: e.g. + +# Set min_apply_delay parameter to 3 seconds +my $delay = 3; +$node_subscriber->safe_psql('postgres', + "ALTER SUBSCRIPTION tap_sub_renamed SET (min_apply_delay = '${delay}s')"); ~~~ 5. +# Make new content on publisher and check its presence in subscriber depending +# on the delay applied above. Before doing the insertion, get the +# current timestamp that will be used as a comparison base. Even on slow +# machines, this allows to have a predictable behavior when comparing the +# delay between data insertion moment on publisher and replay time on subscriber. Most of this comment is now redundant because this was already explained in the big comment up-front (see #3). Only one useful sentence is left. SUGGESTION Before doing the insertion, get the current timestamp that will be used as a comparison base. -- Kind Regards, Peter Smith. Fujitsu Australia.
Re: Time delayed LR (WAS Re: logical replication restrictions)
On Tue, Feb 7, 2023 at 4:02 PM Amit Kapila wrote: > > On Tue, Feb 7, 2023 at 10:07 AM Kyotaro Horiguchi > wrote: > > > > At Tue, 7 Feb 2023 09:10:01 +0530, Amit Kapila > > wrote in > > > On Tue, Feb 7, 2023 at 6:03 AM Peter Smith wrote: > > > > 5b. > > > > Since there are no translator considerations here why not write the > > > > second error like: > > > > > > > > errmsg("%d ms is outside the valid range for parameter > > > > \"min_apply_delay\" (%d .. %d)", > > > > result, 0, PG_INT32_MAX)) > > > > > > > > > > I see that existing usage in the code matches what the patch had > > > before this comment. See below and similar usages in the code. > > > if (start <= 0) > > > ereport(ERROR, > > > (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > > > errmsg("invalid value for parameter \"%s\": %d", > > > "start", start))); > > > > The same errmsg text occurs many times in the tree. On the other hand > > the pointed message is the only one. I suppose Peter considered this > > aspect. > > > > # "%d%s%s is outside the valid range for parameter \"%s\" (%d .. %d)" > > # also appears just once > > > > As for me, it seems to me a good practice to do that regardless of the > > number of duplicates to (semi)mechanically avoid duplicates. > > > > (But I believe I would do as Peter suggests by myself for the first > > cut, though:p) > > > > Personally, I would prefer consistency. I think we can later start a > new thread to change the existing message and if there is a consensus > and value in the same then we could use the same style here as well. > Of course, if there is a convention then we should stick to it. My understanding was that (string literal) message parameters are specified separately from the message format string primarily as an aid to translators. That makes good sense for parameters with names that are also English words (like "start" etc), but for non-word parameters like "min_apply_delay" there is no such ambiguity in the first place.
Anyway, I am fine with it being written either way. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Deadlock between logrep apply worker and tablesync worker
= synchronization done, c = clean-up done, r = ready (normal replication) == src/backend/commands/subscriptioncmds.c 4. AlterSubscription_refresh Some adjustments are needed according to my "General Comment". ~~~ 5. DropSubscription Some adjustments are needed according to my "General Comment". == src/backend/replication/logical/tablesync.c 6. + * Update the state of the table to SUBREL_STATE_SYNCDONE and cleanup the + * tablesync slot and drop the tablesync's origin tracking. + */ +static void +finish_synchronization(bool restart_after_crash) 6a. Suggest calling this function something like 'cleanup_after_synchronization' ~ 6b. Some adjustments to states and comments are needed according to my "General Comment". ~~~ 7. process_syncing_tables_for_sync - MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE; + MyLogicalRepWorker->relstate = SUBREL_STATE_PRE_SYNCDONE; MyLogicalRepWorker->relstate_lsn = current_lsn; This should just be setting SUBREL_STATE_SYNCDONE as it previously did. Other states/comments in this function should change according to my "General Comment". ~ 8. if (rstate->state == SUBREL_STATE_SYNCDONE) { /* * Apply has caught up to the position where the table sync has * finished. Mark the table as ready so that the apply will just * continue to replicate it normally. */ That should now be checking for SUBREL_STATE_CLEANUPDONE according to my "General Comment" ~~~ 9. process_syncing_tables_for_apply Some adjustments to states and comments are needed according to my "General Comment". ~~~ 10. LogicalRepSyncTableStart Some adjustments to states and comments are needed according to my "General Comment". == src/backend/replication/logical/worker.c 11. should_apply_changes_for_rel Some adjustments to states according to my "General Comment". == src/include/catalog/pg_subscription_rel.h 12.
@@ -62,8 +62,10 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_subscription_rel_srrelid_srsubid_index, 6117, Subsc * NULL) */ #define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed * (sublsn NULL) */ -#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of - * apply (sublsn set) */ +#define SUBREL_STATE_PRE_SYNCDONE 'p' /* synchronization finished in front of + * apply (sublsn set), but the final + * cleanup has not yet been performed */ +#define SUBREL_STATE_SYNCDONE 's' /* synchronization complete */ #define SUBREL_STATE_READY 'r' /* ready (sublsn set) */ Some adjustments to states and comments are needed according to my "General Comment". -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Time delayed LR (WAS Re: logical replication restrictions)
Here are my review comments for v29-0001. == Commit Message 1. Discussion: https://postgr.es/m/CAB-JLwYOYwL=xtyaxkih5ctm_vm8kjkh7aaitckvmch4rzr...@mail.gmail.com tmp ~ What's that "tmp" doing there? A typo? == doc/src/sgml/catalogs.sgml 2. + + + subminapplydelay int4 + + + The minimum delay (ms) for applying changes. + + For consistency remove the period (.) because the other single-sentence descriptions on this page do not have one. == src/backend/commands/subscriptioncmds.c 3. AlterSubscription + errmsg("cannot set parallel streaming mode for subscription with %s", +"min_apply_delay")); Since there are no translator considerations here why not write it like this: errmsg("cannot set parallel streaming mode for subscription with min_apply_delay") ~~~ 4. AlterSubscription + errmsg("cannot set %s for subscription in parallel streaming mode", +"min_apply_delay")); Since there are no translator considerations here why not write it like this: errmsg("cannot set min_apply_delay for subscription in parallel streaming mode") ~~~ 5. +defGetMinApplyDelay(DefElem *def) +{ + char*input_string; + int result; + const char *hintmsg; + + input_string = defGetString(def); + + /* + * Parse given string as parameter which has millisecond unit + */ + if (!parse_int(input_string, , GUC_UNIT_MS, )) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("invalid value for parameter \"%s\": \"%s\"", + "min_apply_delay", input_string), + hintmsg ? errhint("%s", _(hintmsg)) : 0)); + + /* + * Check both the lower boundary for the valid min_apply_delay range and + * the upper boundary as the safeguard for some platforms where INT_MAX is + * wider than int32 respectively. Although parse_int() has confirmed that + * the result is less than or equal to INT_MAX, the value will be stored + * in a catalog column of int32. 
+ */ + if (result < 0 || result > PG_INT32_MAX) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("%d ms is outside the valid range for parameter \"%s\" (%d .. %d)", + result, + "min_apply_delay", + 0, PG_INT32_MAX))); + + return result; +} 5a. Since there are no translator considerations here why not write the first error like: errmsg("invalid value for parameter \"min_apply_delay\": \"%s\"", input_string) ~ 5b. Since there are no translator considerations here why not write the second error like: errmsg("%d ms is outside the valid range for parameter \"min_apply_delay\" (%d .. %d)", result, 0, PG_INT32_MAX)) -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Time delayed LR (WAS Re: logical replication restrictions)
On Sat, Feb 4, 2023 at 5:04 PM Takamichi Osumi (Fujitsu) wrote: > ... > > Kindly have a look at the attached v27. > Here are some review comments for patch v27-0001. == src/test/subscription/t/032_apply_delay.pl 1. +# Confirm the time-delayed replication has been effective from the server log +# message where the apply worker emits for applying delay. Moreover, verify +# that the current worker's remaining wait time is sufficiently bigger than the +# expected value, in order to check any update of the min_apply_delay. +sub check_apply_delay_log ~ "has been effective from the server log" --> "worked, by inspecting the server log" ~~~ 2. +my $delay = 3; Might be better to name this variable as 'min_apply_delay'. ~~~ 3. +# Now wait for replay to complete on publisher. We're done waiting when the +# subscriber has applyed up to the publisher LSN. +$node_publisher->wait_for_catchup($appname); 3a. Something seemed wrong with the comment. Was it meant to say more like? "The publisher waits for the replication to complete". Typo: "applyed" ~ 3b. Instead of doing this wait_for_catchup stuff why don't you just use a synchronous pub/sub and then the publication will just block internally like you require but without you having to block using test code? ~~~ 4. +# Run a query to make sure that the reload has taken effect. +$node_publisher->safe_psql('postgres', q{SELECT 1}); SUGGESTION (for the comment) # Running a dummy query causes the config to be reloaded. ~~~ 5. +# Confirm the record is not applied expectedly +my $result = $node_subscriber->safe_psql('postgres', + "SELECT count(a) FROM tab_int WHERE a = 0;"); +is($result, qq(0), "check the delayed transaction was not applied"); "expectedly" ?? SUGGESTION (for comment) # Confirm the record was not applied -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Support logical replication of DDLs
s also be checking/asserting that the type is jbvNumeric? ~~~ 14. +/* + * Expand a JSON value as a role name. If the is_public element is set to + * true, PUBLIC is expanded (no quotes); otherwise, expand the given role name, + * quoting as an identifier. + */ +static void +expand_jsonval_role(StringInfo buf, JsonbValue *jsonval) Maybe more readable to quote that param? BEFORE If the is_public element is set... AFTER If the 'is_public' element is set... ~~~ 15. + * + * Returns false if no actual expansion was made (due to the "present" flag + * being set to "false" in formatted string expansion). + */ +static bool +expand_one_jsonb_element(StringInfo buf, char *param, JsonbValue *jsonval, + convSpecifier specifier, const char *fmt) +{ + bool result = true; + ErrorContextCallback sqlerrcontext; ~ 15a. Looking at the implementation, maybe that comment can be made more clear. Something like below: SUGGESTION Returns true, except for the formatted string case if no actual expansion was made (due to the "present" flag being set to "false"). ~ 15b. Maybe use a better variable name. "result" --> "string_expanded" == src/include/catalog/pg_proc.dat 16. @@ -11891,4 +11891,10 @@ prorettype => 'bytea', proargtypes => 'pg_brin_minmax_multi_summary', prosrc => 'brin_minmax_multi_summary_send' }, +{ oid => '4642', descr => 'deparse the DDL command into JSON format string', + proname => 'ddl_deparse_to_json', prorettype => 'text', + proargtypes => 'pg_ddl_command', prosrc => 'ddl_deparse_to_json' }, +{ oid => '4643', descr => 'expand JSON format DDL to a plain DDL command', + proname => 'ddl_deparse_expand_command', prorettype => 'text', + pr 16a. "deparse the DDL command into JSON format string" ==> "deparse the DDL command into a JSON format string" ~ 16b. "expand JSON format DDL to a plain DDL command" --> "expand JSON format DDL to a plain text DDL command" == src/include/tcop/ddl_deparse.h 17. 
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group "2022" --> "2023" ~~~ +extern char *deparse_utility_command(CollectedCommand *cmd, bool verbose_mode); +extern char *deparse_ddl_json_to_string(char *jsonb); +extern char *deparse_drop_command(const char *objidentity, const char *objecttype, + DropBehavior behavior); + + +#endif /* DDL_DEPARSE_H */ Double blank lines. == src/include/tcop/deparse_utility.h 18. @@ -100,6 +103,12 @@ typedef struct CollectedCommand { ObjectType objtype; } defprivs; + + struct + { + ObjectAddress address; + Node *real_create; + } ctas; } d; All the other sub-structures have comments. IMO this one should have a comment too. -- Kind Regards, Peter Smith. Fujitsu Australia.
Re: Support logical replication of DDLs
On Fri, Feb 3, 2023 at 9:21 PM Alvaro Herrera wrote: > > On 2023-Feb-03, Peter Smith wrote: > ... > > 3. ExecuteGrantStmt > > > > + /* Copy the grantor id needed for DDL deparsing of Grant */ > > + istmt.grantor_uid = grantor; > > + > > > > SUGGESTION (comment) > > Copy the grantor id to the parsetree, needed for DDL deparsing of Grant > > Is istmt really "the parse tree" actually? As I recall, it's a derived > struct that's created during execution of the grant/revoke command, so > modifying the comment like this would be a mistake. > I thought this comment was analogous to another one from this same patch 0001 (see seclabel.c), so the suggested change above was simply to make the wording consistent. @@ -134,6 +134,9 @@ ExecSecLabelStmt(SecLabelStmt *stmt) (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("must specify provider when multiple security label providers have been loaded"))); provider = (LabelProvider *) linitial(label_provider_list); + + /* Copy the provider name to the parsetree, needed for DDL deparsing of SecLabelStmt */ + stmt->provider = pstrdup(provider->provider_name); So if the suggestion for the ExecuteGrantStmt comment was a mistake then perhaps the ExecSecLabelStmt comment is wrong also? -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Time delayed LR (WAS Re: logical replication restrictions)
On Fri, Feb 3, 2023 at 4:21 PM Amit Kapila wrote: > > On Fri, Feb 3, 2023 at 6:41 AM Peter Smith wrote: > > > > On Thu, Feb 2, 2023 at 7:21 PM Takamichi Osumi (Fujitsu) > > wrote: > > > > > ... > > > > > > > > > > Besides, I am not sure it's a stable test to check the log. Is it > > > > possible that there's > > > > no such log on a slow machine? I modified the code to sleep 1s at the > > > > beginning > > > > of apply_dispatch(), then the new added test failed because the server > > > > log > > > > cannot match. > > > To get the log by itself is necessary to ensure > > > that the delay is conducted by the apply worker, because we emit the > > > diffms > > > only if it's bigger than 0 in maybe_apply_delay(). If we omit the step, > > > we are not sure the delay is caused by other reasons or the time-delayed > > > feature. > > > > > > As you mentioned, it's possible that no log is emitted on slow machine. > > > Then, > > > the idea to make the test safer for such machines should be to make the > > > delayed time longer. > > > But we shortened the delay time to 1 second to mitigate the long test > > > execution time of this TAP test. > > > So, I'm not sure if it's a good idea to make it longer again. > > > > I think there are a couple of things that can be done about this problem: > > > > 1. If you need the code/test to remain as-is then at least the test > > message could include some comforting text like "(this can fail on > > slow machines when the delay time is already exceeded)" so then a test > > failure will not cause undue alarm. > > > > 2. Try moving the DEBUG2 elog (in function maybe_apply_delay) so that > > it will *always* log the remaining wait time even if that wait time > > becomes negative. Then I think the test cases can be made > > deterministic instead of relying on good luck. This seems like the > > better option. 
> > > > I don't understand why we have to do any of this instead of using 3s > as min_apply_delay similar to what we are doing in > src/test/recovery/t/005_replay_delay. Also, I think we should use > exactly the same way to verify the test even though we want to keep > the log level as DEBUG2 to check logs in case of any failures. > IIUC the reasons are due to conflicting requirements. e.g. - A longer delay like 3s might work better for testing this feature, but OTOH - A longer delay will also cause the whole BF execution to take longer -- Kind Regards, Peter Smith. Fujitsu Australia.
Re: Time delayed LR (WAS Re: logical replication restrictions)
Here are my review comments for patch v26-0001.

On Thu, Feb 2, 2023 at 7:18 PM Takamichi Osumi (Fujitsu) wrote:
>
> Hi,
>
> On Wednesday, February 1, 2023 1:37 PM Peter Smith wrote:
> > Here are my review comments for the patch v25-0001.
> Thank you for your review !
>
> > 8.
> > + if (IsSet(supported_opts, SUBOPT_MIN_APPLY_DELAY) &&
> > + opts->min_apply_delay > 0 && opts->streaming == LOGICALREP_STREAM_PARALLEL)
> > + ereport(ERROR,
> > + errcode(ERRCODE_SYNTAX_ERROR),
> >
> > Saying "> 0" (in the condition) is not strictly necessary here, since it is
> > never < 0.
> This check is necessary.
>
> For example, imagine a case when we CREATE a subscription with streaming = on
> and then try to ALTER the subscription with streaming = parallel
> without any settings for min_apply_delay. The ALTER command
> throws an error of "min_apply_delay > 0 and streaming = parallel are
> mutually exclusive options." then.
>
> This is because min_apply_delay is supported by ALTER command
> (so the first condition becomes true) and we set
> streaming = parallel (which makes the 2nd condition true).
>
> So, we need to check the opts's actual min_apply_delay value
> to make the irrelavent case pass.

I think there is some misunderstanding. I was not suggesting removing
the condition -- only that I thought it could be written without the > 0 as:

if (IsSet(supported_opts, SUBOPT_MIN_APPLY_DELAY) &&
    opts->min_apply_delay && opts->streaming == LOGICALREP_STREAM_PARALLEL)
    ereport(ERROR,

> > ~~~
> >
> > 9. AlterSubscription
> >
> > + /*
> > + * The combination of parallel streaming mode and
> > + * min_apply_delay is not allowed. See
> > + * parse_subscription_options for details of the reason.
> > + */
> > + if (opts.streaming == LOGICALREP_STREAM_PARALLEL)
> > + if ((IsSet(opts.specified_opts, SUBOPT_MIN_APPLY_DELAY) && opts.min_apply_delay > 0) ||
> > + (!IsSet(opts.specified_opts, SUBOPT_MIN_APPLY_DELAY) && sub->minapplydelay > 0))
> >
> > Saying "> 0" (in the condition) is not strictly necessary here, since it is
> > never < 0.
> This is also necessary.
>
> For example, imagine a case that
> there is a subscription whose min_apply_delay is 1 day.
> Then, you want to try to execute ALTER SUBSCRIPTION
> with (min_apply_delay = 0, streaming = parallel).
> If we remove the condition of otps.min_apply_delay > 0,
> then we error out in this case too.
>
> First we pass the first condition
> of the opts.streaming == LOGICALREP_STREAM_PARALLEL,
> since we use streaming option.
> Then, we also set min_apply_delay in this example,
> then without checking the value of min_apply_delay,
> the second condition becomes true
> (IsSet(opts.specified_opts, SUBOPT_MIN_APPLY_DELAY)).
>
> So, we need to make this case (min_apply_delay = 0) pass.
> Meanwhile, checking the "sub" value is necessary for checking existing
> subscription value.

I think there is some misunderstanding. I was not suggesting removing
the condition -- only that I thought it could be written without the > 0 as:

if (opts.streaming == LOGICALREP_STREAM_PARALLEL)
    if ((IsSet(opts.specified_opts, SUBOPT_MIN_APPLY_DELAY) && opts.min_apply_delay) ||
        (!IsSet(opts.specified_opts, SUBOPT_MIN_APPLY_DELAY) && sub->minapplydelay))
        ereport(ERROR,

> > ~~~
> >
> > 10.
> > + if (IsSet(opts.specified_opts, SUBOPT_MIN_APPLY_DELAY))
> > + {
> > + /*
> > + * The combination of parallel streaming mode and
> > + * min_apply_delay is not allowed.
> > + */
> > + if (opts.min_apply_delay > 0)
> >
> > Saying "> 0" (in the condition) is not strictly necessary here, since it is
> > never < 0.
> This is also required to check the value equals to 0 or not.
> Kindly imagine a case when we want to execute ALTER min_apply_delay from 1day
> with a pair of (min_apply_delay = 0 and streaming = parallel).
> If we remove this check, then this ALTER command fails with error.
> Without the check, when we set min_apply_delay and parallel streaming mode,
> even when making the min_apply_delay 0, the error is invoked.
>
> The check for sub.stream is necessary for existing definition of target
> subscription.

I think there is some misunderstanding. I was not suggesting removing
the condition -- only that I thought it could be written without the > 0 as:

if (opts.min_apply_delay)
    if ((IsSet(opts.specified_opts, SUBOPT_STREAMING) && opts.streaming == LOGICALREP_STREAM_PARALLEL) ||
        (!IsSet(opts.specified_opts, SUBOPT_STREAMING) && sub->stream == LOGICALREP_STREAM_PARALLEL))
        ereport(ERROR,

--
Kind Regards,
Peter Smith.
Fujitsu Australia
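To spell out why dropping the "> 0" is safe: for an int that is known to be non-negative, `x > 0` and plain `x` evaluate identically in a boolean context. A throwaway sketch of that equivalence (the function names are hypothetical, not the patch code):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the check, spelled with the explicit "> 0". */
bool
incompatible_explicit(int min_apply_delay, bool streaming_parallel)
{
    return min_apply_delay > 0 && streaming_parallel;
}

/* The same check spelled with implicit truthiness, as suggested. */
bool
incompatible_implicit(int min_apply_delay, bool streaming_parallel)
{
    return min_apply_delay && streaming_parallel;
}
```

For every non-negative delay and either streaming mode the two forms agree; they would only differ for negative delays, which cannot occur here.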
Re: Time delayed LR (WAS Re: logical replication restrictions)
On Thu, Feb 2, 2023 at 7:21 PM Takamichi Osumi (Fujitsu) wrote: > ... > > > > Besides, I am not sure it's a stable test to check the log. Is it possible > > that there's > > no such log on a slow machine? I modified the code to sleep 1s at the > > beginning > > of apply_dispatch(), then the new added test failed because the server log > > cannot match. > To get the log by itself is necessary to ensure > that the delay is conducted by the apply worker, because we emit the diffms > only if it's bigger than 0 in maybe_apply_delay(). If we omit the step, > we are not sure the delay is caused by other reasons or the time-delayed > feature. > > As you mentioned, it's possible that no log is emitted on slow machine. Then, > the idea to make the test safer for such machines should be to make the > delayed time longer. > But we shortened the delay time to 1 second to mitigate the long test > execution time of this TAP test. > So, I'm not sure if it's a good idea to make it longer again. I think there are a couple of things that can be done about this problem: 1. If you need the code/test to remain as-is then at least the test message could include some comforting text like "(this can fail on slow machines when the delay time is already exceeded)" so then a test failure will not cause undue alarm. 2. Try moving the DEBUG2 elog (in function maybe_apply_delay) so that it will *always* log the remaining wait time even if that wait time becomes negative. Then I think the test cases can be made deterministic instead of relying on good luck. This seems like the better option. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Support logical replication of DDLs
of the given pg_proc row into the output
+ * buffer. force_qualify indicates whether to schema-qualify type names
+ * regardless of visibility.
+ */
+static void
+format_procedure_args_internal(Form_pg_proc procform, StringInfo buf,
+                               bool force_qualify)
+{
+ int i;
+ char *(*func[2]) (Oid) = {format_type_be, format_type_be_qualified};
+
+ appendStringInfoChar(buf, '(');
+ for (i = 0; i < procform->pronargs; i++)
+ {
+ Oid thisargtype = procform->proargtypes.values[i];
+ char *argtype = NULL;
+
+ if (i > 0)
+ appendStringInfoChar(buf, ',');
+
+ argtype = func[force_qualify](thisargtype);
+ appendStringInfoString(buf, argtype);
+ pfree(argtype);
+ }
+ appendStringInfoChar(buf, ')');
+}

9a.
Assign argtype = NULL looks redundant because it will always be
overwritten anyhow.

~

9b.
I understand why this function was put here beside the other static
functions in "Support Routines" but IMO it really belongs nearby (i.e.
directly above) the only caller (format_procedure_args). Keeping both
those functions together will improve the readability of both, and will
also remove the need to have the static forward declaration.

==
src/backend/utils/adt/ruleutils.c

10.
+void
+pg_get_ruledef_detailed(Datum ev_qual, Datum ev_action,
+                        char **whereClause, List **actions)
+{
+ int prettyFlags = 0;
+ char *qualstr = TextDatumGetCString(ev_qual);
+ char *actionstr = TextDatumGetCString(ev_action);
+ List *actionNodeList = (List *) stringToNode(actionstr);
+ StringInfoData buf;
+
+ *whereClause = NULL;
+ *actions = NIL;
+ initStringInfo(&buf);
+ if (strlen(qualstr) > 0 && strcmp(qualstr, "<>") != 0)
+ {

If you like, that condition could have been written more simply as:

if (*qualstr && strcmp(qualstr, "<>") != 0)

~~~

11.
+/*
+ * Parse back the TriggerWhen clause of a trigger given the pg_trigger record and
+ * the expression tree (in nodeToString() representation) from pg_trigger.tgqual
+ * for the trigger's WHEN condition.
+ */
+char *
+pg_get_trigger_whenclause(Form_pg_trigger trigrec, Node *whenClause, bool pretty)
+{

It seemed "Parse back" is a typo. I assume it was meant to say
something like "Passes back", or maybe just "Returns" is better.

==
src/include/replication/logicalrelation.h

12.
@@ -14,6 +14,7 @@
 #include "access/attmap.h"
 #include "replication/logicalproto.h"
+#include "storage/lockdefs.h"

What is this needed here for? I tried without this change and
everything still builds OK.

--
Kind Regards,
Peter Smith.
Fujitsu Australia
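On the condition simplification suggested in comment 10: `*qualstr` is the idiomatic constant-time empty-string test, equivalent here to `strlen(qualstr) > 0`. A tiny standalone sketch (the helper name is hypothetical, not from the patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: true when qualstr is non-empty and is not the
 * "<>" placeholder, written with the suggested *qualstr spelling.
 */
bool
has_real_qual(const char *qualstr)
{
    /* *qualstr is false exactly when the string is empty ('\0' first) */
    return *qualstr && strcmp(qualstr, "<>") != 0;
}
```

The behavior is identical to the `strlen(qualstr) > 0` form; the only difference is that the dereference does not walk the whole string.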
Re: Perform streaming logical transactions by background workers and parallel apply
Some minor review comments for v91-0001 == doc/src/sgml/config.sgml 1. -Allows streaming or serializing changes immediately in logical decoding. -The allowed values of logical_replication_mode are -buffered and immediate. When set -to immediate, stream each change if +The allowed values are buffered and +immediate. The default is buffered. +This parameter is intended to be used to test logical decoding and +replication of large transactions for which otherwise we need to generate +the changes till logical_decoding_work_mem is +reached. The effect of logical_replication_mode is +different for the publisher and subscriber: + The "for which otherwise..." part is only relevant for the publisher-side. So it seemed slightly strange to give the reason why to use the GUC for one side but not the other side. Maybe we can just to remove that "for which otherwise..." part, since the logical_decoding_work_mem gets mentioned later in the "On the publisher side,..." paragraph anyway. ~~~ 2. -This parameter is intended to be used to test logical decoding and -replication of large transactions for which otherwise we need to -generate the changes till logical_decoding_work_mem -is reached. +On the subscriber side, if the streaming option is set to +parallel, logical_replication_mode +can be used to direct the leader apply worker to send changes to the +shared memory queue or to serialize changes to the file. When set to +buffered, the leader sends changes to parallel apply +workers via a shared memory queue. When set to +immediate, the leader serializes all changes to files +and notifies the parallel apply workers to read and apply them at the +end of the transaction. "or serialize changes to the file." --> "or serialize all changes to files." (just to use same wording as later in this same paragraph, and also same wording as the GUC hint text). -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Time delayed LR (WAS Re: logical replication restrictions)
time.

~~~

8.
+ if (IsSet(supported_opts, SUBOPT_MIN_APPLY_DELAY) &&
+ opts->min_apply_delay > 0 && opts->streaming == LOGICALREP_STREAM_PARALLEL)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),

Saying "> 0" (in the condition) is not strictly necessary here, since
it is never < 0.

~~~

9. AlterSubscription

+ /*
+ * The combination of parallel streaming mode and
+ * min_apply_delay is not allowed. See
+ * parse_subscription_options for details of the reason.
+ */
+ if (opts.streaming == LOGICALREP_STREAM_PARALLEL)
+ if ((IsSet(opts.specified_opts, SUBOPT_MIN_APPLY_DELAY) && opts.min_apply_delay > 0) ||
+ (!IsSet(opts.specified_opts, SUBOPT_MIN_APPLY_DELAY) && sub->minapplydelay > 0))

Saying "> 0" (in the condition) is not strictly necessary here, since
it is never < 0.

~~~

10.
+ if (IsSet(opts.specified_opts, SUBOPT_MIN_APPLY_DELAY))
+ {
+ /*
+ * The combination of parallel streaming mode and
+ * min_apply_delay is not allowed.
+ */
+ if (opts.min_apply_delay > 0)

Saying "> 0" (in the condition) is not strictly necessary here, since
it is never < 0.

~~~

11. defGetMinApplyDelay

+ /*
+ * Check lower bound. parse_int() has already been confirmed that result
+ * is less than or equal to INT_MAX.
+ */

The parse_int already checks < INT_MAX. But on return from that
function, don't you need to check again that it is < PG_INT32_MAX (in
case those are different)?

(I think Kuroda-san already suggested same as this)

==
src/backend/replication/logical/worker.c

12.
+/*
+ * In order to avoid walsender timeout for time-delayed logical replication the
+ * apply worker keeps sending feedback messages during the delay period.
+ * Meanwhile, the feature delays the apply before the start of the
+ * transaction and thus we don't write WAL records for the suspended changes
+ * during the wait. When the apply worker sends a feedback message during the
+ * delay, we should not make positions of the flushed and apply LSN overwritten
+ * by the last received latest LSN. See send_feedback() for details.
+ */

"we should not make positions of the flushed and apply LSN overwritten"
--> "we should not overwrite positions of the flushed and apply LSN"

~~~

14. send_feedback

@@ -3738,8 +3867,15 @@ send_feedback(XLogRecPtr recvpos, bool force, bool requestReply)
 /*
  * No outstanding transactions to flush, we can report the latest received
  * position. This is important for synchronous replication.
+ *
+ * If the logical replication subscription has unprocessed changes then do
+ * not inform the publisher that the received latest LSN is already
+ * applied and flushed, otherwise, the publisher will make a wrong
+ * assumption about the logical replication progress. Instead, it just
+ * sends a feedback message to avoid a replication timeout during the
+ * delay.
  */

"Instead, it just sends" --> "Instead, just send"

==
src/bin/pg_dump/pg_dump.h

15. SubscriptionInfo

@@ -661,6 +661,7 @@ typedef struct _SubscriptionInfo
 char *subdisableonerr;
 char *suborigin;
 char *subsynccommit;
+ int subminapplydelay;
 char *subpublications;
} SubscriptionInfo;

Should this also be "int32" to match the other member type changes?

==
src/test/subscription/t/032_apply_delay.pl

16.
+# Make sure the apply worker knows to wait for more than 500ms
+check_apply_delay_log($node_subscriber, $offset, "0.5");

"knows to wait for more than" --> "waits for more than"

(this occurs in a couple of places)

--
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Logical replication timeout problem
Here are my review comments for v13-1.

==
Commit message

1.
The DDLs like Refresh Materialized views that generate lots of temporary
data due to rewrite rules may not be processed by output plugins (for
example pgoutput). So, we won't send keep-alive messages for a long time
while processing such commands and that can lead the subscriber side to
timeout.

~

SUGGESTION (minor rearranged way to say the same thing)
For DDLs that generate lots of temporary data due to rewrite rules
(e.g. REFRESH MATERIALIZED VIEW) the output plugins (e.g. pgoutput) may
not be processed for a long time. Since we don't send keep-alive
messages while processing such commands that can lead the subscriber
side to timeout.

~~~

2.
The commit message says what the problem is, but it doesn't seem to
describe what this patch does to fix the problem.

==
src/backend/replication/logical/reorderbuffer.c

3.
+ /*
+ * It is possible that the data is not sent to downstream for a
+ * long time either because the output plugin filtered it or there
+ * is a DDL that generates a lot of data that is not processed by
+ * the plugin. So, in such cases, the downstream can timeout. To
+ * avoid that we try to send a keepalive message if required.
+ * Trying to send a keepalive message after every change has some
+ * overhead, but testing showed there is no noticeable overhead if
+ * we do it after every ~100 changes.
+ */

3a.
"data is not sent to downstream" --> "data is not sent downstream" (?)

~

3b.
"So, in such cases," --> "In such cases,"

~~~

4.
+#define CHANGES_THRESHOLD 100
+
+ if (++changes_count >= CHANGES_THRESHOLD)
+ {
+ rb->update_progress_txn(rb, txn, change->lsn);
+ changes_count = 0;
+ }

I was wondering if it would have been simpler to write this code like
below. Also, by doing it this way the 'changes_count' variable name
makes more sense IMO, otherwise (for current code) maybe it should be
called something like 'changes_since_last_keepalive'.

SUGGESTION
if (++changes_count % CHANGES_THRESHOLD == 0)
    rb->update_progress_txn(rb, txn, change->lsn);

--
Kind Regards,
Peter Smith.
Fujitsu Australia
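The modulo form in the SUGGESTION can be exercised in isolation. Below is a minimal standalone sketch (the names are hypothetical; the real code calls rb->update_progress_txn rather than incrementing a counter), confirming the callback fires exactly once per CHANGES_THRESHOLD changes:

```c
#include <assert.h>

#define CHANGES_THRESHOLD 100

/*
 * Count how many keepalive-style callbacks fire while processing
 * nchanges changes, using the suggested modulo style: the counter
 * is never reset, and the action fires every CHANGES_THRESHOLD-th
 * change.
 */
int
count_keepalives(int nchanges)
{
    int changes_count = 0;
    int fired = 0;

    for (int i = 0; i < nchanges; i++)
    {
        if (++changes_count % CHANGES_THRESHOLD == 0)
            fired++;            /* rb->update_progress_txn(...) in the patch */
    }
    return fired;
}
```

Behavior matches the reset-to-zero version in the patch hunk: nothing fires for the first 99 changes, then once at change 100, 200, and so on.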
Re: Perform streaming logical transactions by background workers and parallel apply
Thanks for the updates to address all of my previous review comments.

Patch v90-0001 LGTM.

--
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Perform streaming logical transactions by background workers and parallel apply
On Mon, Jan 30, 2023 at 5:23 PM houzj.f...@fujitsu.com wrote: > > On Monday, January 30, 2023 12:13 PM Peter Smith > wrote: > > > > Here are my review comments for v88-0002. > > Thanks for your comments. > > > > > == > > General > > > > 1. > > The test cases are checking the log content but they are not checking for > > debug logs or untranslated elogs -- they are expecting a normal ereport LOG > > that might be translated. I’m not sure if that is OK, or if it is a > > potential problem. > > We have tests that check the ereport ERROR and ereport WARNING message(by > search for the ERROR or WARNING keyword for all the tap tests), so I think > checking the LOG should be fine. > > > == > > doc/src/sgml/config.sgml > > > > 2. > > On the publisher side, logical_replication_mode allows allows streaming or > > serializing changes immediately in logical decoding. When set to immediate, > > stream each change if streaming option (see optional parameters set by > > CREATE SUBSCRIPTION) is enabled, otherwise, serialize each change. When set > > to buffered, the decoding will stream or serialize changes when > > logical_decoding_work_mem is reached. > > > > 2a. > > typo "allows allows" (Kuroda-san reported same) > > > > 2b. > > "if streaming option" --> "if the streaming option" > > Changed. Although you replied "Changed" for the above, AFAICT my review comment #2b. was accidentally missed. Otherwise, the patch LGTM. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: pub/sub - specifying optional parameters without values.
On Tue, Jan 31, 2023 at 4:00 AM Tom Lane wrote: > > Peter Smith writes: > > The v3 patch LGTM (just for the logical replication commands). > > Pushed then. > Thanks for pushing the v3 patch. I'd forgotten about the 'streaming' option -- AFAIK this was previously a boolean parameter and so its [= value] part can also be omitted. However, in PG16 streaming became an enum type (on/off/parallel), and the value can still be omitted but that is not really being covered by the new generic text note about booleans added by yesterday's patch. e.g. The enum 'streaming' value part can still be omitted. test_sub=# create subscription sub1 connection 'host=localhost dbname=test_pub' publication pub1 with (streaming); Perhaps a small top-up patch to CREATE SUBSCRIPTION is needed to describe this special case? PSA. (I thought mentioning this special streaming case again for ALTER SUBSCRIPTION might be overkill) -- Kind Regards, Peter Smith. Fujitsu Australia v1-0001-Added-note-for-streaming-parameter-with-value-par.patch Description: Binary data
Re: pub/sub - specifying optional parameters without values.
On Mon, Jan 30, 2023 at 8:36 AM Tom Lane wrote: > > Zheng Li writes: > > The behavior is due to the following code > > https://github.com/postgres/postgres/blob/master/src/backend/commands/define.c#L113 > > Yeah, so you can grep for places that have this behavior by looking > for defGetBoolean calls ... and there are quite a few. That leads > me to the conclusion that we'd better invent a fairly stylized > documentation solution that we can plug into a lot of places, > rather than thinking of slightly different ways to say it and > places to say it. I'm not necessarily opposed to Peter's desire > to fix replication-related commands first, but we have more to do > later. > > I'm also not that thrilled with putting the addition up at the top > of the relevant text. This behavior is at least two decades old, > so if we've escaped documenting it at all up to now, it can't be > that important to most people. > > I also notice that ALTER SUBSCRIPTION has fully three different > sub-sections with about equal claims on this note, if we're going > to stick it directly into the affected option lists. > > That all leads me to propose that we add the new text at the end of > the Parameters in the affected man pages. So about > like the attached. (I left out alter_publication.sgml, as I'm not > sure it needs its own copy of this text --- it doesn't describe > individual parameters at all, just refer to CREATE PUBLICATION.) > The v3 patch LGTM (just for the logical replication commands). -- Kind Regards, Peter Smith. Fujitsu Australia
Re: [DOCS] Stats views and functions not in order?
On Fri, Jan 27, 2023 at 10:30 PM Peter Eisentraut wrote: > > On 19.01.23 00:45, Peter Smith wrote: > > The original $SUBJECT requirements evolved to also try to make each > > view appear on a separate page after that was suggested by DavidJ [2]. > > I was unable to achieve per-page views "without radically changing the > > document structure." [3], but DavidJ found a way [4] to do it using > > refentry. I then wrote the patch v8-0003 using that strategy, which > > after more rebasing became the v10-0001 you see today. > > > > I did prefer the view-per-page results (although I also only use HTML > > docs). But my worry is that there seem still to be a few unknowns > > about how this might affect other (not the HTML) renderings of the > > docs. If you think that risk is too great, or if you feel this patch > > will cause unwarranted link/bookmark grief, then I am happy to just > > drop it. > > I'm wary of making semantic markup changes to achieve an ad-hoc > presentation effects. Sometimes it's necessary, but it should be > considered carefully and globally. > > We could change the chunking boundary to be sect2 globally. This is > easily configurable (chunk.section.depth). > > Thinking about it now, maybe this is what we need. As the documentation > grows, as it clearly does, the depth of the structure increases and > pages get longer. This can also be seen in other chapters. > > Of course, this would need to be tested and checked in more detail. > This chunk configuration idea sounds a better approach. If somebody else wants to champion that change separately then I can maybe help to review it. Meanwhile, this pagination topic has strayed far away from the original $SUBJECT, so I guess since there is nothing else pending this thread's CF entry [1] can just be marked as "Committed" now? -- [1] https://commitfest.postgresql.org/41/3904/ Kind Regards, Peter Smith. Fujitsu Australia
Re: Perform streaming logical transactions by background workers and parallel apply
Here are my review comments for v88-0002.

==
General

1.
The test cases are checking the log content but they are not checking
for debug logs or untranslated elogs -- they are expecting a normal
ereport LOG that might be translated. I'm not sure if that is OK, or if
it is a potential problem.

==
doc/src/sgml/config.sgml

2.
On the publisher side, logical_replication_mode allows allows streaming
or serializing changes immediately in logical decoding. When set to
immediate, stream each change if streaming option (see optional
parameters set by CREATE SUBSCRIPTION) is enabled, otherwise, serialize
each change. When set to buffered, the decoding will stream or
serialize changes when logical_decoding_work_mem is reached.

2a.
typo "allows allows" (Kuroda-san reported same)

2b.
"if streaming option" --> "if the streaming option"

~~~

3.
On the subscriber side, if streaming option is set to parallel,
logical_replication_mode also allows the leader apply worker to send
changes to the shared memory queue or to serialize changes.

SUGGESTION
On the subscriber side, if the streaming option is set to parallel,
logical_replication_mode can be used to direct the leader apply worker
to send changes to the shared memory queue or to serialize changes.

==
src/backend/utils/misc/guc_tables.c

4.
 {
  {"logical_replication_mode", PGC_USERSET, DEVELOPER_OPTIONS,
- gettext_noop("Controls when to replicate each change."),
- gettext_noop("On the publisher, it allows streaming or serializing each change in logical decoding."),
+ gettext_noop("Controls the internal behavior of logical replication publisher and subscriber"),
+ gettext_noop("On the publisher, it allows streaming or "
+              "serializing each change in logical decoding. On the "
+              "subscriber, in parallel streaming mode, it allows "
+              "the leader apply worker to serialize changes to "
+              "files and notifies the parallel apply workers to "
+              "read and apply them at the end of the transaction."),
  GUC_NOT_IN_SAMPLE
 },

Suggest re-wording the long description (subscriber part) to be more
like the documentation text.

BEFORE
On the subscriber, in parallel streaming mode, it allows the leader
apply worker to serialize changes to files and notifies the parallel
apply workers to read and apply them at the end of the transaction.

SUGGESTION
On the subscriber, if the streaming option is set to parallel, it
directs the leader apply worker to send changes to the shared memory
queue or to serialize changes and apply them at the end of the
transaction.

--
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Perform streaming logical transactions by background workers and parallel apply
Patch v88-0001 LGTM. Below are just some minor review comments about the commit message. == Commit message 1. We have discussed having this parameter as a subscription option but exposing a parameter that is primarily used for testing/debugging to users didn't seem advisable and there is no other such parameter. The other option we have discussed is to have a separate GUC for subscriber-side testing but it appears that for the current testing existing parameter is sufficient and avoids adding another GUC. SUGGESTION We discussed exposing this parameter as a subscription option, but it did not seem advisable since it is primarily used for testing/debugging and there is no other such developer option. We also discussed having separate GUCs for publisher/subscriber-side, but for current testing/debugging requirements, one GUC is sufficient. ~~ 2. Reviewed-by: Pater Smith, Kuroda Hayato, Amit Kapila "Pater" --> "Peter" ------ Kind Regards, Peter Smith. Fujitsu Australia
Re: Perform streaming logical transactions by background workers and parallel apply
Here are my review comments for patch v87-0002. == doc/src/sgml/config.sgml 1. -Allows streaming or serializing changes immediately in logical decoding. The allowed values of logical_replication_mode are -buffered and immediate. When set -to immediate, stream each change if +buffered and immediate. The default +is buffered. + I didn't think it was necessary to say “of logical_replication_mode”. IMO that much is already obvious because this is the first sentence of the description for logical_replication_mode. (see also review comment #4) ~~~ 2. + +On the publisher side, it allows streaming or serializing changes +immediately in logical decoding. When set to +immediate, stream each change if streaming option (see optional parameters set by CREATE SUBSCRIPTION) is enabled, otherwise, serialize each change. When set to -buffered, which is the default, decoding will stream -or serialize changes when logical_decoding_work_mem -is reached. +buffered, decoding will stream or serialize changes +when logical_decoding_work_mem is reached. 2a. "it allows" --> "logical_replication_mode allows" 2b. "decoding" --> "the decoding" ~~~ 3. + +On the subscriber side, if streaming option is set +to parallel, this parameter also allows the leader +apply worker to send changes to the shared memory queue or to serialize +changes. When set to buffered, the leader sends +changes to parallel apply workers via shared memory queue. When set to +immediate, the leader serializes all changes to +files and notifies the parallel apply workers to read and apply them at +the end of the transaction. + "this parameter also allows" --> "logical_replication_mode also allows" ~~~ 4. This parameter is intended to be used to test logical decoding and replication of large transactions for which otherwise we need to generate the changes till logical_decoding_work_mem -is reached. +is reached. Moreover, this can also be used to test the transmission of +changes between the leader and parallel apply workers. 
"Moreover, this can also" --> "It can also" I am wondering would this sentence be better put at the top of the GUC description. So then the first paragraph becomes like this: SUGGESTION (I've also added another sentence "The effect of...") The allowed values are buffered and immediate. The default is buffered. This parameter is intended to be used to test logical decoding and replication of large transactions for which otherwise we need to generate the changes till logical_decoding_work_mem is reached. It can also be used to test the transmission of changes between the leader and parallel apply workers. The effect of logical_replication_mode is different for the publisher and subscriber: On the publisher side... On the subscriber side... == .../replication/logical/applyparallelworker.c 5. + /* + * In immeidate mode, directly return false so that we can switch to + * PARTIAL_SERIALIZE mode and serialize remaining changes to files. + */ + if (logical_replication_mode == LOGICAL_REP_MODE_IMMEDIATE) + return false; Typo "immediate" Also, I felt "directly" is not needed. "return false" and "directly return false" is the same. SUGGESTION Using ‘immediate’ mode returns false to cause a switch to PARTIAL_SERIALIZE mode so that the remaining changes will be serialized. == src/backend/utils/misc/guc_tables.c 6. { {"logical_replication_mode", PGC_USERSET, DEVELOPER_OPTIONS, - gettext_noop("Allows streaming or serializing each change in logical decoding."), - NULL, + gettext_noop("Controls the behavior of logical replication publisher and subscriber"), + gettext_noop("If set to immediate, on the publisher side, it " + "allows streaming or serializing each change in " + "logical decoding. On the subscriber side, in " + "parallel streaming mode, it allows the leader apply " + "worker to serialize changes to files and notifies " + "the parallel apply workers to read and apply them at " + "the end of the transaction."), GUC_NOT_IN_SAMPLE }, 6a. 
short description: The user PoV behaviour should be the same. Instead, maybe say "controls the internal behavior" or something like that? ~ 6b. long description: IMO the long description shouldn't mention 'immediate' mode first as it does. BEFORE If set to immediate, on the publisher side, ... AFTER On the publisher side, ... -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Perform streaming logical transactions by background workers and parallel apply
On Tue, Jan 24, 2023 at 11:49 PM houzj.f...@fujitsu.com wrote: > ... > > Sorry, the patch set was somehow attached twice. Here is the correct new > version > patch set which addressed all comments so far. > Here are my review comments for patch v87-0001. == src/backend/replication/logical/reorderbuffer.c 1. @@ -210,7 +210,7 @@ int logical_decoding_work_mem; static const Size max_changes_in_memory = 4096; /* XXX for restore only */ /* GUC variable */ -int logical_decoding_mode = LOGICAL_DECODING_MODE_BUFFERED; +int logical_replication_mode = LOGICAL_REP_MODE_BUFFERED; I noticed that the comment /* GUC variable */ is currently only above the logical_replication_mode, but actually logical_decoding_work_mem is a GUC variable too. Maybe this should be rearranged somehow, and the comment changed "GUC variable" -> "GUC variables"? == src/backend/utils/misc/guc_tables.c 2. @@ -4908,13 +4908,13 @@ struct config_enum ConfigureNamesEnum[] = }, { - {"logical_decoding_mode", PGC_USERSET, DEVELOPER_OPTIONS, + {"logical_replication_mode", PGC_USERSET, DEVELOPER_OPTIONS, gettext_noop("Allows streaming or serializing each change in logical decoding."), NULL, GUC_NOT_IN_SAMPLE }, - &logical_decoding_mode, - LOGICAL_DECODING_MODE_BUFFERED, logical_decoding_mode_options, + &logical_replication_mode, + LOGICAL_REP_MODE_BUFFERED, logical_replication_mode_options, NULL, NULL, NULL }, That gettext_noop string seems incorrect. I think Kuroda-san previously reported the same, but then you replied it has been fixed already [1] > I felt the description seems not to be suitable for current behavior. > A short description should be like "Sets a behavior of logical replication", and > further descriptions can be added in lond description. I adjusted the description here. But this doesn't look fixed to me. (??) == src/include/replication/reorderbuffer.h 3.
@@ -18,14 +18,14 @@ #include "utils/timestamp.h" extern PGDLLIMPORT int logical_decoding_work_mem; -extern PGDLLIMPORT int logical_decoding_mode; +extern PGDLLIMPORT int logical_replication_mode; Probably there should also be a comment here to say "/* GUC variables */" -- [1] https://www.postgresql.org/message-id/OS0PR01MB5716AE9F095F9E7888987BC794C99%40OS0PR01MB5716.jpnprd01.prod.outlook.com Kind Regards, Peter Smith. Fujitsu Australia
Re: Perform streaming logical transactions by background workers and parallel apply
Here are some review comments for v86-0002 == Commit message 1. Use the use the existing developer option logical_replication_mode to test the parallel apply of large transaction on subscriber. ~ Typo "Use the use the" SUGGESTION (rewritten) Give additional functionality to the existing developer option 'logical_replication_mode' to help test parallel apply of large transactions on the subscriber. ~~~ 2. Maybe the commit message should also mention the extra TAP tests that have been added to exercise the serialization part of the parallel apply? BTW, I can see the TAP tests are testing full serialization (when the GUC is 'immediate') but I am not sure how "partial" serialization (when it has to switch halfway from shmem to files) is being tested. == doc/src/sgml/config.sgml 3. Allows streaming or serializing changes immediately in logical decoding. The allowed values of logical_replication_mode are buffered and immediate. When set to immediate, stream each change if streaming option (see optional parameters set by CREATE SUBSCRIPTION) is enabled, otherwise, serialize each change. When set to buffered, which is the default, decoding will stream or serialize changes when logical_decoding_work_mem is reached. On the subscriber side, if streaming option is set to parallel, this parameter also allows the leader apply worker to send changes to the shared memory queue or to serialize changes. When set to buffered, the leader sends changes to parallel apply workers via shared memory queue. When set to immediate, the leader serializes all changes to files and notifies the parallel apply workers to read and apply them at the end of the transaction. ~ Because this same developer GUC now affects both the publisher side and the subscriber side differently, IMO this whole description should be re-structured accordingly. SUGGESTION (something like) The allowed values of logical_replication_mode are buffered and immediate. The default is buffered. On the publisher side, ...
On the subscriber side, ... ~~~ 4. This parameter is intended to be used to test logical decoding and replication of large transactions for which otherwise we need to generate the changes till logical_decoding_work_mem is reached. ~ Maybe this paragraph needs rewording or moving. e.g. Isn't that misleading now? Although this might be an explanation for the publisher side, it does not seem relevant to the subscriber side's behaviour. == .../replication/logical/applyparallelworker.c 5. @ -1149,6 +1149,9 @@ pa_send_data(ParallelApplyWorkerInfo *winfo, Size nbytes, const void *data) Assert(!IsTransactionState()); Assert(!winfo->serialize_changes); + if (logical_replication_mode == LOGICAL_REP_MODE_IMMEDIATE) + return false; + I felt that code should have some comment, even if it is just something quite basic like "/* For developer testing */" == .../t/018_stream_subxact_abort.pl 6. +# Clean up test data from the environment. +$node_publisher->safe_psql('postgres', "TRUNCATE TABLE test_tab_2"); +$node_publisher->wait_for_catchup($appname); Is it necessary to TRUNCATE the table here? If everything is working shouldn't the data be rolled back anyway? ~~~ 7. +$node_publisher->safe_psql( + 'postgres', q{ + BEGIN; + INSERT INTO test_tab_2 values(1); + SAVEPOINT sp; + INSERT INTO test_tab_2 values(1); + ROLLBACK TO sp; + COMMIT; + }); Perhaps this should insert 2 different values so then the verification code can check the correct value remains instead of just checking COUNT(*)? -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Time delayed LR (WAS Re: logical replication restrictions)
On Tue, Jan 24, 2023 at 5:58 PM Amit Kapila wrote: > > On Tue, Jan 24, 2023 at 8:15 AM Kyotaro Horiguchi > wrote: > > > > > Attached the updated patch v19. > > > > + maybe_delay_apply(TransactionId xid, TimestampTz finish_ts) > > > > I look this spelling strange. How about maybe_apply_delay()? > > > > +1. It depends on how you read it. I read it like this: maybe_delay_apply === means "maybe delay [the] apply" (which is exactly what the function does) versus maybe_apply_delay === means "maybe [the] apply [needs a] delay" (which is also correct, but it seemed a more awkward way to say it IMO) ~ Perhaps it's better to rename it more fully like *maybe_delay_the_apply* to remove any ambiguous interpretations. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Logical replication timeout problem
On Tue, Jan 24, 2023 at 1:45 PM wangw.f...@fujitsu.com wrote: > > On Tues, Jan 24, 2023 at 8:28 AM Peter Smith wrote: > > Hi Hou-san, Here are my review comments for v5-0001. > > Thanks for your comments. ... > > Changed as suggested. > > Attach the new patch. Thanks! Patch v6 LGTM. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Perform streaming logical transactions by background workers and parallel apply
Here are my review comments for patch v86-0001. == General 1. IIUC the GUC name was made generic 'logical_replication_mode' so that multiple developer GUCs are not needed later. But IMO those current option values (buffered/immediate) for that GUC are maybe a bit too generic. Perhaps in future, we might want more granular control than that allows. e.g. I can imagine there might be multiple different meanings for what "buffered" means. If there is any chance of the generic values being problematic later then maybe they should be made more specific up-front. e.g. maybe like: logical_replication_mode = buffered_decoding logical_replication_mode = immediate_decoding Thoughts? == Commit message 2. Since we may extend the developer option logical_decoding_mode to to test the parallel apply of large transaction on subscriber, rename this option to logical_replication_mode to make it easier to understand. ~ 2a typo "to to" typo "large transaction on subscriber" --> "large transactions on the subscriber" ~ 2b. IMO better to rephrase the whole paragraph like shown below. SUGGESTION Rename the developer option 'logical_decoding_mode' to the more flexible name 'logical_replication_mode' because doing so will make it easier to extend this option in future to help test other areas of logical replication. == doc/src/sgml/config.sgml 3. Allows streaming or serializing changes immediately in logical decoding. The allowed values of logical_replication_mode are buffered and immediate. When set to immediate, stream each change if streaming option (see optional parameters set by CREATE SUBSCRIPTION) is enabled, otherwise, serialize each change. When set to buffered, which is the default, decoding will stream or serialize changes when logical_decoding_work_mem is reached. ~ IMO it's more clear to say the default when the options are first mentioned. 
So I suggested removing the "which is the default" part, and instead saying: BEFORE The allowed values of logical_replication_mode are buffered and immediate. AFTER The allowed values of logical_replication_mode are buffered and immediate. The default is buffered. == src/backend/utils/misc/guc_tables.c 4. @@ -396,8 +396,8 @@ static const struct config_enum_entry ssl_protocol_versions_info[] = { }; static const struct config_enum_entry logical_decoding_mode_options[] = { - {"buffered", LOGICAL_DECODING_MODE_BUFFERED, false}, - {"immediate", LOGICAL_DECODING_MODE_IMMEDIATE, false}, + {"buffered", LOGICAL_REP_MODE_BUFFERED, false}, + {"immediate", LOGICAL_REP_MODE_IMMEDIATE, false}, {NULL, 0, false} }; I noticed this array is still called "logical_decoding_mode_options". Was that deliberate? -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Logical replication timeout problem
Hi Hou-san, Here are my review comments for v5-0001. == src/backend/replication/logical/reorderbuffer.c 1. @@ -2446,6 +2452,23 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, elog(ERROR, "tuplecid value in changequeue"); break; } + + /* + * Sending keepalive messages after every change has some overhead, but + * testing showed there is no noticeable overhead if keepalive is only + * sent after every ~100 changes. + */ +#define CHANGES_THRESHOLD 100 + + /* + * Try to send a keepalive message after every CHANGES_THRESHOLD + * changes. + */ + if (++changes_count >= CHANGES_THRESHOLD) + { + rb->update_progress_txn(rb, txn, change); + changes_count = 0; + } I noticed you put the #define adjacent to the only usage of it, instead of with the other variable declaration like it was before. Probably it is better how you have done it, but: 1a. The comment indentation is incorrect. ~ 1b. Since the #define is adjacent to its only usage IMO now the 2nd comment is redundant. So the code can just say /* * Sending keepalive messages after every change has some overhead, but * testing showed there is no noticeable overhead if keepalive is only * sent after every ~100 changes. */ #define CHANGES_THRESHOLD 100 if (++changes_count >= CHANGES_THRESHOLD) { rb->update_progress_txn(rb, txn, change); changes_count = 0; } -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Time delayed LR (WAS Re: logical replication restrictions)
On Mon, Jan 23, 2023 at 9:44 PM Amit Kapila wrote: > > On Mon, Jan 23, 2023 at 1:36 PM Peter Smith wrote: > > > > Here are my review comments for v19-0001. > > > ... > > > > 5. parse_subscription_options > > > > + /* > > + * The combination of parallel streaming mode and min_apply_delay is not > > + * allowed. The subscriber in the parallel streaming mode applies each > > + * stream on arrival without the time of commit/prepare. So, the > > + * subscriber needs to depend on the arrival time of the stream in this > > + * case, if we apply the time-delayed feature for such transactions. Then > > + * there is a possibility where some unnecessary delay will be added on > > + * the subscriber by network communication break between nodes or other > > + * heavy work load on the publisher. On the other hand, applying the delay > > + * at the end of transaction with parallel apply also can cause issues of > > + * used resource bloat and locks kept in open for a long time. Thus, those > > + * features can't work together. > > + */ > > > > IMO some re-wording might be warranted here. I am not sure quite how > > to do it. Perhaps like below? > > > > SUGGESTION > > > > The combination of parallel streaming mode and min_apply_delay is not > > allowed. > > > > Here are some reasons why these features are incompatible: > > a. In the parallel streaming mode the subscriber applies each stream > > on arrival without knowledge of the commit/prepare time. This means we > > cannot calculate the underlying network/decoding lag between publisher > > and subscriber, and so always waiting for the full 'min_apply_delay' > > period might include unnecessary delay. > > b. If we apply the delay at the end of the transaction of the parallel > > apply then that would cause issues related to resource bloat and locks > > being held for a long time. > > > > ~~~ > > > > How about something like: > The combination of parallel streaming mode and min_apply_delay is not > allowed. 
This is because we start applying the transaction stream as > soon as the first change arrives without knowing the transaction's > prepare/commit time. This means we cannot calculate the underlying > network/decoding lag between publisher and subscriber, and so always > waiting for the full 'min_apply_delay' period might include > unnecessary delay. > > The other possibility is to apply the delay at the end of the parallel > apply transaction but that would cause issues related to resource > bloat and locks being held for a long time. > +1. That's better. > > > 6. defGetMinApplyDelay > > ... > > > > 6b. > > I thought this function should be implemented as static and located at > > the top of the subscriptioncmds.c source file. > > > > I agree that this should be a static function but I think its current > location is a better place as other similar function is just above it. > But, why not do everything, instead of settling on a half-fix? e.g. 1. Change the new function (defGetMinApplyDelay) to be static as it should be 2. And move defGetMinApplyDelay to the top of the file where IMO it really belongs 3. And then remove the (now) redundant forward declaration of defGetMinApplyDelay 4. And also move the existing function (defGetStreamingMode) to the top of the file so that those similar functions (defGetMinApplyDelay and defGetStreamingMode) can remain together -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Time delayed LR (WAS Re: logical replication restrictions)
rce file. == src/backend/replication/logical/worker.c 7. maybe_delay_apply +static void maybe_delay_apply(TransactionId xid, TimestampTz finish_ts); Is there a reason why this is here? AFAIK the static implementation precedes any usage so I doubt this forward declaration is required. ~~~ 8. send_feedback @@ -3775,11 +3912,12 @@ send_feedback(XLogRecPtr recvpos, bool force, bool requestReply) pq_sendint64(reply_message, now); /* sendTime */ pq_sendbyte(reply_message, requestReply); /* replyRequested */ - elog(DEBUG2, "sending feedback (force %d) to recv %X/%X, write %X/%X, flush %X/%X", + elog(DEBUG2, "sending feedback (force %d) to recv %X/%X, write %X/%X, flush %X/%X in-delayed: %d", force, LSN_FORMAT_ARGS(recvpos), LSN_FORMAT_ARGS(writepos), - LSN_FORMAT_ARGS(flushpos)); + LSN_FORMAT_ARGS(flushpos), + in_delayed_apply); Wondering if it is better to write this as: "sending feedback (force %d, in_delayed_apply %d) to recv %X/%X, write %X/%X, flush %X/%X" == src/test/regress/sql/subscription.sql 9. Add new test? Should there be an additional test to check redundant parameter setting -- eg. "... WITH (min_apply_delay=123, min_apply_delay=456)" (this is related to the review comment #4) ~ 10. Add new tests? Should there be other tests just to verify different units (like 'd', 'h', 'min') are working OK? == src/test/subscription/t/032_apply_delay.pl 11. +# Confirm the time-delayed replication has been effective from the server log +# message where the apply worker emits for applying delay. Moreover, verifies +# that the current worker's delayed time is sufficiently bigger than the +# expected value, in order to check any update of the min_apply_delay. +sub check_apply_delay_log "the current worker's delayed time..." --> "the current worker's remaining wait time..." ?? ~~~ 12. + # Get the delay time from the server log + my $contents = slurp_file($node_subscriber->logfile, $offset); "Get the delay time" --> "Get the remaining wait time..." ~~~ 13. 
+# Create a subscription that applies the trasaction after 50 milliseconds delay +$node_subscriber->safe_psql('postgres', + "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION tap_pub WITH (copy_data = off, min_apply_delay = '50ms', streaming = 'on')" +); 13a. typo: "trasaction" ~ 13b 50ms seems an extremely short time – How do you even know if this is testing anything related to the time delay? You may just be detecting the normal lag between publisher and subscriber without time delay having much to do with anything. ~ 14. +# Note that we cannot call check_apply_delay_log() here because there is a +# possibility that the delay is skipped. The event happens when the WAL +# replication between publisher and subscriber is delayed due to a mechanical +# problem. The log output will be checked later - substantial delay-time case. + +# Verify that the subscriber lags the publisher by at least 50 milliseconds +check_apply_delay_time($node_publisher, $node_subscriber, '2', '0.05'); 14a. "The event happens..." ?? Did you mean "This might happen if the WAL..." ~ 14b. The log output will be checked later - substantial delay-time case. I think that needs re-wording to clarify. e.g1. you have nothing called a "substantial delay-time" case. e.g2. the word "later" confused me. Originally, I thought you meant it is not tested yet but that you will check it "later", but now IIUC you are just referring to the "1 day 5 minutes" test that comes below in this location TAP file (??) -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Logical replication timeout problem
Here are my review comments for patch v4-0001 == General 1. It makes no real difference, but I was wondering about: "update txn progress" versus "update progress txn" I thought that the first way sounds more natural. YMMV. If you change this then there is an impact for the typedef, function names, comments, member names: ReorderBufferUpdateProgressTxnCB --> ReorderBufferUpdateTxnProgressCB "/* update progress txn callback */" --> "/* update txn progress callback */" update_progress_txn_cb_wrapper --> update_txn_progress_cb_wrapper update_progress_txn --> update_txn_progress == Commit message 2. The problem is when there is a DDL in a transaction that generates lots of temporary data due to rewrite rules, these temporary data will not be processed by the pgoutput plugin. The previous commit (f95d53e) only fixed timeouts caused by filtering out changes in pgoutput. Therefore, the previous fix for DML had no impact on this case. ~ IMO this still needs some rewording to say up-front what the actual problem is -- i.e. an avoidable timeout occurring. SUGGESTION (or something like this...) When there is a DDL in a transaction that generates lots of temporary data due to rewrite rules, this temporary data will not be processed by the pgoutput plugin. This means it is possible for a timeout to occur if a sufficiently long time elapses since the last pgoutput message. A previous commit (f95d53e) fixed a similar scenario in this area, but that only fixed timeouts for DML going through pgoutput, so it did not address this DDL timeout case. == src/backend/replication/logical/logical.c 3. update_progress_txn_cb_wrapper +/* + * Update progress callback while processing a transaction. + * + * Try to update progress and send a keepalive message during sending data of a + * transaction (and its subtransactions) to the output plugin.
+ * + * For a large transaction, if we don't send any change to the downstream for a + * long time (exceeds the wal_receiver_timeout of standby) then it can timeout. + * This can happen when all or most of the changes are either not published or + * got filtered out. + */ +static void +update_progress_txn_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn, +ReorderBufferChange *change) Simplify the "Try to..." paragraph. And the other part should also mention DDL. SUGGESTION Try to send a keepalive message during transaction processing. This is done because if we don't send any change to the downstream for a long time (exceeds the wal_receiver_timeout of standby), then it can timeout. This can happen for large DDL, or for large transactions when all or most of the changes are either not published or got filtered out. == .../replication/logical/reorderbuffer.c 4. ReorderBufferProcessTXN @@ -2105,6 +2105,19 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, PG_TRY(); { + /* + * Static variable used to accumulate the number of changes while + * processing txn. + */ + static int changes_count = 0; + + /* + * Sending keepalive messages after every change has some overhead, but + * testing showed there is no noticeable overhead if keepalive is only + * sent after every ~100 changes. + */ +#define CHANGES_THRESHOLD 100 + IMO these can be relocated to be declared/defined inside the "while" loop -- i.e. closer to where they are being used. ~~~ 5. + if (++changes_count >= CHANGES_THRESHOLD) + { + rb->update_progress_txn(rb, txn, change); + changes_count = 0; + } When there is no update_progress function, this code still incurs some small additional overhead for incrementing and testing the THRESHOLD every time, and also needlessly calls the wrapper every 100 changes. This overhead could be avoided with a simpler up-front check as shown below. OTOH, maybe the overhead is insignificant enough that just leaving the current code is neater?
LogicalDecodingContext *ctx = rb->private_data; ... if (ctx->update_progress_txn && (++changes_count >= CHANGES_THRESHOLD)) { rb->update_progress_txn(rb, txn, change); changes_count = 0; } -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Time delayed LR (WAS Re: logical replication restrictions)
On Fri, Jan 20, 2023 at 2:47 PM shveta malik wrote: > ... > 2) > Logging: > 2023-01-19 17:33:16.202 IST [404797] DEBUG: logical replication apply > delay: 19979 ms > 2023-01-19 17:33:26.212 IST [404797] DEBUG: logical replication apply > delay: 9969 ms > 2023-01-19 17:34:25.730 IST [404962] DEBUG: logical replication apply > delay: 179988 ms-->previous wait over, started for next txn > 2023-01-19 17:34:35.737 IST [404962] DEBUG: logical replication apply > delay: 169981 ms > 2023-01-19 17:34:45.746 IST [404962] DEBUG: logical replication apply > delay: 159972 ms > > Is there a way to distinguish between these logs? Maybe dumping xids > along-with? > +1 Also, I was thinking of some other logging enhancements: a) the message should say that this is the *remaining* time left to wait. b) it might be convenient to know from the log what was the original min_apply_delay value in the 1st place. For example, the logs might look something like this: DEBUG: time-delayed replication for txid 1234, min_apply_delay = 16 ms. Remaining wait time: 159972 ms DEBUG: time-delayed replication for txid 1234, min_apply_delay = 16 ms. Remaining wait time: 142828 ms DEBUG: time-delayed replication for txid 1234, min_apply_delay = 16 ms. Remaining wait time: 129994 ms DEBUG: time-delayed replication for txid 1234, min_apply_delay = 16 ms. Remaining wait time: 110001 ms ... -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Time delayed LR (WAS Re: logical replication restrictions)
apply(TimestampTz finish_ts) That last sentence "Hence,... delay at the time" does not sound correct. Is there a typo or missing words here? Maybe it meant to say "... at the STREAM START time."? ~~~ 9. + /* This might change wal_receiver_status_interval */ + if (ConfigReloadPending) + { + ConfigReloadPending = false; + ProcessConfigFile(PGC_SIGHUP); + } I was unsure why you made a special mention of 'wal_receiver_status_interval' here. I mean, aren't there also other GUCs that might change and affect something here, so was there some special reason only this one was mentioned? == src/test/subscription/t/032_apply_delay.pl 10. + +# Compare inserted time on the publisher with applied time on the subscriber to +# confirm the latter is applied after expected time. +sub check_apply_delay_time Maybe the comment could also mention that the time is automatically stored in the table column 'c'. ~~~ 11. +# Confirm the suspended record doesn't get applied expectedly by the ALTER +# DISABLE command. +$result = $node_subscriber->safe_psql('postgres', + "SELECT count(a) FROM test_tab WHERE a = 0;"); +is($result, qq(0), "check if the delayed transaction doesn't get applied expectedly"); The use of "doesn't get applied expectedly" (in 2 places here) seemed strange. Maybe it's better to say something like: SUGGESTION # Confirm disabling the subscription by ALTER DISABLE did not cause the delayed transaction to be applied. $result = $node_subscriber->safe_psql('postgres', "SELECT count(a) FROM test_tab WHERE a = 0;"); is($result, qq(0), "check the delayed transaction was not applied"); -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Logical replication timeout problem
On Fri, Jan 20, 2023 at 3:35 PM Amit Kapila wrote: > > On Fri, Jan 20, 2023 at 7:40 AM Peter Smith wrote: > > > > Here are some review comments for patch v3-0001. > > > > == > > src/backend/replication/logical/logical.c > > > > 3. forward declaration > > > > +/* update progress callback */ > > +static void update_progress_cb_wrapper(ReorderBuffer *cache, > > +ReorderBufferTXN *txn, > > +ReorderBufferChange *change); > > > > I felt this function wrapper name was a bit misleading... AFAIK every > > other wrapper really does just wrap their respective functions. But > > this one seems a bit different because it calls the wrapped function > > ONLY if some threshold is exceeded. IMO maybe this function could have > > some name that conveys this better: > > > > e.g. update_progress_cb_wrapper_with_threshold > > > > I am wondering whether it would be better to move the threshold logic > to the caller. Previously this logic was inside the function because > it was being invoked from multiple places but now that won't be the > case. Also, then your concern about the name would also be addressed. > > > > > ~ > > > > 7b. > > Would it be neater to just call OutputPluginUpdateProgress here instead? > > > > e.g. > > BEFORE > > ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false); > > AFTER > > OutputPluginUpdateProgress(ctx, false); > > > > We already check whether ctx->update_progress is defined or not which > is the only extra job done by OutputPluginUpdateProgress but probably > we can consolidate the checks and directly invoke > OutputPluginUpdateProgress. > Yes, I saw that, but I thought it was better to keep the early exit from update_progress_cb_wrapper, so incurring just one additional boolean check for every 100 changes was not anything to worry about. -- Kind Regards, Peter Smith. Fujitsu Australia.
Re: Logical replication timeout problem
Here are some review comments for patch v3-0001. == Commit message 1. The problem is when there is a DDL in a transaction that generates lots of temporary data due to rewrite rules, these temporary data will not be processed by the pgoutput - plugin. Therefore, the previous fix (f95d53e) for DML had no impact on this case. ~ 1a. IMO this comment needs to give a bit of background about the original problem here, rather than just starting with "The problem is" which is describing the flaws of the previous fix. ~ 1b. "pgoutput - plugin" -> "pgoutput plugin" ?? ~~~ 2. To fix this, we introduced a new ReorderBuffer callback - 'ReorderBufferUpdateProgressCB'. This callback is called to try to update the process after each change has been processed during sending data of a transaction (and its subtransactions) to the output plugin. IIUC it's not really "after each change" - shouldn't this comment mention something about the CHANGES_THRESHOLD 100? == src/backend/replication/logical/logical.c 3. forward declaration +/* update progress callback */ +static void update_progress_cb_wrapper(ReorderBuffer *cache, +ReorderBufferTXN *txn, +ReorderBufferChange *change); I felt this function wrapper name was a bit misleading... AFAIK every other wrapper really does just wrap their respective functions. But this one seems a bit different because it calls the wrapped function ONLY if some threshold is exceeded. IMO maybe this function could have some name that conveys this better: e.g. update_progress_cb_wrapper_with_threshold ~~~ 4. update_progress_cb_wrapper +/* + * Update progress callback + * + * Try to update progress and send a keepalive message if too many changes were + * processed when processing txn. + * + * For a large transaction, if we don't send any change to the downstream for a + * long time (exceeds the wal_receiver_timeout of standby) then it can timeout. + * This can happen when all or most of the changes are either not published or + * got filtered out. 
+ */ SUGGESTION (instead of the "Try to update" sentence) Send a keepalive message whenever more than CHANGES_THRESHOLD changes are encountered while processing a transaction. ~~~ 5. +static void +update_progress_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn, +ReorderBufferChange *change) +{ + LogicalDecodingContext *ctx = cache->private_data; + LogicalErrorCallbackState state; + ErrorContextCallback errcallback; + static int changes_count = 0; /* Static variable used to accumulate + * the number of changes while + * processing txn. */ + IMO this may be more readable if the static 'changes_count' local var was declared first and separated from the other vars by a blank line. ~~~ 6. + /* + * We don't want to try sending a keepalive message after processing each + * change as that can have overhead. Tests revealed that there is no + * noticeable overhead in doing it after continuously processing 100 or so + * changes. + */ +#define CHANGES_THRESHOLD 100 6a. I think it might be better to define this right at the top of the function adjacent to the 'changes_count' variable (e.g. a bit like the original HEAD code looked) ~ 6b. SUGGESTION (for the comment) Sending keepalive messages after every change has some overhead, but testing showed there is no noticeable overhead if keepalive is only sent after every ~100 changes. ~~~ 7. + + /* + * After continuously processing CHANGES_THRESHOLD changes, we + * try to send a keepalive message if required. + */ + if (++changes_count >= CHANGES_THRESHOLD) + { + ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false); + changes_count = 0; + } + 7a. SUGGESTION (for comment) Send a keepalive message after every CHANGES_THRESHOLD changes. ~ 7b. Would it be neater to just call OutputPluginUpdateProgress here instead? e.g. BEFORE ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false); AFTER OutputPluginUpdateProgress(ctx, false); -- Kind Regards, Peter Smith. Fujitsu Australia
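The threshold pattern discussed in comments 4-7 can be sketched as standalone C. Note this is an illustrative stand-in, not the actual logical.c code: the function name, and the no-op where a keepalive would be sent, are hypothetical replacements for update_progress_cb_wrapper() and ctx->update_progress().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative stand-in for update_progress_cb_wrapper().  A keepalive is
 * only attempted after every CHANGES_THRESHOLD changes, so the per-change
 * overhead stays negligible.
 */
#define CHANGES_THRESHOLD 100

/* Returns true when this change crosses the threshold (keepalive sent). */
bool
process_change_maybe_keepalive(void)
{
	static int	changes_count = 0;	/* accumulated while processing txn */

	if (++changes_count >= CHANGES_THRESHOLD)
	{
		/* here the real code would attempt a progress update / keepalive */
		changes_count = 0;
		return true;
	}
	return false;
}
```

With this shape the wrapper stays cheap in the common case: 99 out of every 100 calls do nothing but increment a counter, which is why the extra boolean check mentioned later in the thread is not a concern.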
Re: Use appendStringInfoSpaces more
On Thu, Jan 19, 2023 at 8:45 PM David Rowley wrote: > > In [1] I noticed a bit of a poor usage of appendStringInfoString which > just appends 4 spaces in a loop, one for each indent level of the > jsonb. It should be better just to use appendStringInfoSpaces and > just append all the spaces in one go rather than appending 4 spaces in > a loop. That'll save having to check enlargeStringInfo() once for each > loop. > Should the add_indent function also have a check to avoid making unnecessary calls to appendStringInfoSpaces when the level is 0? e.g. if (indent) { appendStringInfoCharMacro(out, '\n'); if (level > 0) appendStringInfoSpaces(out, level * 4); } V. if (indent) { appendStringInfoCharMacro(out, '\n'); appendStringInfoSpaces(out, level * 4); } -- Kind Regards, Peter Smith. Fujitsu Australia
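The win David describes — one capacity check for all the spaces instead of one per loop iteration — can be sketched with a toy growable buffer. This is a hypothetical miniature of StringInfo, not the real API:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy growable string buffer; a stand-in for StringInfo. */
typedef struct MiniBuf
{
	char	   *data;
	int			len;
	int			cap;
} MiniBuf;

static void
minibuf_init(MiniBuf *buf)
{
	buf->cap = 16;
	buf->len = 0;
	buf->data = malloc(buf->cap);
	buf->data[0] = '\0';
}

/* Analogous to appendStringInfoSpaces: grow once, then memset all spaces. */
static void
minibuf_append_spaces(MiniBuf *buf, int count)
{
	if (count <= 0)
		return;					/* a zero count is already a cheap no-op */
	while (buf->len + count + 1 > buf->cap)
		buf->cap *= 2;
	buf->data = realloc(buf->data, buf->cap);
	memset(buf->data + buf->len, ' ', count);
	buf->len += count;
	buf->data[buf->len] = '\0';
}
```

Note the early return for count <= 0: whether an explicit level > 0 guard at the call site buys anything depends on how cheap the callee's zero-count path is, which is essentially the question raised above.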
Re: Deduplicate logicalrep_read_tuple()
On Wed, Jan 18, 2023 at 6:26 PM Bharath Rupireddy wrote: > > Hi, > > logicalrep_read_tuple() duplicates code for LOGICALREP_COLUMN_TEXT and > LOGICALREP_COLUMN_BINARY introduced by commit 9de77b5. While it > doesn't hurt anyone, deduplication makes code a bit leaner by 57 bytes > [1]. I've attached a patch for $SUBJECT. > > Thoughts? > The code looks the same but there is a subtle comment difference where previously only LOGICALREP_COLUMN_BINARY case said: /* not strictly necessary but per StringInfo practice */ So if you de-duplicate the code then should that comment be modified to say /* not strictly necessary for LOGICALREP_COLUMN_BINARY but per StringInfo practice */ -- Kind Regards, Peter Smith. Fujitsu Australia
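The shape of the deduplication — two case arms sharing one body, with the comment adjusted to say which kind it originally applied to — can be sketched with hypothetical stand-in constants (this is not the actual logicalrep_read_tuple() code):

```c
#include <assert.h>

/* Hypothetical stand-ins for the LOGICALREP_COLUMN_* kind bytes. */
#define COLUMN_NULL		'n'
#define COLUMN_TEXT		't'
#define COLUMN_BINARY	'b'

/*
 * Both the text and binary kinds carry a length-prefixed value, so after
 * deduplication the two case labels fall through to one shared body.
 */
int
column_carries_value(char kind)
{
	switch (kind)
	{
		case COLUMN_TEXT:
		case COLUMN_BINARY:
			/*
			 * not strictly necessary for COLUMN_BINARY but per StringInfo
			 * practice (the merged-comment wording suggested above)
			 */
			return 1;
		case COLUMN_NULL:
		default:
			return 0;
	}
}
```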
Re: Time delayed LR (WAS Re: logical replication restrictions)
On Wed, Jan 18, 2023 at 6:06 PM Peter Smith wrote: > > Here are my review comments for the latest patch v16-0001. (excluding > the test code) > And here are some review comments for the v16-0001 test code. == src/test/regress/sql/subscription.sql 1. General For all comments "time delayed replication" -> "time-delayed replication" maybe is better? ~~~ 2. -- fail - utilizing streaming = parallel with time delayed replication is not supported. For readability please put a blank line before this test. ~~~ 3. -- success -- value without unit is taken as milliseconds "value" -> "min_apply_delay value" ~~~ 4. -- success -- interval is converted into ms and stored as integer "interval" -> "min_apply_delay interval" "integer" -> "an integer" ~~~ 5. You could also add another test where min_apply_delay is 0 Then the following combination can be confirmed OK -- success create subscription with (streaming=parallel, min_apply_delay=0) ~~~ 6. -- fail - alter subscription with min_apply_delay should fail when streaming = parallel is set. CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, streaming = parallel); There is another way to do this test without creating a brand-new subscription. You could just alter the existing subscription like: ALTER ... SET (min_apply_delay = 0) then ALTER ... SET (streaming = parallel) then ALTER ... SET (min_apply_delay = 123) == src/test/subscription/t/032_apply_delay.pl 7. sub check_apply_delay_log my ($node_subscriber, $message, $expected) = @_; Why pass in the message text? It is always the same, so it can be hardwired in this function, right? ~~~ 8. # Get the delay time in the server log "int the server log" -> "from the server log" (?) ~~~ 9. qr/$message: (\d+) ms/ or die "could not get delayed time"; my $logged_delay = $1; # Is it larger than expected? cmp_ok($logged_delay, '>', $expected, "The wait time of the apply worker is long enough expectedly" ); 9a. 
"could not get delayed time" -> "could not get the apply worker wait time" 9b. "The wait time of the apply worker is long enough expectedly" -> "The apply worker wait time has expected duration" ~~~ 10. sub check_apply_delay_time Maybe a brief explanatory comment for this function is needed to explain the unreplicated column c. ~~~ 11. $node_subscriber->safe_psql('postgres', "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION tap_pub WITH (streaming = on, min_apply_delay = '3s')" I think there should be a comment here highlighting that you are setting up a subscriber time delay of 3 seconds, and then later you can better describe the parameters for the checking functions... e.g. (add this comment) # verifies that the subscriber lags the publisher by at least 3 seconds check_apply_delay_time($node_publisher, $node_subscriber, '5', '3'); e.g. # verifies that the subscriber lags the publisher by at least 3 seconds check_apply_delay_time($node_publisher, $node_subscriber, '8', '3'); ~~~ 12. # Test whether ALTER SUBSCRIPTION changes the delayed time of the apply worker # (1 day 1 minute). $node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION tap_sub SET (min_apply_delay = 8646)" ); Update the comment with another note. # Note - The extra 1 min is to account for any decoding/network overhead. ~~~ 13. # Make sure we have long enough min_apply_delay after the ALTER command check_apply_delay_log($node_subscriber, "logical replication apply delay", "8000"); IMO the expectation of 1 day (8646 ms) wait time might be a better number for your "expected" value. So update the comment/call like this: # Make sure the apply worker knows to wait for more than 1 day (8640 ms) check_apply_delay_log($node_subscriber, "logical replication apply delay", "8640"); -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Time delayed LR (WAS Re: logical replication restrictions)
On Wed, Jan 18, 2023 at 6:06 PM Peter Smith wrote: > > Here are my review comments for the latest patch v16-0001. (excluding > the test code) > ... > > 8. AlterSubscription (general) > > I observed during testing there are 3 different errors…. > > At subscription CREATE time you can get this error: > ERROR: min_apply_delay > 0 and streaming = parallel are mutually > exclusive options > > If you try to ALTER the min_apply_delay when already streaming = > parallel you can get this error: > ERROR: cannot enable min_apply_delay for subscription in streaming = > parallel mode > > If you try to ALTER the streaming to be parallel if there is already a > min_apply_delay > 0 then you can get this error: > ERROR: cannot enable streaming = parallel mode for subscription with > min_apply_delay > > ~ > > IMO there is no need to have 3 different error message texts. I think > all these cases are explained by just the first text (ERROR: > min_apply_delay > 0 and streaming = parallel are mutually exclusive > options) > > After checking the regression test output I can see the merit of your separate error messages like this, even if they are maybe not strictly necessary. So feel free to ignore my previous review comment. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: [DOCS] Stats views and functions not in order?
On Thu, Jan 19, 2023 at 2:55 AM David G. Johnston wrote: > > On Wed, Jan 18, 2023 at 8:38 AM Tom Lane wrote: >> >> "David G. Johnston" writes: >> > ... I was going for the html effect >> > of having these views chunked into their own pages, any other changes being >> > non-detrimental. >> >> But is that a result we want? It will for example break any bookmarks >> that people might have for these documentation entries. It will also >> pretty thoroughly break the cross-version navigation links in this >> part of the docs. >> >> >> Maybe the benefit is worth those costs, but I'm entirely not convinced >> of that. I think we need to tread pretty lightly when rearranging >> longstanding documentation-layout decisions. >> > David already gave a good summary [1], but since I was the OP here is the background of v10-0001 from my PoV. ~ The original $SUBJECT requirements evolved to also try to make each view appear on a separate page after that was suggested by DavidJ [2]. I was unable to achieve per-page views "without radically changing the document structure." [3], but DavidJ found a way [4] to do it using refentry. I then wrote the patch v8-0003 using that strategy, which after more rebasing became the v10-0001 you see today. I did prefer the view-per-page results (although I also only use HTML docs). But my worry is that there seem still to be a few unknowns about how this might affect other (not the HTML) renderings of the docs. If you think that risk is too great, or if you feel this patch will cause unwarranted link/bookmark grief, then I am happy to just drop it. 
-- [1] DJ overview - https://www.postgresql.org/message-id/CAKFQuwaVm%3D6d_sw9Wrp4cdSm5_k%3D8ZVx0--v2v4BH4KnJtqXqg%40mail.gmail.com [2] DJ suggested view-per-page - https://www.postgresql.org/message-id/CAKFQuwa9JtoCBVc6CJb7NC5FqMeEAy_A8X4H8t6kVaw7fz9LTw%40mail.gmail.com [3] PS don't know how to do it - https://www.postgresql.org/message-id/CAHut%2BPv5Efz1TLWOLSoFvoyC0mq%2Bs92yFSd534ctWSdjEFtKCw%40mail.gmail.com [4] DJ how to do it using refentry - https://www.postgresql.org/message-id/CAKFQuwYkM5UZT%2B6tG%2BNgZvDcd5VavS%2BxNHsGsWC8jS-KJsxh7w%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: Time delayed LR (WAS Re: logical replication restrictions)
ly”? ~~~ 21. @@ -3737,8 +3869,15 @@ send_feedback(XLogRecPtr recvpos, bool force, bool requestReply) /* * No outstanding transactions to flush, we can report the latest received * position. This is important for synchronous replication. + * + * During the delay of time-delayed replication, do not tell the publisher + * that the received latest LSN is already applied and flushed at this + * stage, since we don't apply the transaction yet. If we do so, it leads + * to a wrong assumption of logical replication progress on the publisher + * side. Here, we just send a feedback message to avoid publisher's + * timeout during the delay. */ Minor rewording of the comment SUGGESTION If the subscriber side apply is delayed (because of time-delayed replication) then do not tell the publisher that the received latest LSN is already applied and flushed, otherwise, it leads to the publisher side making a wrong assumption of logical replication progress. Instead, we just send a feedback message to avoid a publisher timeout during the delay. == src/bin/pg_dump/pg_dump.c 22. @@ -4546,9 +4547,14 @@ getSubscriptions(Archive *fout) LOGICALREP_TWOPHASE_STATE_DISABLED); if (fout->remoteVersion >= 16) - appendPQExpBufferStr(query, " s.suborigin\n"); + appendPQExpBufferStr(query, + " s.suborigin,\n" + " s.subminapplydelay\n"); else - appendPQExpBuffer(query, " '%s' AS suborigin\n", LOGICALREP_ORIGIN_ANY); + { + appendPQExpBuffer(query, " '%s' AS suborigin,\n", LOGICALREP_ORIGIN_ANY); + appendPQExpBufferStr(query, " 0 AS subminapplydelay\n"); + } Can't those appends in the else part be combined into a single appendPQExpBuffer? appendPQExpBuffer(query, " '%s' AS suborigin,\n" " 0 AS subminapplydelay\n", LOGICALREP_ORIGIN_ANY); == src/include/catalog/pg_subscription.h 23. 
@@ -70,6 +70,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) BKI_SHARED_RELATION BKI_ROW XLogRecPtr subskiplsn; /* All changes finished at this LSN are * skipped */ + int64 subminapplydelay; /* Replication apply delay */ + NameData subname; /* Name of the subscription */ Oid subowner BKI_LOOKUP(pg_authid); /* Owner of the subscription */ SUGGESTION (for comment) Replication apply delay (ms) ~~ 24. @@ -120,6 +122,7 @@ typedef struct Subscription * in */ XLogRecPtr skiplsn; /* All changes finished at this LSN are * skipped */ + int64 minapplydelay; /* Replication apply delay */ SUGGESTION (for comment) Replication apply delay (ms) -- Kind Regards, Peter Smith. Fujitsu Australia
PGDOCS - sgml linkend using single-quotes
Hi, I happened to notice some examples of SGML linkends that were using single quotes instead of double quotes. It didn't seem to be the conventional style because grepping (from doc/src/sgml folder) showed only a tiny fraction using single quotes. (single-quotes) $ grep --include=*.sgml -rn . -e "linkend='" | wc -l 12 (double-quotes) $ grep --include=*.sgml -rn . -e 'linkend="' | wc -l 5915 ~~ PSA patch that makes them all use double quotes. -- Kind Regards, Peter Smith. Fujitsu Australia v1-0001-Replace-linkend-single-quotes-with-double-quotes.patch Description: Binary data
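The mechanical conversion the patch performs could be sketched as a throwaway helper. This is hypothetical — it rewrites an in-memory buffer, whereas the actual patch simply edits the .sgml files — and it assumes linkend values contain no embedded quotes, which holds for SGML ids:

```c
#include <assert.h>
#include <string.h>

/* Rewrite linkend='...' to linkend="..." in place. */
void
fix_linkend_quotes(char *text)
{
	char	   *p = text;

	while ((p = strstr(p, "linkend='")) != NULL)
	{
		char	   *open = p + strlen("linkend=");	/* the opening quote */
		char	   *close = strchr(open + 1, '\'');	/* the closing quote */

		if (close == NULL)
			break;				/* unterminated; leave the rest alone */
		*open = '"';
		*close = '"';
		p = close + 1;
	}
}
```

Double-quoted linkends are untouched because strstr only matches the single-quote form, matching the grep counts above.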
Re: Perform streaming logical transactions by background workers and parallel apply
On Tue, Jan 17, 2023 at 2:37 PM houzj.f...@fujitsu.com wrote: > > On Tuesday, January 17, 2023 11:32 AM Peter Smith > wrote: > > > > On Tue, Jan 17, 2023 at 1:21 PM houzj.f...@fujitsu.com > > wrote: > > > > > > On Tuesday, January 17, 2023 5:43 AM Peter Smith > > wrote: > > > > > > > > On Mon, Jan 16, 2023 at 5:41 PM Amit Kapila > > > > > > > > wrote: > > > > > > > > > > On Mon, Jan 16, 2023 at 10:24 AM Peter Smith > > > > > > > > > wrote: > > > > > > > > > > > > 2. > > > > > > > > > > > > /* > > > > > > + * Return the pid of the leader apply worker if the given pid > > > > > > +is the pid of a > > > > > > + * parallel apply worker, otherwise return InvalidPid. > > > > > > + */ > > > > > > +pid_t > > > > > > +GetLeaderApplyWorkerPid(pid_t pid) { int leader_pid = > > > > > > +InvalidPid; int i; > > > > > > + > > > > > > + LWLockAcquire(LogicalRepWorkerLock, LW_SHARED); > > > > > > + > > > > > > + for (i = 0; i < max_logical_replication_workers; i++) { > > > > > > + LogicalRepWorker *w = >workers[i]; > > > > > > + > > > > > > + if (isParallelApplyWorker(w) && w->proc && pid == > > > > > > + w->proc->pid) { leader_pid = w->leader_pid; break; } } > > > > > > + > > > > > > + LWLockRelease(LogicalRepWorkerLock); > > > > > > + > > > > > > + return leader_pid; > > > > > > +} > > > > > > > > > > > > 2a. > > > > > > IIUC the IsParallelApplyWorker macro does nothing except check > > > > > > that the leader_pid is not InvalidPid anyway, so AFAIK this > > > > > > algorithm does not benefit from using this macro because we will > > > > > > want to return InvalidPid anyway if the given pid matches. > > > > > > > > > > > > So the inner condition can just say: > > > > > > > > > > > > if (w->proc && w->proc->pid == pid) { leader_pid = > > > > > > w->leader_pid; break; } > > > > > > > > > > > > > > > > Yeah, this should also work but I feel the current one is explicit > > > > > and more clear. > > > > > > > > OK. 
> > > > > > > > But, I have one last comment about this function -- I saw there are > > > > already other functions that iterate max_logical_replication_workers > > > > like this looking for things: > > > > - logicalrep_worker_find > > > > - logicalrep_workers_find > > > > - logicalrep_worker_launch > > > > - logicalrep_sync_worker_count > > > > > > > > So I felt this new function (currently called > > > > GetLeaderApplyWorkerPid) ought to be named similarly to those ones. > > > > e.g. call it something like "logicalrep_worker_find_pa_leader_pid". > > > > > > > > > > I am not sure we can use the name, because currently all the API name > > > in launcher that used by other module(not related to subscription) are > > > like AxxBxx style(see the functions in logicallauncher.h). > > > logicalrep_worker_xxx style functions are currently only declared in > > > worker_internal.h. > > > > > > > OK. I didn't know there was another header convention that you were > > following. > > In that case, it is fine to leave the name as-is. > > Thanks for confirming! > > Attach the new version 0001 patch which addressed all other comments. > OK. I checked the differences between patches v81-0001/v82-0001 and found everything I was expecting to see. I have no more review comments for v82-0001. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Perform streaming logical transactions by background workers and parallel apply
On Tue, Jan 17, 2023 at 1:21 PM houzj.f...@fujitsu.com wrote: > > On Tuesday, January 17, 2023 5:43 AM Peter Smith > wrote: > > > > On Mon, Jan 16, 2023 at 5:41 PM Amit Kapila > > wrote: > > > > > > On Mon, Jan 16, 2023 at 10:24 AM Peter Smith > > wrote: > > > > > > > > 2. > > > > > > > > /* > > > > + * Return the pid of the leader apply worker if the given pid is > > > > +the pid of a > > > > + * parallel apply worker, otherwise return InvalidPid. > > > > + */ > > > > +pid_t > > > > +GetLeaderApplyWorkerPid(pid_t pid) > > > > +{ > > > > + int leader_pid = InvalidPid; > > > > + int i; > > > > + > > > > + LWLockAcquire(LogicalRepWorkerLock, LW_SHARED); > > > > + > > > > + for (i = 0; i < max_logical_replication_workers; i++) { > > > > + LogicalRepWorker *w = >workers[i]; > > > > + > > > > + if (isParallelApplyWorker(w) && w->proc && pid == w->proc->pid) { > > > > + leader_pid = w->leader_pid; break; } } > > > > + > > > > + LWLockRelease(LogicalRepWorkerLock); > > > > + > > > > + return leader_pid; > > > > +} > > > > > > > > 2a. > > > > IIUC the IsParallelApplyWorker macro does nothing except check that > > > > the leader_pid is not InvalidPid anyway, so AFAIK this algorithm > > > > does not benefit from using this macro because we will want to > > > > return InvalidPid anyway if the given pid matches. > > > > > > > > So the inner condition can just say: > > > > > > > > if (w->proc && w->proc->pid == pid) > > > > { > > > > leader_pid = w->leader_pid; > > > > break; > > > > } > > > > > > > > > > Yeah, this should also work but I feel the current one is explicit and > > > more clear. > > > > OK. 
> > > > But, I have one last comment about this function -- I saw there are already > > other functions that iterate max_logical_replication_workers like this > > looking > > for things: > > - logicalrep_worker_find > > - logicalrep_workers_find > > - logicalrep_worker_launch > > - logicalrep_sync_worker_count > > > > So I felt this new function (currently called GetLeaderApplyWorkerPid) ought > > to be named similarly to those ones. e.g. call it something like > > "logicalrep_worker_find_pa_leader_pid". > > > > I am not sure we can use the name, because currently all the API name in > launcher that > used by other module(not related to subscription) are like > AxxBxx style(see the functions in logicallauncher.h). > logicalrep_worker_xxx style functions are currently only declared in > worker_internal.h. > OK. I didn't know there was another header convention that you were following. In that case, it is fine to leave the name as-is. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Perform streaming logical transactions by background workers and parallel apply
On Mon, Jan 16, 2023 at 5:41 PM Amit Kapila wrote: > > On Mon, Jan 16, 2023 at 10:24 AM Peter Smith wrote: > > > > 2. > > > > /* > > + * Return the pid of the leader apply worker if the given pid is the pid > > of a > > + * parallel apply worker, otherwise return InvalidPid. > > + */ > > +pid_t > > +GetLeaderApplyWorkerPid(pid_t pid) > > +{ > > + int leader_pid = InvalidPid; > > + int i; > > + > > + LWLockAcquire(LogicalRepWorkerLock, LW_SHARED); > > + > > + for (i = 0; i < max_logical_replication_workers; i++) > > + { > > + LogicalRepWorker *w = >workers[i]; > > + > > + if (isParallelApplyWorker(w) && w->proc && pid == w->proc->pid) > > + { > > + leader_pid = w->leader_pid; > > + break; > > + } > > + } > > + > > + LWLockRelease(LogicalRepWorkerLock); > > + > > + return leader_pid; > > +} > > > > 2a. > > IIUC the IsParallelApplyWorker macro does nothing except check that > > the leader_pid is not InvalidPid anyway, so AFAIK this algorithm does > > not benefit from using this macro because we will want to return > > InvalidPid anyway if the given pid matches. > > > > So the inner condition can just say: > > > > if (w->proc && w->proc->pid == pid) > > { > > leader_pid = w->leader_pid; > > break; > > } > > > > Yeah, this should also work but I feel the current one is explicit and > more clear. OK. But, I have one last comment about this function -- I saw there are already other functions that iterate max_logical_replication_workers like this looking for things: - logicalrep_worker_find - logicalrep_workers_find - logicalrep_worker_launch - logicalrep_sync_worker_count So I felt this new function (currently called GetLeaderApplyWorkerPid) ought to be named similarly to those ones. e.g. call it something like "logicalrep_worker_find_pa_leader_pid". > > > ~ > > > > 2b. > > A possible alternative comment. > > > > BEFORE > > Return the pid of the leader apply worker if the given pid is the pid > > of a parallel apply worker, otherwise return InvalidPid. 
> > > > > > AFTER > > If the given pid has a leader apply worker then return the leader pid, > > otherwise, return InvalidPid. > > > > I don't think that is an improvement. > > > == > > > > src/backend/utils/adt/pgstatfuncs.c > > > > 3. > > > > @@ -434,6 +435,16 @@ pg_stat_get_activity(PG_FUNCTION_ARGS) > > values[28] = Int32GetDatum(leader->pid); > > nulls[28] = false; > > } > > + else > > + { > > + int leader_pid = GetLeaderApplyWorkerPid(beentry->st_procpid); > > + > > + if (leader_pid != InvalidPid) > > + { > > + values[28] = Int32GetDatum(leader_pid); > > + nulls[28] = false; > > + } > > + > > > > 3a. > > There is an existing comment preceding this if/else but it refers only > > to leaders of parallel groups. Should that comment be updated to > > mention the leader apply worker too? > > > > Yeah, we can slightly adjust the comments. How about something like the below: > index 415e711729..7eb668634a 100644 > --- a/src/backend/utils/adt/pgstatfuncs.c > +++ b/src/backend/utils/adt/pgstatfuncs.c > @@ -410,9 +410,9 @@ pg_stat_get_activity(PG_FUNCTION_ARGS) > > /* > * If a PGPROC entry was retrieved, display > wait events and lock > -* group leader information if any. To avoid > extra overhead, no > -* extra lock is being held, so there is no guarantee > of > -* consistency across multiple rows. > +* group leader or apply leader information if > any. To avoid extra > +* overhead, no extra lock is being held, so > there is no guarantee > +* of consistency across multiple rows. > */ > if (proc != NULL) > { > @@ -428,7 +428,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS) > /* > * Show the leader only for active > parallel workers. This > * leaves the field as NULL for the > leader of a parallel > -* group. > +* group or the leader of a parallel apply. > */ > if (leader && leader->pid != > beentry->st_procpid) > The updated comment LGTM. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Perform streaming logical transactions by background workers and parallel apply
Here are some review comments for v81-0001. == Commit Message 1. Additionally, update the leader_pid column in pg_stat_activity as well to display the PID of the leader apply worker for parallel apply workers. ~ Probably it should not say both "Additionally" and "as well" in the same sentence. == src/backend/replication/logical/launcher.c 2. /* + * Return the pid of the leader apply worker if the given pid is the pid of a + * parallel apply worker, otherwise return InvalidPid. + */ +pid_t +GetLeaderApplyWorkerPid(pid_t pid) +{ + int leader_pid = InvalidPid; + int i; + + LWLockAcquire(LogicalRepWorkerLock, LW_SHARED); + + for (i = 0; i < max_logical_replication_workers; i++) + { + LogicalRepWorker *w = &LogicalRepCtx->workers[i]; + + if (isParallelApplyWorker(w) && w->proc && pid == w->proc->pid) + { + leader_pid = w->leader_pid; + break; + } + } + + LWLockRelease(LogicalRepWorkerLock); + + return leader_pid; +} 2a. IIUC the IsParallelApplyWorker macro does nothing except check that the leader_pid is not InvalidPid anyway, so AFAIK this algorithm does not benefit from using this macro because we will want to return InvalidPid anyway if the given pid matches. So the inner condition can just say: if (w->proc && w->proc->pid == pid) { leader_pid = w->leader_pid; break; } ~ 2b. A possible alternative comment. BEFORE Return the pid of the leader apply worker if the given pid is the pid of a parallel apply worker, otherwise return InvalidPid. AFTER If the given pid has a leader apply worker then return the leader pid, otherwise, return InvalidPid. == src/backend/utils/adt/pgstatfuncs.c 3. @@ -434,6 +435,16 @@ pg_stat_get_activity(PG_FUNCTION_ARGS) values[28] = Int32GetDatum(leader->pid); nulls[28] = false; } + else + { + int leader_pid = GetLeaderApplyWorkerPid(beentry->st_procpid); + + if (leader_pid != InvalidPid) + { + values[28] = Int32GetDatum(leader_pid); + nulls[28] = false; + } + 3a. 
There is an existing comment preceding this if/else but it refers only to leaders of parallel groups. Should that comment be updated to mention the leader apply worker too? ~ 3b. It may be unrelated to this patch, but it seems strange to me that the nulls[28]/values[28] assignments are done where they are. Every other nulls/values assignment of this function here is pretty much in the correct numerical order except this one, so IMO this code ought to be relocated to later in this same function. -- Kind Regards, Peter Smith. Fujitsu Australia.
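Stripped of the locking and shared-memory details, the lookup under discussion reduces to a linear scan of the worker slots. A standalone sketch with hypothetical types — it mirrors, but does not reproduce, the launcher.c code — also shows why comment 2a's simplified condition works: a non-parallel worker's slot carries InvalidPid as its leader anyway.

```c
#include <assert.h>
#include <stdbool.h>

#define InvalidPid (-1)

/* Hypothetical slimmed-down worker slot; the real one is LogicalRepWorker. */
typedef struct WorkerSlot
{
	bool		in_use;
	int			pid;
	int			leader_pid;		/* InvalidPid unless parallel apply worker */
} WorkerSlot;

/*
 * If the given pid belongs to a parallel apply worker, return its leader's
 * pid; otherwise InvalidPid.  The real function additionally takes
 * LogicalRepWorkerLock in shared mode around this scan.
 */
int
leader_pid_for(const WorkerSlot *slots, int nslots, int pid)
{
	for (int i = 0; i < nslots; i++)
	{
		const WorkerSlot *w = &slots[i];

		if (w->in_use && w->pid == pid)
			return w->leader_pid;	/* InvalidPid for non-parallel workers */
	}
	return InvalidPid;
}
```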
Re: Perform streaming logical transactions by background workers and parallel apply
Here are some review comments for patch v79-0002. == General 1. I saw that earlier in this thread Hou-san [1] and Amit [2] also seemed to say there is not much point for this patch. So I wanted to +1 that same opinion. I feel this patch just adds more complexity for almost no gain: - reducing the 'max_parallel_apply_workers_per_subscription' seems not very common in the first place. - even when the GUC is reduced, at that point in time all the workers might be in use so there may be nothing that can be immediately done. - IIUC the excess workers (for a reduced GUC) are going to get freed naturally anyway over time as more transactions are completed so the pool size will reduce accordingly. ~ OTOH some refactoring parts of this patch (e.g. the new pa_stop_worker function) look better to me. I would keep those ones but remove all the pa_stop_idle_workers function/call. *** NOTE: The remainder of these review comments are maybe only relevant if you are going to keep this pa_stop_idle_workers behaviour... == Commit message 2. If the max_parallel_apply_workers_per_subscription is changed to a lower value, try to stop free workers in the pool to keep the number of workers lower than half of the max_parallel_apply_workers_per_subscription SUGGESTION If the GUC max_parallel_apply_workers_per_subscription is changed to a lower value, try to stop unused workers to keep the pool size lower than half of max_parallel_apply_workers_per_subscription. == .../replication/logical/applyparallelworker.c 3. pa_free_worker if (winfo->serialize_changes || list_length(ParallelApplyWorkerPool) > (max_parallel_apply_workers_per_subscription / 2)) { pa_stop_worker(winfo); return; } winfo->in_use = false; winfo->serialize_changes = false; ~ IMO the above code can be more neatly written using if/else because then there is only one return point, and there is a place to write the explanatory comment about the else. 
SUGGESTION if (winfo->serialize_changes || list_length(ParallelApplyWorkerPool) > (max_parallel_apply_workers_per_subscription / 2)) { pa_stop_worker(winfo); } else { /* Don't stop the worker. Only mark it available for re-use. */ winfo->in_use = false; winfo->serialize_changes = false; } == src/backend/replication/logical/worker.c 4. pa_stop_idle_workers /* * Try to stop parallel apply workers that are not in use to keep the number of * workers lower than half of the max_parallel_apply_workers_per_subscription. */ void pa_stop_idle_workers(void) { List*active_workers; ListCell *lc; int max_applyworkers = max_parallel_apply_workers_per_subscription / 2; if (list_length(ParallelApplyWorkerPool) <= max_applyworkers) return; active_workers = list_copy(ParallelApplyWorkerPool); foreach(lc, active_workers) { ParallelApplyWorkerInfo *winfo = (ParallelApplyWorkerInfo *) lfirst(lc); pa_stop_worker(winfo); /* Recheck the number of workers. */ if (list_length(ParallelApplyWorkerPool) <= max_applyworkers) break; } list_free(active_workers); } ~ 4a. function comment SUGGESTION Try to keep the worker pool size lower than half of the max_parallel_apply_workers_per_subscription. ~ 4b. function name This is not stopping all idle workers, so maybe a more meaningful name for this function is something more like "pa_reduce_workerpool" ~ 4c. IMO the "max_applyworkers" var is a misleading name. Maybe something like "goal_poolsize" is better? ~ 4d. Maybe I misunderstand the logic for the pool, but shouldn't this be checking the winfo->in_use flag before blindly stopping each worker? == src/backend/replication/logical/worker.c 5. @@ -3630,6 +3630,13 @@ LogicalRepApplyLoop(XLogRecPtr last_received) { ConfigReloadPending = false; ProcessConfigFile(PGC_SIGHUP); + + /* + * Try to stop free workers in the pool in case the + * max_parallel_apply_workers_per_subscription is changed to a + * lower value. + */ + pa_stop_idle_workers(); } 5a. 
SUGGESTED COMMENT
If max_parallel_apply_workers_per_subscription is changed to a lower
value, try to reduce the worker pool to match.

~

5b.

Instead of unconditionally calling pa_stop_idle_workers, shouldn't
this code compare the value of
max_parallel_apply_workers_per_subscription before/after the
ProcessConfigFile, so it only calls the function if the GUC was
lowered?

------
[1] Hou-san - https://www.postgresql.org/message-id/OS0PR01MB5716E527412A3481F90B4397941A9%40OS0PR01MB5716.jpnprd01.prod.outlook.com
[2] Amit - https://www.postgresql.org/message-id/CAA4eK1J%3D9m-VNRMHCqeG8jpX0CTn3Ciad2o4H-ogrZMDJ3tn4w%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia
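The guard suggested in comment 5b can be sketched in isolation. The code below is a hypothetical stand-in (an integer "pool" replaces the real ParallelApplyWorkerPool list, and the ProcessConfigFile call is simulated); it only shows the shape of "remember the old GUC value, reload, and trim only if the value was lowered":

```c
#include <assert.h>

/* Hypothetical stand-ins for the real GUC and worker pool. */
static int max_parallel_apply_workers_per_subscription = 4;
static int pool_size = 5;

/* Stand-in for pa_stop_idle_workers(): trim the pool to half the GUC. */
static void
pa_stop_idle_workers(void)
{
    int goal_poolsize = max_parallel_apply_workers_per_subscription / 2;

    while (pool_size > goal_poolsize)
        pool_size--;            /* the real code would call pa_stop_worker() */
}

/*
 * Stand-in for the SIGHUP branch of LogicalRepApplyLoop: remember the GUC
 * value, reload the config, and trim the pool only if the GUC was lowered.
 */
static void
reload_config(int new_guc_value)
{
    int old_max = max_parallel_apply_workers_per_subscription;

    /* ProcessConfigFile(PGC_SIGHUP) would happen here */
    max_parallel_apply_workers_per_subscription = new_guc_value;

    if (max_parallel_apply_workers_per_subscription < old_max)
        pa_stop_idle_workers();
}
```

With this shape, raising the GUC leaves the pool untouched, and only an actual decrease pays the cost of walking the pool.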
Re: Perform streaming logical transactions by background workers and parallel apply
On Fri, Jan 13, 2023 at 2:37 PM Amit Kapila wrote:
>
> > 3.
> >
> > + leader_pid integer
> > +
> > +
> > + Process ID of the leader apply worker if this process is a parallel
> > + apply worker; NULL if this process is a leader apply worker or does not
> > + participate in parallel apply, or a synchronization worker
> > +
> >
> > I felt this change is giving too many details and ended up just
> > muddying the water.
> >
>
> I see that we give a similar description for other parameters as well.
> For example leader_pid in pg_stat_activity, see client_dn,
> client_serial in pg_stat_ssl. It is better to be consistent here and
> this gives the reader a bit more information when the value is NULL
> for the new column.
>

It is OK to give extra details as those other examples do, but my
point -- where I wrote "the leader apply worker and the (not leader)
apply worker are one-and-the-same process" -- was that there are
currently only 3 kinds of workers possible (leader apply, parallel
apply, tablesync). If it is not a "parallel apply" worker then it can
only be one of the other 2. So I think it is sufficient and less
confusing to say:

Process ID of the leader apply worker if this process is a parallel
apply worker; NULL if this process is a leader apply worker or a
synchronization worker.

--
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Perform streaming logical transactions by background workers and parallel apply
Here are my review comments for v79-0001.

==
General

1.

When Amit suggested [1] changing the name just to "leader_pid" instead
of "leader_apply_pid", I thought he was only referring to changing the
view column name, not also the internal member names of the worker
structure. Maybe it is OK anyway, but please check if that was the
intention.

==
Commit message

2.

leader_pid is the process ID of the leader apply worker if this
process is a parallel apply worker. If this field is NULL, it
indicates that the process is a leader apply worker or does not
participate in parallel apply, or a synchronization worker.

~

This text is just cut/paste from monitoring.sgml. In a review comment
below I suggest some changes to that text, so this commit message
should then be changed to match.

==
doc/src/sgml/monitoring.sgml

3.

+ leader_pid integer
+
+
+ Process ID of the leader apply worker if this process is a parallel
+ apply worker; NULL if this process is a leader apply worker or does not
+ participate in parallel apply, or a synchronization worker
+

I felt this change is giving too many details and ended up just
muddying the water. E.g. now this says basically "NULL if AAA or BBB,
or CCC", but that makes it sound like there are 3 other things the
process could be instead of a parallel worker. That is not really true
unless you are making some distinction between the main "apply worker"
which is a leader versus a main apply worker which is not a leader.
IMO we should not be making any distinction at all - the leader apply
worker and the main (not leader) apply worker are one-and-the-same
process.

So, I still prefer my previous suggestion (see [2] #5b).

==
src/backend/catalog/system_views.sql

4.

@@ -949,6 +949,7 @@ CREATE VIEW pg_stat_subscription AS
     su.oid AS subid,
     su.subname,
     st.pid,
+    st.leader_pid,
     st.relid,
     st.received_lsn,
     st.last_msg_send_time,

IMO it would be very useful to have an additional "kind" attribute for
this view.
This will save the user from needing to do mental gymnastics every
time just to recognise what kind of process they are looking at.

For example, I tried this:

CREATE VIEW pg_stat_subscription AS
    SELECT
        su.oid AS subid,
        su.subname,
        CASE
            WHEN st.relid IS NOT NULL THEN 'tablesync'
            WHEN st.leader_pid IS NOT NULL THEN 'parallel apply'
            ELSE 'leader apply'
        END AS kind,
        st.pid,
        st.leader_pid,
        st.relid,
        st.received_lsn,
        st.last_msg_send_time,
        st.last_msg_receipt_time,
        st.latest_end_lsn,
        st.latest_end_time
    FROM pg_subscription su
        LEFT JOIN pg_stat_get_subscription(NULL) st
            ON (st.subid = su.oid);

and it results in much more readable output IMO:

test_sub=# select * from pg_stat_subscription;
 subid | subname |     kind     | pid  | leader_pid | relid | received_lsn |      last_msg_send_time       |     last_msg_receipt_time     | latest_end_lsn |        latest_end_time
-------+---------+--------------+------+------------+-------+--------------+-------------------------------+-------------------------------+----------------+-------------------------------
 16388 | sub1    | leader apply | 5281 |            |       | 0/1901378    | 2023-01-13 12:39:03.984249+11 | 2023-01-13 12:39:03.986157+11 | 0/1901378      | 2023-01-13 12:39:03.984249+11
(1 row)

Thoughts?

--
[1] Amit - https://www.postgresql.org/message-id/CAA4eK1KYUbnthSPyo4VjnhMygB0c1DZtp0XC-V2-GSETQ743ww%40mail.gmail.com
[2] My v78-0001 review - https://www.postgresql.org/message-id/CAHut%2BPvA10Bp9Jaw9OS2%2BpuKHr7ry_xB3Tf2-bbv5gyxD5E_gw%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Perform streaming logical transactions by background workers and parallel apply
~~~

10.

@@ -3249,7 +3263,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
     Time of last write-ahead log location reported to origin WAL
-    sender
+    sender; null for the parallel apply worker

(same as #6)

BEFORE
null for the parallel apply worker

AFTER
null for parallel apply workers

==
src/backend/catalog/system_views.sql

11.

@@ -949,6 +949,7 @@ CREATE VIEW pg_stat_subscription AS
     su.oid AS subid,
     su.subname,
     st.pid,
+    st.apply_leader_pid,
     st.relid,
     st.received_lsn,
     st.last_msg_send_time,

(Same as general comment #1 about terminology)

"apply_leader_pid" --> "leader_apply_pid"

==
src/backend/replication/logical/launcher.c

12.

+ if (worker.apply_leader_pid == InvalidPid)
      nulls[3] = true;
  else
-     values[3] = LSNGetDatum(worker.last_lsn);
- if (worker.last_send_time == 0)
+     values[3] = Int32GetDatum(worker.apply_leader_pid);
+

12a.
(Same as general comment #1 about terminology)

"apply_leader_pid" --> "leader_apply_pid"

~~

12b.
I wondered if the code here should be using the
isParallelApplyWorker(worker) macro for readability. e.g.

if (isParallelApplyWorker(worker))
    values[3] = Int32GetDatum(worker.apply_leader_pid);
else
    nulls[3] = true;

==
src/include/catalog/pg_proc.dat

13.

+ proallargtypes => '{oid,oid,oid,int4,int4,pg_lsn,timestamptz,timestamptz,pg_lsn,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{subid,subid,relid,pid,apply_leader_pid,received_lsn,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time}',

(Same as general comment #1 about terminology)

"apply_leader_pid" --> "leader_apply_pid"

==
src/test/regress/expected/rules.out

14.
@@ -2094,6 +2094,7 @@ pg_stat_ssl| SELECT s.pid,
 pg_stat_subscription| SELECT su.oid AS subid,
     su.subname,
     st.pid,
+    st.apply_leader_pid,
     st.relid,
     st.received_lsn,
     st.last_msg_send_time,
@@ -2101,7 +2102,7 @@ pg_stat_subscription| SELECT su.oid AS subid,
     st.latest_end_lsn,
     st.latest_end_time
    FROM (pg_subscription su
-     LEFT JOIN pg_stat_get_subscription(NULL::oid) st(subid, relid, pid, received_lsn, last_msg_send_time, last_msg_receipt_time, latest_end_lsn, latest_end_time) ON ((st.subid = su.oid)));
+     LEFT JOIN pg_stat_get_subscription(NULL::oid) st(subid, relid, pid, apply_leader_pid, received_lsn, last_msg_send_time, last_msg_receipt_time, latest_end_lsn, latest_end_time) ON ((st.subid = su.oid)));
 pg_stat_subscription_stats| SELECT ss.subid,
     s.subname,
     ss.apply_error_count,

(Same comment as elsewhere)

"apply_leader_pid" --> "leader_apply_pid"

--
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: [DOCS] Stats views and functions not in order?
On Wed, Jan 4, 2023 at 6:08 PM vignesh C wrote:
>
> On Mon, 2 Jan 2023 at 13:47, Peter Eisentraut wrote:
> >
> > On 08.12.22 03:30, Peter Smith wrote:
> > > PSA patches for v9*
> > >
> > > v9-0001 - Now the table rows are ordered per PeterE's suggestions [1]
> >
> > committed

Thanks for pushing.

> > > v9-0002 - All the review comments from DavidJ [2] are addressed
> >
> > I'm not sure about this one. It removes the "see [link] for details"
> > phrases and instead makes the view name a link. I think this loses the
> > cue that there is more information elsewhere. Otherwise, one could
> > think that, say, the entry about pg_stat_activity is the primary source
> > and the link just links to itself. Also keep in mind that people use
> > media where links are not that apparent (PDF), so the presence of a link
> > by itself cannot be the only cue about the flow of the information.

PSA new patch for v10-0001:

v9-0001 --> pushed, thanks!
v9-0002 --> removed, based on the rejection reason above
v9-0003 --> v10-0001

> I'm not sure if anything is pending for v9-0003, if there is something
> pending, please post an updated patch for the same.
>

Thanks for the reminder. PSA v10.

--
Kind Regards,
Peter Smith.
Fujitsu Australia

v10-0001-Add-Statistics-Views-section-and-refentry-for-ea.patch
Description: Binary data
Re: pgsql: Doc: Explain about Column List feature.
On Wed, Dec 21, 2022 at 6:59 PM Alvaro Herrera wrote:
>
> On 2022-Dec-21, Peter Smith wrote:
>
> > By "searching" I also meant just scanning visually, although I was
> > thinking more about scanning the PDF.
> >
> > Right now, the intention of any text box is obvious at a glance
> > because of those titles like "Caution", "Tip", "Note", "Warning".
> > Sure, the HTML rendering also uses colours to convey the purpose, but
> > in the PDF version [1] everything is black-and-white so apart from the
> > title all boxes look the same. That's why I felt using non-standard
> > box titles might be throwing away some of the meaning - e.g. the
> > reader of the PDF won't know anymore at a glance whether they are
> > looking at a warning or a tip.
>
> Oh, I see. It's been so long since I last looked at the PDFs that I
> failed to realize that they don't use color. I agree that would be a
> problem. Maybe we can change the title to have the word:
>
> Warning: Combining Column Lists from Multiple Publications
>

That last idea LGTM. But no patch at all LGTM too.

--
Kind Regards,
Peter Smith.
Fujitsu Australia.
Re: Force streaming every change in logical decoding
On Wed, Dec 21, 2022 at 6:22 PM Masahiko Sawada wrote:
>
> On Tue, Dec 20, 2022 at 7:49 PM Amit Kapila wrote:
> >
> > On Tue, Dec 20, 2022 at 2:46 PM Hayato Kuroda (Fujitsu) wrote:
> > >
> > > Dear hackers,
> > >
> > > > We have discussed three different ways to provide GUC for these
> > > > features. (1) Have separate GUCs like force_server_stream_mode,
> > > > force_server_serialize_mode, force_client_serialize_mode (we can use
> > > > different names for these) for each of these; (2) Have two sets of
> > > > GUCs for server and client. We can have logical_decoding_mode with
> > > > values as 'stream' and 'serialize' for the server and then
> > > > logical_apply_serialize = true/false for the client. (3) Have one GUC
> > > > like logical_replication_mode with values as 'server_stream',
> > > > 'server_serialize', 'client_serialize'.
> > >
> > > I also agreed with adding new GUC parameters (and I have already done
> > > so partially in parallel apply[1]), and basically option 2 made sense
> > > to me. But is it OK that we can choose "serialize" mode even if
> > > subscribers require streaming?
> > >
> > > Currently the reorder buffer transactions are serialized on the
> > > publisher only when there are no streamable transactions. So what
> > > happens if logical_decoding_mode = "serialize" but the streaming
> > > option is on? If we break the first one and serialize changes on the
> > > publisher anyway, it may not be suitable for testing normal operation.
> > >
> >
> > I think the change will be streamed as soon as the next change is
> > processed even if we serialize based on this option. See
> > ReorderBufferProcessPartialChange. However, I see your point that when
> > the streaming option is given, the value 'serialize' for this GUC may
> > not make much sense.
> >
> > > > > > "normal" is a default value, which is same as current HEAD. Changes are > > > streamed > > > or serialized when the buffered size exceeds logical_decoding_work_mem. > > > > > > When users set to "immediate", the walsenders starts to stream or > > > serialize all > > > changes. The choice is depends on the subscription option. > > > > > > > The other possibility to achieve what you are saying is that we allow > > a minimum value of logical_decoding_work_mem as 0 which would mean > > stream or serialize each change depending on whether the streaming > > option is enabled. I think we normally don't allow a minimum value > > below a certain threshold for other *_work_mem parameters (like > > maintenance_work_mem, work_mem), so we have followed the same here. > > And, I think it makes sense from the user's perspective because below > > a certain threshold it will just add overhead by either writing small > > changes to the disk or by sending those over the network. However, it > > can be quite useful for testing/debugging. So, not sure, if we should > > restrict setting logical_decoding_work_mem below a certain threshold. > > What do you think? > > I agree with (2), having separate GUCs for publisher side and > subscriber side. Also, on the publisher side, Amit's idea, controlling > the logical decoding behavior by changing logical_decoding_work_mem, > seems like a good idea. > > But I'm not sure it's a good idea if we lower the minimum value of > logical_decoding_work_mem to 0. I agree it's helpful for testing and > debugging but setting logical_decoding_work_mem = 0 doesn't benefit > users at all, rather brings risks. > > I prefer the idea Kuroda-san previously proposed; setting > logical_decoding_mode = 'immediate' means setting > logical_decoding_work_mem = 0. We might not need to have it as an enum > parameter since it has only two values, though. 
Did you mean that one GUC (logical_decoding_mode) will cause a
side-effect implicit value change on another GUC value
(logical_decoding_work_mem)? If so, then that seems like a potential
source of confusion IMO.

- e.g. the actual value is invisibly set differently from what the
user sees in the conf file

- e.g. will it depend on the order they get assigned?

Are there any GUC precedents for something like that?

--
Kind Regards,
Peter Smith.
Fujitsu Australia
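The order-dependence concern above can be illustrated in miniature. The snippet below is purely hypothetical stand-in code (plain variables and functions, not PostgreSQL's actual GUC assign-hook machinery): one "GUC" whose assignment has the proposed side-effect on another, showing why the final state can depend on assignment order:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the two GUCs under discussion. */
static int  logical_decoding_work_mem = 65536;
static char logical_decoding_mode[16] = "normal";

/* A stand-in "assign hook": setting the mode clobbers work_mem. */
static void
assign_logical_decoding_mode(const char *newval)
{
    strncpy(logical_decoding_mode, newval, sizeof(logical_decoding_mode) - 1);
    if (strcmp(newval, "immediate") == 0)
        logical_decoding_work_mem = 0;      /* invisible side-effect */
}

static void
assign_logical_decoding_work_mem(int newval)
{
    logical_decoding_work_mem = newval;
}
```

If both settings appear in the conf file, the effective value of logical_decoding_work_mem differs depending on which assignment runs last, even though the file contents are identical in both cases.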
Re: Force streaming every change in logical decoding
Here are some review comments for patch v2.

Since the GUC is still under design, maybe these comments can be
ignored for now, but I guess similar comments will apply in the future
anyhow (just with some name changes).

==

1. Commit message

Add a new GUC force_stream_mode, when it is set on, send the change to
output plugin immediately in streaming mode. Otherwise, send until
logical_decoding_work_mem is exceeded.

~

Is that quite right? I thought it was more like shown below:

SUGGESTION
Add a new GUC 'force_stream_mode' which modifies the behavior when
streaming mode is enabled. If force_stream_mode is on, the changes are
sent to the output plugin immediately. Otherwise (when
force_stream_mode is off), changes are written to memory until
logical_decoding_work_mem is exceeded.

==

2. doc/src/sgml/config.sgml

+
+ Specifies whether to force sending the changes to output plugin
+ immediately in streaming mode. If set to off (the
+ default), send until logical_decoding_work_mem is
+ exceeded.
+

Suggest a slight rewording like #1.

SUGGESTION
This modifies the behavior when streaming mode is enabled. If set to
on, the changes are sent to the output plugin immediately. If set to
off (the default), changes are written to memory until
logical_decoding_work_mem is exceeded.

==

3. More docs.

It might be helpful if this developer option is also referenced on the
page "31.10.1 Logical Replication > Configuration Settings >
Publishers" [1]

==
src/backend/replication/logical/reorderbuffer.c

4. GUCs

+/*
+ * Whether to send the change to output plugin immediately in streaming mode.
+ * When it is off, wait until logical_decoding_work_mem is exceeded.
+ */
+bool force_stream_mode;

4a.
"to output plugin" -> "to the output plugin"

~

4b.
By convention (see [2]) IIUC there should be some indication that
these (this and 'logical_decoding_work_mem') are GUC variables. e.g.
these should be refactored to be grouped together without the other
static var in between.
And add a comment for them both like:

/* GUC variables. */

~

4c.
Also (see [2]), it makes the code more readable to give the GUC an
explicit initial value.

SUGGESTION
bool force_stream_mode = false;

~~~

5. ReorderBufferCheckMemoryLimit

+ /* we know there has to be one, because the size is not zero */

Uppercase comment. Although not strictly added by this patch, you
might as well make this change while adjusting the indentation.

==
src/backend/utils/misc/guc_tables.c

6.

+ {
+     {"force_stream_mode", PGC_USERSET, DEVELOPER_OPTIONS,
+         gettext_noop("Force sending the changes to output plugin immediately if streaming is supported, without waiting till logical_decoding_work_mem."),
+         NULL,
+         GUC_NOT_IN_SAMPLE
+     },
+     &force_stream_mode,
+     false,
+     NULL, NULL, NULL
+ },

"without waiting till logical_decoding_work_mem." seems like an
incomplete sentence.

SUGGESTION
Force sending any streaming changes to the output plugin immediately,
without waiting until logical_decoding_work_mem is exceeded.

--
[1] https://www.postgresql.org/docs/devel/logical-replication-config.html
[2] GUC declarations - https://github.com/postgres/postgres/commit/d9d873bac67047cfacc9f5ef96ee488f2cb0f1c3

Kind Regards,
Peter Smith.
Fujitsu Australia
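For reference, the intended behavior described in these comments boils down to a single decision. The sketch below is illustrative stand-in code only (the names mirror the patch, but the real check lives inside ReorderBufferCheckMemoryLimit and is considerably more involved):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the GUCs in this patch. */
static bool   force_stream_mode = false;    /* explicit initial value, per 4c */
static size_t logical_decoding_work_mem = 65536;    /* pretend units */

/*
 * Stand-in for the decision being reviewed: should the buffered changes
 * be sent to the output plugin now?
 */
static bool
should_stream_now(size_t buffered_size)
{
    if (force_stream_mode)
        return true;            /* stream each change immediately */

    /* default behavior: only once the memory limit is exceeded */
    return buffered_size > logical_decoding_work_mem;
}
```

In other words, with the GUC on, the logical_decoding_work_mem threshold is bypassed entirely, which is what makes it useful as a developer/testing option.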
Re: Perform streaming logical transactions by background workers and parallel apply
On Tue, Dec 20, 2022 at 5:22 PM Peter Smith wrote:
>
> On Tue, Dec 20, 2022 at 2:20 PM Amit Kapila wrote:
> >
> > On Tue, Dec 20, 2022 at 8:17 AM Peter Smith wrote:
> > >
> > > Summary
> > > -------
> > >
> > > In summary, everything I have tested so far appeared to be working
> > > properly. In other words, for overlapping streamed transactions of
> > > different kinds, and regardless of whether zero/some/all of those
> > > transactions are getting processed by a PA worker, the resulting
> > > replicated data looked consistently OK.
> > >
> >
> > Thanks for doing the detailed testing of this patch. I think the one
> > area where we can focus more is the switch-to-serialization mode while
> > sending changes to the parallel worker.
> >
> > >
> > > NOTE - all testing described in this post above was using v58-0001
> > > only. However, the point of implementing these as a .spec test was to
> > > be able to repeat these same regression tests on newer versions with
> > > minimal manual steps required. Later I plan to fetch/apply the most
> > > recent patch version and repeat these same tests.
> > >
> >
> > That would be really helpful.
> >

FYI, my pub-sub.spec tests gave the same result (i.e. pass) when
re-run with the latest v63 (0001,0002,0003) applied.

--
Kind Regards,
Peter Smith.
Fujitsu Australia