On Fri, Sep 15, 2023 at 03:08:21PM +0530, vignesh C wrote: > On Tue, 12 Sept 2023 at 14:25, Hayato Kuroda (Fujitsu) > <kuroda.hay...@fujitsu.com> wrote: >> Is there a possibility that apply worker on old cluster connects to the >> publisher during the upgrade? Regarding the pg_upgrade on publisher, the we >> refuse TCP/IP connections from remotes and port number is also changed, so >> we can >> assume that subscriber does not connect to. But IIUC such settings may not >> affect >> to the connection source, so that the apply worker may try to connect to the >> publisher. Also, is there any hazards if it happens? > > Yes, there is a possibility that the apply worker gets started and new > transaction data is being synced from the publisher. I have made a fix > not to start the launcher process in binary ugprade mode as we don't > want the launcher to start apply worker during upgrade.
Hmm. I was wondering if 0001 is the right way to handle this case, but at the end I'm OK to paint one extra isBinaryUpgrade in the code path where apply launchers are registered. I don't think that the patch is complete, though. A comment should be added in pg_upgrade's server.c, exactly start_postmaster(), to tell that -b also stops apply workers. I am attaching a version updated as of the attached, that I'd be OK to apply. I don't really think that we need to worry about a subscriber connecting back to a publisher in this case, though? I mean, each postmaster instance started by pg_upgrade restricts the access to the instance with unix_socket_directories set to a custom path and permissions at 0700, and a subscription's connection string does not know the unix path used by pg_upgrade. I certainly agree that stopping these processes could lead to inconsistencies in the data the subscribers have been holding though, if we are not careful, so preventing them from running is a good practice anyway. I have also reviewed 0002. As a whole, I think that I'm OK with the main approach of the patch in pg_dump to use a new type of dumpable object for subscription relations that are dumped with their upgrade functions after. This still needs more work, and more documentation. Also, perhaps we should really have an option to control if this part of the copy happens or not. With a --no-subscription-relations for pg_dump at least? +{ oid => '4551', descr => 'add a relation with the specified relation state to pg_subscription_rel table', During a development cycle, any new function added needs to use an OID in range 8000-9999. Running unused_oids will suggest new random OIDs. FWIW, I am not convinced that there is a need for two functions to add an entry to pg_subscription_rel, with sole difference between both the handling of a valid or invalid LSN. We should have only one function that's able to handle NULL for the LSN. So let's remove rel_state_a and rel_state_b, and have a single rel_state(). The description of the SQL functions is inconsistent with the other binary upgrade ones, I would suggest for the two functions: "for use by pg_upgrade (relation for pg_subscription_rel)" "for use by pg_upgrade (remote_lsn for origin)" + i_srsublsn = PQfnumber(res, "srsublsn"); [...] + subrinfo[cur_rel].srsublsn = pg_strdup(PQgetvalue(res, i, i_srsublsn)); In getSubscriptionTables(), this should check for PQgetisnull() because we would have a NULL value for InvalidXLogRecPtr in the catalog. Using a char* for srsublsn is OK, but just assign NULL to it, then just pass a hardcoded NULL value to the function as we do in other places. So I don't quite get why this is not the same handling as suboriginremotelsn. getSubscriptionTables() is entirely skipped if we don't want any subscriptions, if we deal with a server of 9.6 or older or if we don't do binary upgrades, which is OK. +/* + * getSubscriptionTables + * get information about subscription membership for dumpable tables. + */ This commit is slightly misleading and should mention that this is an upgrade-only path? The code for dumpSubscriptionTable() is a copy-paste of dumpPublicationTable(), but a lot of what you are doing here is actually pointless if we are not in binary mode? Why should this code path not taken only under dataOnly? I mean, this is a code path we should never take except if we are in binary mode. This should have at least a cross-check to make sure that we never have a DO_SUBSCRIPTION_REL in this code path if we are in non-binary mode. + if (dopt->binary_upgrade && subinfo->suboriginremotelsn) + { + appendPQExpBufferStr(query, + "SELECT pg_catalog.binary_upgrade_replorigin_advance("); + appendStringLiteralAH(query, subinfo->dobj.name, fout); + appendPQExpBuffer(query, ", '%s');\n", subinfo->suboriginremotelsn); + } Hmm.. Could it be actually useful even for debugging to still have this query if suboriginremotelsn is an InvalidXLogRecPtr? I think that this should have a comment of the kind "\n-- For binary upgrade, blah". At least it would not be a bad thing to enforce a correct state from the start, removing the NULL check for the second argument in binary_upgrade_replorigin_advance(). + /* We need to check for pg_replication_origin_status only once. */ Perhaps it would be better to explain why? + "WHERE coalesce(remote_lsn, '0/0') = '0/0'" Why a COALESCE here? Cannot this stuff just use NULL? + fprintf(script, "database:%s subscription:%s relation:%s in non-ready state\n", Could it be possible to include the schema of the relation in this log? +static void check_for_subscription_state(ClusterInfo *cluster); I'd be tempted to move that into a patch on its own, actually, for a cleaner history. +# Copyright (c) 2022-2023, PostgreSQL Global Development Group New as of 2023. +# Check that after upgradation of the subscriber server, the incremental +# changes added to the publisher are replicated. [..] + For upgradation of the subscriptions, all the subscriptions on the old + cluster must have a valid <varname>remote_lsn</varname>, and all the Upgradation? I think that this should be reworded: "All the subscriptions of an old cluster require a valid remote_lsn during an upgrade." A CI run is reporting the following compilation warnings: [04:21:15.290] pg_dump.c: In function ‘getSubscriptionTables’: [04:21:15.290] pg_dump.c:4655:29: error: ‘subinfo’ may be used uninitialized in this function [-Werror=maybe-uninitialized] [04:21:15.290] 4655 | subrinfo[cur_rel].subinfo = subinfo; +ok(-d $new_sub->data_dir . "/pg_upgrade_output.d", + "pg_upgrade_output.d/ not removed after pg_upgrade failure"); Not sure that there's a need for this check. Okay, that's cheap. And, err. We are going to need an option to control if the slot data is copied, and a bit more documentation in pg_upgrade to explain how things happen when the copy happens. -- Michael
signature.asc
Description: PGP signature