Re: pg_upgrade's interaction with pg_resetwal seems confusing
On Thu, Oct 12, 2023 at 7:17 AM Amit Kapila wrote: > Now, as mentioned in the first paragraph, it seems we anyway don't > need to reset the WAL at the end when setting the next OID for the new > cluster with the -o option. If that is true, then I think even without > slots work it will be helpful to have such an option in pg_resetwal. > > Thoughts? I wonder if we should instead provide a way to reset the OID counter with a function call inside the database, gated by IsBinaryUpgrade. Having something like pg_resetwal --but-dont-actually-reset-the-wal seems both self-contradictory and vulnerable to abuse that we might be better off not inviting. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_upgrade's interaction with pg_resetwal seems confusing
On Fri, Oct 13, 2023 at 12:00 AM Robert Haas wrote: > > On Thu, Oct 12, 2023 at 7:17 AM Amit Kapila wrote: > > Now, as mentioned in the first paragraph, it seems we anyway don't > > need to reset the WAL at the end when setting the next OID for the new > > cluster with the -o option. If that is true, then I think even without > > slots work it will be helpful to have such an option in pg_resetwal. > > > > Thoughts? > > I wonder if we should instead provide a way to reset the OID counter > with a function call inside the database, gated by IsBinaryUpgrade. > I think the challenge in doing so would be that when the server is running, a concurrent checkpoint can also update the OID counter value in the control file. See below code: CreateCheckPoint() { ... LWLockAcquire(OidGenLock, LW_SHARED); checkPoint.nextOid = ShmemVariableCache->nextOid; if (!shutdown) checkPoint.nextOid += ShmemVariableCache->oidCount; LWLockRelease(OidGenLock); ... UpdateControlFile() ... } Now, we can try to pass some startup options like checkpoint_timeout with a large value to ensure that checkpoint won't interfere but not sure if that would be bulletproof. Instead, how about allowing pg_upgrade to update the control file of the new cluster (with the required value of OID) following the same method as pg_resetwal does in RewriteControlFile()? > Having something like pg_resetwal --but-dont-actually-reset-the-wal > seems both self-contradictory and vulnerable to abuse that we might be > better off not inviting. > Fair point. -- With Regards, Amit Kapila.
Re: pg_upgrade's interaction with pg_resetwal seems confusing
On Fri, Oct 13, 2023 at 9:29 AM Amit Kapila wrote: > > On Fri, Oct 13, 2023 at 12:00 AM Robert Haas wrote: > > > > On Thu, Oct 12, 2023 at 7:17 AM Amit Kapila wrote: > > > Now, as mentioned in the first paragraph, it seems we anyway don't > > > need to reset the WAL at the end when setting the next OID for the new > > > cluster with the -o option. If that is true, then I think even without > > > slots work it will be helpful to have such an option in pg_resetwal. > > > > > > Thoughts? > > > > I wonder if we should instead provide a way to reset the OID counter > > with a function call inside the database, gated by IsBinaryUpgrade. > > > > I think the challenge in doing so would be that when the server is > running, a concurrent checkpoint can also update the OID counter value > in the control file. See below code: > > CreateCheckPoint() > { > ... > LWLockAcquire(OidGenLock, LW_SHARED); > checkPoint.nextOid = ShmemVariableCache->nextOid; > if (!shutdown) > checkPoint.nextOid += ShmemVariableCache->oidCount; > LWLockRelease(OidGenLock); > ... > UpdateControlFile() > ... > } > But is this a problem? basically, we will set the ShmemVariableCache->nextOid counter to the value that we want in the IsBinaryUpgrade-specific function. And then the shutdown checkpoint will flush that value to the control file and that is what we want no? I mean instead of resetwal directly modifying the control file we will modify that value in the server using the binary_upgrade function and then have that value flush to the disk by shutdown checkpoint. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: pg_upgrade's interaction with pg_resetwal seems confusing
On Fri, Oct 13, 2023 at 10:37 AM Dilip Kumar wrote: > > On Fri, Oct 13, 2023 at 9:29 AM Amit Kapila wrote: > > > > On Fri, Oct 13, 2023 at 12:00 AM Robert Haas wrote: > > > > > > On Thu, Oct 12, 2023 at 7:17 AM Amit Kapila > > > wrote: > > > > Now, as mentioned in the first paragraph, it seems we anyway don't > > > > need to reset the WAL at the end when setting the next OID for the new > > > > cluster with the -o option. If that is true, then I think even without > > > > slots work it will be helpful to have such an option in pg_resetwal. > > > > > > > > Thoughts? > > > > > > I wonder if we should instead provide a way to reset the OID counter > > > with a function call inside the database, gated by IsBinaryUpgrade. > > > > > > > I think the challenge in doing so would be that when the server is > > running, a concurrent checkpoint can also update the OID counter value > > in the control file. See below code: > > > > CreateCheckPoint() > > { > > ... > > LWLockAcquire(OidGenLock, LW_SHARED); > > checkPoint.nextOid = ShmemVariableCache->nextOid; > > if (!shutdown) > > checkPoint.nextOid += ShmemVariableCache->oidCount; > > LWLockRelease(OidGenLock); > > ... > > UpdateControlFile() > > ... > > } > > > > But is this a problem? basically, we will set the > ShmemVariableCache->nextOid counter to the value that we want in the > IsBinaryUpgrade-specific function. And then the shutdown checkpoint > will flush that value to the control file and that is what we want no? > I think that can work. Basically, we need to do something like what SetNextObjectId() does and then let the shutdown checkpoint update the actual value in the control file. > I mean instead of resetwal directly modifying the control file we > will modify that value in the server using the binary_upgrade function > and then have that value flush to the disk by shutdown checkpoint. > True, the alternative to consider is to let pg_upgrade update the control file by itself with the required value of OID. The point I am slightly worried about doing via server-side function is that some online and or shutdown checkpoint can update other values in the control file as well whereas if we do via pg_upgrade, we can have better control over just updating the OID. -- With Regards, Amit Kapila.
Re: pg_upgrade's interaction with pg_resetwal seems confusing
On Fri, Oct 13, 2023 at 11:07 AM Amit Kapila wrote: > > > But is this a problem? basically, we will set the > > ShmemVariableCache->nextOid counter to the value that we want in the > > IsBinaryUpgrade-specific function. And then the shutdown checkpoint > > will flush that value to the control file and that is what we want no? > > > > I think that can work. Basically, we need to do something like what > SetNextObjectId() does and then let the shutdown checkpoint update the > actual value in the control file. Right. > > I mean instead of resetwal directly modifying the control file we > > will modify that value in the server using the binary_upgrade function > > and then have that value flush to the disk by shutdown checkpoint. > > > > True, the alternative to consider is to let pg_upgrade update the > control file by itself with the required value of OID. The point I am > slightly worried about doing via server-side function is that some > online and or shutdown checkpoint can update other values in the > control file as well whereas if we do via pg_upgrade, we can have > better control over just updating the OID. Yeah, thats a valid point. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: pg_upgrade's interaction with pg_resetwal seems confusing
On Fri, Oct 13, 2023 at 2:03 PM Dilip Kumar wrote: > > On Fri, Oct 13, 2023 at 11:07 AM Amit Kapila wrote: > > > > > But is this a problem? basically, we will set the > > > ShmemVariableCache->nextOid counter to the value that we want in the > > > IsBinaryUpgrade-specific function. And then the shutdown checkpoint > > > will flush that value to the control file and that is what we want no? > > > > > > > I think that can work. Basically, we need to do something like what > > SetNextObjectId() does and then let the shutdown checkpoint update the > > actual value in the control file. > > Right. > > > > I mean instead of resetwal directly modifying the control file we > > > will modify that value in the server using the binary_upgrade function > > > and then have that value flush to the disk by shutdown checkpoint. > > > > > > > True, the alternative to consider is to let pg_upgrade update the > > control file by itself with the required value of OID. The point I am > > slightly worried about doing via server-side function is that some > > online and or shutdown checkpoint can update other values in the > > control file as well whereas if we do via pg_upgrade, we can have > > better control over just updating the OID. > > Yeah, thats a valid point. > But OTOH, just updating the control file via pg_upgrade may not be sufficient because we should keep the shutdown checkpoint record also updated with that value. See how pg_resetwal achieves it via RewriteControlFile() and WriteEmptyXLOG(). So, considering both the approaches it seems better to go with a server-side function as Robert suggested. -- With Regards, Amit Kapila.
RE: pg_upgrade's interaction with pg_resetwal seems confusing
Dear hackers, > > > > > > I mean instead of resetwal directly modifying the control file we > > > > will modify that value in the server using the binary_upgrade function > > > > and then have that value flush to the disk by shutdown checkpoint. > > > > > > > > > > True, the alternative to consider is to let pg_upgrade update the > > > control file by itself with the required value of OID. The point I am > > > slightly worried about doing via server-side function is that some > > > online and or shutdown checkpoint can update other values in the > > > control file as well whereas if we do via pg_upgrade, we can have > > > better control over just updating the OID. > > > > Yeah, thats a valid point. > > > > But OTOH, just updating the control file via pg_upgrade may not be > sufficient because we should keep the shutdown checkpoint record also > updated with that value. See how pg_resetwal achieves it via > RewriteControlFile() and WriteEmptyXLOG(). So, considering both the > approaches it seems better to go with a server-side function as Robert > suggested. Based on these discussion, I implemented a server-side approach. An attached patch adds a upgrade function which sets ShmemVariableCache->nextOid. It is called at the end of the upgrade. Comments and name of issue_warnings_and_set_wal_level() is also updated because they become outdated. Best Regards, Hayato Kuroda FUJITSU LIMITED 0001-pg_upgrade-use-upgrade-function-to-restore-OID.patch Description: 0001-pg_upgrade-use-upgrade-function-to-restore-OID.patch
Re: pg_upgrade's interaction with pg_resetwal seems confusing
On Fri, 13 Oct 2023 at 17:15, Hayato Kuroda (Fujitsu) wrote: > > Dear hackers, > > > > > > > > > I mean instead of resetwal directly modifying the control file we > > > > > will modify that value in the server using the binary_upgrade function > > > > > and then have that value flush to the disk by shutdown checkpoint. > > > > > > > > > > > > > True, the alternative to consider is to let pg_upgrade update the > > > > control file by itself with the required value of OID. The point I am > > > > slightly worried about doing via server-side function is that some > > > > online and or shutdown checkpoint can update other values in the > > > > control file as well whereas if we do via pg_upgrade, we can have > > > > better control over just updating the OID. > > > > > > Yeah, thats a valid point. > > > > > > > But OTOH, just updating the control file via pg_upgrade may not be > > sufficient because we should keep the shutdown checkpoint record also > > updated with that value. See how pg_resetwal achieves it via > > RewriteControlFile() and WriteEmptyXLOG(). So, considering both the > > approaches it seems better to go with a server-side function as Robert > > suggested. > > Based on these discussion, I implemented a server-side approach. An attached > patch > adds a upgrade function which sets ShmemVariableCache->nextOid. It is called > at > the end of the upgrade. Comments and name of > issue_warnings_and_set_wal_level() > is also updated because they become outdated. Few comments: 1) Most of the code in binary_upgrade_set_next_oid is similar to SetNextObjectId, but SetNextObjectId has the error handling to report an error if an invalid nextOid is specified: if (ShmemVariableCache->nextOid > nextOid) elog(ERROR, "too late to advance OID counter to %u, it is now %u", nextOid, ShmemVariableCache->nextOid); Is this check been left out from binary_upgrade_set_next_oid function intentionally? Have you left this because it could be like a dead code. If so, should we have an assert for this here? 2) How about changing issue_warnings_and_set_oid function name to issue_warnings_and_set_next_oid? void -issue_warnings_and_set_wal_level(void) +issue_warnings_and_set_oid(void) { 3) We have removed these comments, is there any change to the rsync instructions? If so we could update the comments accordingly. -* We unconditionally start/stop the new server because pg_resetwal -o set -* wal_level to 'minimum'. If the user is upgrading standby servers using -* the rsync instructions, they will need pg_upgrade to write its final -* WAL record showing wal_level as 'replica'. Regards, Vignesh
RE: pg_upgrade's interaction with pg_resetwal seems confusing
Dear Vignesh, Thank you for reviewing! PSA new version. > > Few comments: > 1) Most of the code in binary_upgrade_set_next_oid is similar to > SetNextObjectId, but SetNextObjectId has the error handling to report > an error if an invalid nextOid is specified: > if (ShmemVariableCache->nextOid > nextOid) > elog(ERROR, "too late to advance OID counter to %u, it is now %u", > nextOid, ShmemVariableCache->nextOid); > > Is this check been left out from binary_upgrade_set_next_oid function > intentionally? Have you left this because it could be like a dead > code. If so, should we have an assert for this here? Yeah, they were removed intentionally, but I did rethink that they could be combined. ereport() would be skipped during the upgrade mode. Thought? Regarding the first ereport(ERROR), it just requires that we are doing initdb. As for the second ereport(ERROR), it requires that next OID is not rollbacked. The restriction seems OK during the initialization, but it is not appropriate for upgrading: wraparound of OID counter might be occurred on old cluster but we try to restore the counter anyway. > 2) How about changing issue_warnings_and_set_oid function name to > issue_warnings_and_set_next_oid? > void > -issue_warnings_and_set_wal_level(void) > +issue_warnings_and_set_oid(void) > { Fixed. > 3) We have removed these comments, is there any change to the rsync > instructions? If so we could update the comments accordingly. > -* We unconditionally start/stop the new server because > pg_resetwal -o set > -* wal_level to 'minimum'. If the user is upgrading standby > servers using > -* the rsync instructions, they will need pg_upgrade to write its > final > -* WAL record showing wal_level as 'replica'. > Hmm, I thought comments for rsync seemed outdated so that removed. I still think this is not needed. Since controlfile->wal_level is not updated to 'minimal' anymore, the unconditional startup is not required for physical standby. [1] : https://www.postgresql.org/docs/devel/pgupgrade.html#:~:text=the%20next%20step.-,Run%20rsync,-When%20using%20link Best Regards, Hayato Kuroda FUJITSU LIMITED v2-0001-pg_upgrade-use-upgrade-function-to-restore-OID.patch Description: v2-0001-pg_upgrade-use-upgrade-function-to-restore-OID.patch
Re: pg_upgrade's interaction with pg_resetwal seems confusing
Note that this patch falsifies the comment in SetNextObjectId that taking the lock is pro forma only -- it no longer is, since in upgrade mode there can be multiple subprocesses running -- so I think it should be updated. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: pg_upgrade's interaction with pg_resetwal seems confusing
On Mon, 16 Oct 2023 at 10:33, Hayato Kuroda (Fujitsu) wrote: > > Dear Vignesh, > > Thank you for reviewing! PSA new version. > > > > > Few comments: > > 1) Most of the code in binary_upgrade_set_next_oid is similar to > > SetNextObjectId, but SetNextObjectId has the error handling to report > > an error if an invalid nextOid is specified: > > if (ShmemVariableCache->nextOid > nextOid) > > elog(ERROR, "too late to advance OID counter to %u, it is now %u", > > nextOid, ShmemVariableCache->nextOid); > > > > Is this check been left out from binary_upgrade_set_next_oid function > > intentionally? Have you left this because it could be like a dead > > code. If so, should we have an assert for this here? > > Yeah, they were removed intentionally, but I did rethink that they could be > combined. ereport() would be skipped during the upgrade mode. Thought? > > Regarding the first ereport(ERROR), it just requires that we are doing initdb. > > As for the second ereport(ERROR), it requires that next OID is not rollbacked. > The restriction seems OK during the initialization, but it is not appropriate > for > upgrading: wraparound of OID counter might be occurred on old cluster but we > try > to restore the counter anyway. > > > 2) How about changing issue_warnings_and_set_oid function name to > > issue_warnings_and_set_next_oid? > > void > > -issue_warnings_and_set_wal_level(void) > > +issue_warnings_and_set_oid(void) > > { > > Fixed. > > > 3) We have removed these comments, is there any change to the rsync > > instructions? If so we could update the comments accordingly. > > -* We unconditionally start/stop the new server because > > pg_resetwal -o set > > -* wal_level to 'minimum'. If the user is upgrading standby > > servers using > > -* the rsync instructions, they will need pg_upgrade to write its > > final > > -* WAL record showing wal_level as 'replica'. > > > > Hmm, I thought comments for rsync seemed outdated so that removed. I still > think > this is not needed. Since controlfile->wal_level is not updated to 'minimal' > anymore, the unconditional startup is not required for physical standby. We can update the commit message with the details of the same, it will help to understand that it is intentionally done. There are couple of typos with the new patch: 1) "uprade logical replication slot" should be "upgrade logical replication slot": Previously, the OID counter is restored by invoking pg_resetwal with the -o option, at the end of upgrade. This is not problematic for now, but WAL removals are not happy if we want to uprade logical replication slot. Therefore, a new upgrade function is introduced to reset next OID. 2) "becasue the value" should be "because the value": Note that we only update the on-memory data to avoid concurrent update of control with the chekpointer. It is harmless becasue the value would be set at shutdown checkpoint. Regards, Vignesh
RE: pg_upgrade's interaction with pg_resetwal seems confusing
Dear Alvaro, Thank you for updating! PSA new version. > Note that this patch falsifies the comment in SetNextObjectId that > taking the lock is pro forma only -- it no longer is, since in upgrade > mode there can be multiple subprocesses running -- so I think it should > be updated. Indeed, some comments were updated. Best Regards, Hayato Kuroda FUJITSU LIMITED v3-0001-pg_upgrade-use-upgrade-function-to-restore-OID.patch Description: v3-0001-pg_upgrade-use-upgrade-function-to-restore-OID.patch
RE: pg_upgrade's interaction with pg_resetwal seems confusing
Dear Vignesh, Thank you for reviewing! New patch can be available in [1]. > We can update the commit message with the details of the same, it will > help to understand that it is intentionally done. Both comments and a commit message were updated related. > There are couple of typos with the new patch: > 1) "uprade logical replication slot" should be "upgrade logical > replication slot": > Previously, the OID counter is restored by invoking pg_resetwal with the -o > option, at the end of upgrade. This is not problematic for now, but WAL > removals > are not happy if we want to uprade logical replication slot. Therefore, a new > upgrade function is introduced to reset next OID. Fixed. > 2) "becasue the value" should be "because the value": > Note that we only update the on-memory data to avoid concurrent update of > control with the chekpointer. It is harmless becasue the value would be set at > shutdown checkpoint. Fixed. [1]: https://www.postgresql.org/message-id/TYAPR01MB5866DAFE000F8677C49ACD66F5D8A%40TYAPR01MB5866.jpnprd01.prod.outlook.com Best Regards, Hayato Kuroda FUJITSU LIMITED