Re: Catalog version access
On Mon, Jan 31, 2022 at 04:57:13PM +0900, Michael Paquier wrote: > On Tue, Jan 25, 2022 at 01:12:32PM +0900, Michael Paquier wrote: >> Once you remove the requirement of a running server, we have basically >> what has been recently implemented with postgres -C for >> runtime-computed GUCs, because we already go through a read of the >> control file to be able to print those GUCs with their correct >> values. This also means that it is already possible to check if a >> data folder is compatible with a set of binaries with this facility, >> as any postgres -C command with a runtime GUC would trigger this >> check. Using any of the existing runtime GUCs may be confusing, but >> that would work. And I am not really convinced that we have any need >> to add a specific GUC for this purpose, be it a sort of >> is_controlfile_valid or controlfile_checksum (CRC32 of the control >> file). > > Thinking more about this one, we can already do that, so I have > marked the patch as RwF. Perhaps we could just add a GUC, but that > feels a bit dummy. Sorry, I missed this thread earlier. You're right, we can just do something like the following to achieve basically the same result: postgres -D . -C data_checksums Unless Vik has any objections, this can probably be marked as Withdrawn. Perhaps we can look into providing a new option for "postgres" at some point in the future, but I don't sense a ton of demand at the moment. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Catalog version access
On Tue, Jan 25, 2022 at 01:12:32PM +0900, Michael Paquier wrote: > Once you remove the requirement of a running server, we have basically > what has been recently implemented with postgres -C for > runtime-computed GUCs, because we already go through a read of the > control file to be able to print those GUCs with their correct > values. This also means that it is already possible to check if a > data folder is compatible with a set of binaries with this facility, > as any postgres -C command with a runtime GUC would trigger this > check. Using any of the existing runtime GUCs may be confusing, but > that would work. And I am not really convinced that we have any need > to add a specific GUC for this purpose, be it a sort of > is_controlfile_valid or controlfile_checksum (CRC32 of the control > file). Thinking more about this one, we can already do that, so I have marked the patch as RwF. Perhaps we could just add a GUC, but that feels a bit dummy. -- Michael signature.asc Description: PGP signature
Re: Catalog version access
On Mon, Jan 24, 2022 at 08:40:08PM +, Bossart, Nathan wrote: > On 1/23/22, 7:31 PM, "Michael Paquier" wrote: >> On Mon, Aug 16, 2021 at 06:12:54PM +, Bossart, Nathan wrote: >>> I was looking at the --check switch for the postgres binary recently >>> [0], and this sounds like something that might fit in nicely there. >>> In the attached patch, --check will also check the control file if one >>> exists. >> >> This would not work on a running postmaster as CreateDataDirLockFile() >> is called beforehand, but we want this capability, no? > > I was not under the impression this was a requirement, based on the > use-case discussed upthread [0]. Hmm. I got a different impression as of this one: https://www.postgresql.org/message-id/3496407.1613955...@sss.pgh.pa.us But I can see downthread that this is not the case. Sorry for the confusion. >> Abusing a bootstrap option for this purpose does not look like a good >> idea, to be honest, especially for something only used internally now >> and undocumented, but we want to use something aimed at an external >> use with some documentation. Using a separate switch would be more >> adapted IMO. > > This is the opposite of what Magnus proposed earlier [1]. Do we need > a new pg_ctl option for this? It seems like it is really only > intended for use by PostgreSQL developers, but perhaps there are other > use-cases I am not thinking of. In any case, the pg_ctl option would > probably end up using --check (or another similar mode) behind the > scenes. Based on the latest state of the thread, I am understanding that we don't want a new option for pg_ctl for this feature, and using a bootstrap's --check for this purpose is not a good idea IMO. What I guess from Magnus' suggestion would be to add a completely different switch. Once you remove the requirement of a running server, we have basically what has been recently implemented with postgres -C for runtime-computed GUCs, because we already go through a read of the control file to be able to print those GUCs with their correct values. This also means that it is already possible to check if a data folder is compatible with a set of binaries with this facility, as any postgres -C command with a runtime GUC would trigger this check. Using any of the existing runtime GUCs may be confusing, but that would work. And I am not really convinced that we have any need to add a specific GUC for this purpose, be it a sort of is_controlfile_valid or controlfile_checksum (CRC32 of the control file). >> Also, I think that we should be careful with the read of >> the control file to avoid false positives? We can rely on an atomic >> read/write thanks to its maximum size of 512 bytes, but this looks >> like a lot what has been done recently with postgres -C for runtime >> GUCs, that *require* a read of the control file before grabbing the >> values we are interested in. > > Sorry, I'm not following this one. In my proposed patch, the control > file (if one exists) is read after CreateDataDirLockFile(), just like > PostmasterMain(). While looking at the patch, I was thinking about the fact that we may want to support the case even if a server is running. If we don't, my worries about the concurrent control file activities are moot. -- Michael signature.asc Description: PGP signature
Re: Catalog version access
Thanks for taking a look! On 1/23/22, 7:31 PM, "Michael Paquier" wrote: > On Mon, Aug 16, 2021 at 06:12:54PM +, Bossart, Nathan wrote: >> I was looking at the --check switch for the postgres binary recently >> [0], and this sounds like something that might fit in nicely there. >> In the attached patch, --check will also check the control file if one >> exists. > > This would not work on a running postmaster as CreateDataDirLockFile() > is called beforehand, but we want this capability, no? I was not under the impression this was a requirement, based on the use-case discussed upthread [0]. > Abusing a bootstrap option for this purpose does not look like a good > idea, to be honest, especially for something only used internally now > and undocumented, but we want to use something aimed at an external > use with some documentation. Using a separate switch would be more > adapted IMO. This is the opposite of what Magnus proposed earlier [1]. Do we need a new pg_ctl option for this? It seems like it is really only intended for use by PostgreSQL developers, but perhaps there are other use-cases I am not thinking of. In any case, the pg_ctl option would probably end up using --check (or another similar mode) behind the scenes. > Also, I think that we should be careful with the read of > the control file to avoid false positives? We can rely on an atomic > read/write thanks to its maximum size of 512 bytes, but this looks > like a lot what has been done recently with postgres -C for runtime > GUCs, that *require* a read of the control file before grabbing the > values we are interested in. Sorry, I'm not following this one. In my proposed patch, the control file (if one exists) is read after CreateDataDirLockFile(), just like PostmasterMain(). Nathan [0] https://postgr.es/m/20210222022407.ecaygvx2ise6uwyz%40alap3.anarazel.de [1] https://postgr.es/m/CABUevEySovaEDci7c0DXOrV6c7JzWqYzfVwOiRUJxMao%3D9seEw%40mail.gmail.com
Re: Catalog version access
On Mon, Aug 16, 2021 at 06:12:54PM +, Bossart, Nathan wrote: > I was looking at the --check switch for the postgres binary recently > [0], and this sounds like something that might fit in nicely there. > In the attached patch, --check will also check the control file if one > exists. This would not work on a running postmaster as CreateDataDirLockFile() is called beforehand, but we want this capability, no? Abusing a bootstrap option for this purpose does not look like a good idea, to be honest, especially for something only used internally now and undocumented, but we want to use something aimed at an external use with some documentation. Using a separate switch would be more adapted IMO. Also, I think that we should be careful with the read of the control file to avoid false positives? We can rely on an atomic read/write thanks to its maximum size of 512 bytes, but this looks like a lot what has been done recently with postgres -C for runtime GUCs, that *require* a read of the control file before grabbing the values we are interested in. -- Michael signature.asc Description: PGP signature
Re: Catalog version access
On Wed, Mar 3, 2021 at 8:16 PM Tom Lane wrote: > > Vik Fearing writes: > > On 3/3/21 6:35 PM, Peter Eisentraut wrote: > >> If what you want to know is whether a given binary can run against a > >> given data directory then CATALOG_VERSION_NO isn't the only thing you > >> need to check. The full truth of this is in ReadControlFile(). The > >> best way to get that answer is to start a server and see if it > >> complains. You can even grep the log for "It looks like you need to > >> initdb.". > > > In that case, what would everyone think about a `pg_ctl check` option? > > The trouble with Peter's recipe is that it doesn't work if there is > already a server instance running there (or at least I think we'll > bitch about the existing postmaster first, maybe I'm wrong). Now, > that's not such a big problem for the use-case you were describing. > But I bet if we expose this method as an apparently-general-purpose > pg_ctl option, there'll be complaints. Another option could be to provide a switch to the postmaster binary. Using pg_config as originally suggested is risky because you might pick up the wrong postmaster, but if you put it on the actual postmaster binary you certainly know which one you're on... As this is something that's primarily of interest to developers, it's also a bit lower weight than having a "heavy" solution like an entire mode for pg_ctl. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Catalog version access
Vik Fearing writes: > On 3/3/21 6:35 PM, Peter Eisentraut wrote: >> If what you want to know is whether a given binary can run against a >> given data directory then CATALOG_VERSION_NO isn't the only thing you >> need to check. The full truth of this is in ReadControlFile(). The >> best way to get that answer is to start a server and see if it >> complains. You can even grep the log for "It looks like you need to >> initdb.". > In that case, what would everyone think about a `pg_ctl check` option? The trouble with Peter's recipe is that it doesn't work if there is already a server instance running there (or at least I think we'll bitch about the existing postmaster first, maybe I'm wrong). Now, that's not such a big problem for the use-case you were describing. But I bet if we expose this method as an apparently-general-purpose pg_ctl option, there'll be complaints. regards, tom lane
Re: Catalog version access
On 3/3/21 6:35 PM, Peter Eisentraut wrote: > On 22.02.21 08:00, Vik Fearing wrote: >> On 2/22/21 3:24 AM, Andres Freund wrote: >>> Imagine trying to run regular tests of HEAD, where the tests require a >>> large database to be loaded. Re-loading the data for every [few] commits >>> is prohibitively time consuming, and even just running pg_upgrade is >>> painful. So you'd like to re-use a "template" data directory with the >>> data loaded if possible (i.e. no catversion / WAL / ... version bumps), >>> and a pg_upgrade otherwise. >> >> This is exactly what I am doing. > > If what you want to know is whether a given binary can run against a > given data directory then CATALOG_VERSION_NO isn't the only thing you > need to check. The full truth of this is in ReadControlFile(). The > best way to get that answer is to start a server and see if it > complains. You can even grep the log for "It looks like you need to > initdb.". In that case, what would everyone think about a `pg_ctl check` option? -- Vik Fearing
Re: Catalog version access
On 22.02.21 08:00, Vik Fearing wrote: On 2/22/21 3:24 AM, Andres Freund wrote: Imagine trying to run regular tests of HEAD, where the tests require a large database to be loaded. Re-loading the data for every [few] commits is prohibitively time consuming, and even just running pg_upgrade is painful. So you'd like to re-use a "template" data directory with the data loaded if possible (i.e. no catversion / WAL / ... version bumps), and a pg_upgrade otherwise. This is exactly what I am doing. If what you want to know is whether a given binary can run against a given data directory then CATALOG_VERSION_NO isn't the only thing you need to check. The full truth of this is in ReadControlFile(). The best way to get that answer is to start a server and see if it complains. You can even grep the log for "It looks like you need to initdb.".
Re: Catalog version access
On 2/22/21 3:24 AM, Andres Freund wrote: > Imagine trying to run regular tests of HEAD, where the tests require a > large database to be loaded. Re-loading the data for every [few] commits > is prohibitively time consuming, and even just running pg_upgrade is > painful. So you'd like to re-use a "template" data directory with the > data loaded if possible (i.e. no catversion / WAL / ... version bumps), > and a pg_upgrade otherwise. This is exactly what I am doing. -- Vik Fearing
Re: Catalog version access
Hi, On 2021-02-21 20:53:52 -0500, Tom Lane wrote: > Andres Freund writes: > >> If we're going to bother with providing a way > >> to get this info, we should make it possible to ask the running server. > > > In Vik's case there is no running server to ask, IIUC. > > Hm. If you're about to initdb or start the server then there's more > reason to think you can find a matching pg_config. Still, pg_config > is not going to tell you what is actually in the data directory, so > it's not clear to me how it helps with "do I need to initdb?". Imagine trying to run regular tests of HEAD, where the tests require a large database to be loaded. Re-loading the data for every [few] commits is prohibitively time consuming, and even just running pg_upgrade is painful. So you'd like to re-use a "template" data directory with the data loaded if possible (i.e. no catversion / WAL / ... version bumps), and a pg_upgrade otherwise. In such a situation it's easy to access the catalog version for the existing data directory (pg_controldata, or pg_control_system()) - but there's no convenient way to figure out what the catversion of the to-be-tested version will be. Vik's approach to figuring that out was initdb'ing a throw-away data directory, using pg_controldata, and discarding that data directory - not pretty. There's no version skew issue here as far as I can tell, given he's initdbing freshly anyway. The only argument I see against such an option is that arguably just grepping for the information in the headers isn't too hard. But that's then something everybody has to do, there's the issue of plain unix commands not working on windows, etc. Greetings, Andres Freund
Re: Catalog version access
Andres Freund writes: > On 2021-02-21 19:54:01 -0500, Tom Lane wrote: >> FWIW, I think asking pg_config about this is a guaranteed way of having >> version-skew-like bugs. > Could you elaborate a bit? How do you know that the pg_config you found has anything to do with the server you're connected to? >> If we're going to bother with providing a way >> to get this info, we should make it possible to ask the running server. > In Vik's case there is no running server to ask, IIUC. Hm. If you're about to initdb or start the server then there's more reason to think you can find a matching pg_config. Still, pg_config is not going to tell you what is actually in the data directory, so it's not clear to me how it helps with "do I need to initdb?". > He has an > existing cluster running an "old" set of binaries, and a set of > binaries. He wants to know if he needs to pg_upgrade, or can start from > a basebackup. The old version he can get from pg_controldata, or the > catalog. But except for initdb'ing a throw-away cluster, there's no way > to get that for the new cluster that doesn't involve grepping headers... For production cases it'd be sufficient to compare pg_config --version output. I suppose if you want to automate this for development versions you'd wish for something more fine-grained ... but I think you can already use configure's --with-extra-version to stick catversion or whatever into the --version string. regards, tom lane
Re: Catalog version access
Hi, On 2021-02-21 19:54:01 -0500, Tom Lane wrote: > Vik Fearing writes: > > On 2/22/21 12:48 AM, Andres Freund wrote: > >> Seems roughly reasonable. Although I wonder if we rather should make it > >> something more generic than just catversion? E.g. a wal page magic bump > >> will also require a dump/restore or pg_upgrade, but won't be detected by > >> just doing this. So perhaps we should instead add a pg_config option > >> showing all the different versions that influence on-disk compatibility? > > > Do you mean one single thing somehow lumped together, or one for each > > version number? > > FWIW, I think asking pg_config about this is a guaranteed way of having > version-skew-like bugs. Could you elaborate a bit? > If we're going to bother with providing a way > to get this info, we should make it possible to ask the running server. In Vik's case there is no running server to ask, IIUC. He has an existing cluster running an "old" set of binaries, and a set of binaries. He wants to know if he needs to pg_upgrade, or can start from a basebackup. The old version he can get from pg_controldata, or the catalog. But except for initdb'ing a throw-away cluster, there's no way to get that for the new cluster that doesn't involve grepping headers... > (That would open up some security questions: do we want to let > unprivileged users know this info? I guess if version() is not > protected then this needn't be either.) I don't see a reason it'd need to be protected. Furthermore, the ship has sailed: SELECT catalog_version_no FROM pg_control_system(); Greetings, Andres Freund
Re: Catalog version access
Vik Fearing writes: > On 2/22/21 12:48 AM, Andres Freund wrote: >> Seems roughly reasonable. Although I wonder if we rather should make it >> something more generic than just catversion? E.g. a wal page magic bump >> will also require a dump/restore or pg_upgrade, but won't be detected by >> just doing this. So perhaps we should instead add a pg_config option >> showing all the different versions that influence on-disk compatibility? > Do you mean one single thing somehow lumped together, or one for each > version number? FWIW, I think asking pg_config about this is a guaranteed way of having version-skew-like bugs. If we're going to bother with providing a way to get this info, we should make it possible to ask the running server. (That would open up some security questions: do we want to let unprivileged users know this info? I guess if version() is not protected then this needn't be either.) regards, tom lane
Re: Catalog version access
On Sun, Feb 21, 2021, at 8:15 PM, Vik Fearing wrote: > and a second patch that adds a read-only guc to get at it on the SQL > level using SHOW catalog_version; or similar. I need that because I > also do a dump of pg_settings and I would like for it to appear there. The catalog version number is already available in pg_control_system(). postgres=# select * from pg_control_system(); -[ RECORD 1 ]+--- pg_control_version | 1300 catalog_version_no | 202007201 system_identifier| 6931867587550812316 pg_control_last_modified | 2021-02-21 20:59:06-03 -- Euler Taveira EDB https://www.enterprisedb.com/
Re: Catalog version access
On 2/22/21 12:48 AM, Andres Freund wrote: > Hi, > > On 2021-02-22 00:15:20 +0100, Vik Fearing wrote: >> I do some very regular testing on HEAD and my scripts need to know if >> the catalog version has changed to determine if it needs to pg_restore >> or if a basebackup is okay. In order to get it, I have to do this: >> >> >> # Get the catalog version (there is no better way to do this) >> tmp=$(mktemp --directory) >> $bin/initdb --pgdata=$tmp >> catversion=$($bin/pg_controldata $tmp | grep "Catalog version" \ >> | cut --delimiter=: --fields=2 | xargs) >> rm --recursive --force $tmp > > That's a pretty heavy way to do it. That's what I thought, too! > If you have access to pg_config, you > could just do > grep '^#define CATALOG_VER' $(pg_config > --includedir)/server/catalog/catversion.h|awk '{print $3}' Oh thanks. That's much better. >> I find this less than attractive, especially since the catalog version >> is a property of the binaries and not the data directory. Attached is a >> patchset so that the above can become simply: >> >> catversion=$($bin/pg_config --catversion) > > Seems roughly reasonable. Although I wonder if we rather should make it > something more generic than just catversion? E.g. a wal page magic bump > will also require a dump/restore or pg_upgrade, but won't be detected by > just doing this. So perhaps we should instead add a pg_config option > showing all the different versions that influence on-disk compatibility? Do you mean one single thing somehow lumped together, or one for each version number? -- Vik Fearing
Re: Catalog version access
Hi, On 2021-02-22 00:15:20 +0100, Vik Fearing wrote: > I do some very regular testing on HEAD and my scripts need to know if > the catalog version has changed to determine if it needs to pg_restore > or if a basebackup is okay. In order to get it, I have to do this: > > > # Get the catalog version (there is no better way to do this) > tmp=$(mktemp --directory) > $bin/initdb --pgdata=$tmp > catversion=$($bin/pg_controldata $tmp | grep "Catalog version" \ > | cut --delimiter=: --fields=2 | xargs) > rm --recursive --force $tmp That's a pretty heavy way to do it. If you have access to pg_config, you could just do grep '^#define CATALOG_VER' $(pg_config --includedir)/server/catalog/catversion.h|awk '{print $3}' > I find this less than attractive, especially since the catalog version > is a property of the binaries and not the data directory. Attached is a > patchset so that the above can become simply: > > catversion=$($bin/pg_config --catversion) Seems roughly reasonable. Although I wonder if we rather should make it something more generic than just catversion? E.g. a wal page magic bump will also require a dump/restore or pg_upgrade, but won't be detected by just doing this. So perhaps we should instead add a pg_config option showing all the different versions that influence on-disk compatibility? Greetings, Andres Freund