Re: Catalog version access

2022-01-31 Thread Nathan Bossart
On Mon, Jan 31, 2022 at 04:57:13PM +0900, Michael Paquier wrote:
> On Tue, Jan 25, 2022 at 01:12:32PM +0900, Michael Paquier wrote:
>> Once you remove the requirement of a running server, we have basically
>> what has been recently implemented with postgres -C for
>> runtime-computed GUCs, because we already go through a read of the
>> control file to be able to print those GUCs with their correct
>> values.  This also means that it is already possible to check if a
>> data folder is compatible with a set of binaries with this facility,
>> as any postgres -C command with a runtime GUC would trigger this
>> check.  Using any of the existing runtime GUCs may be confusing, but
>> that would work.  And I am not really convinced that we have any need
>> to add a specific GUC for this purpose, be it a sort of
>> is_controlfile_valid or controlfile_checksum (CRC32 of the control
>> file).
> 
> Thinking more about this one, we can already do that, so I have
> marked the patch as RwF.  Perhaps we could just add a GUC, but that
> feels a bit dummy.

Sorry, I missed this thread earlier.  You're right, we can just do
something like the following to achieve basically the same result:

postgres -D . -C data_checksums

Unless Vik has any objections, this can probably be marked as Withdrawn.
Perhaps we can look into providing a new option for "postgres" at some
point in the future, but I don't sense a ton of demand at the moment.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Catalog version access

2022-01-30 Thread Michael Paquier
On Tue, Jan 25, 2022 at 01:12:32PM +0900, Michael Paquier wrote:
> Once you remove the requirement of a running server, we have basically
> what has been recently implemented with postgres -C for
> runtime-computed GUCs, because we already go through a read of the
> control file to be able to print those GUCs with their correct
> values.  This also means that it is already possible to check if a
> data folder is compatible with a set of binaries with this facility,
> as any postgres -C command with a runtime GUC would trigger this
> check.  Using any of the existing runtime GUCs may be confusing, but
> that would work.  And I am not really convinced that we have any need
> to add a specific GUC for this purpose, be it a sort of
> is_controlfile_valid or controlfile_checksum (CRC32 of the control
> file).

Thinking more about this one, we can already do that, so I have
marked the patch as RwF.  Perhaps we could just add a GUC, but that
feels a bit dummy.
--
Michael


signature.asc
Description: PGP signature


Re: Catalog version access

2022-01-24 Thread Michael Paquier
On Mon, Jan 24, 2022 at 08:40:08PM +, Bossart, Nathan wrote:
> On 1/23/22, 7:31 PM, "Michael Paquier"  wrote:
>> On Mon, Aug 16, 2021 at 06:12:54PM +, Bossart, Nathan wrote:
>>> I was looking at the --check switch for the postgres binary recently
>>> [0], and this sounds like something that might fit in nicely there.
>>> In the attached patch, --check will also check the control file if one
>>> exists.
>>
>> This would not work on a running postmaster as CreateDataDirLockFile()
>> is called beforehand, but we want this capability, no?
> 
> I was not under the impression this was a requirement, based on the
> use-case discussed upthread [0].  

Hmm.  I got a different impression as of this one:
https://www.postgresql.org/message-id/3496407.1613955...@sss.pgh.pa.us
But I can see downthread that this is not the case.  Sorry for the
confusion.

>> Abusing a bootstrap option for this purpose does not look like a good
>> idea, to be honest, especially for something only used internally now
>> and undocumented, but we want to use something aimed at an external
>> use with some documentation.  Using a separate switch would be more
>> adapted IMO.
> 
> This is the opposite of what Magnus proposed earlier [1].  Do we need
> a new pg_ctl option for this?  It seems like it is really only
> intended for use by PostgreSQL developers, but perhaps there are other
> use-cases I am not thinking of.  In any case, the pg_ctl option would
> probably end up using --check (or another similar mode) behind the
> scenes.

Based on the latest state of the thread, I am understanding that we
don't want a new option for pg_ctl for this feature, and using a
bootstrap's --check for this purpose is not a good idea IMO.  What I
guess from Magnus' suggestion would be to add a completely different
switch. 

Once you remove the requirement of a running server, we have basically
what has been recently implemented with postgres -C for
runtime-computed GUCs, because we already go through a read of the
control file to be able to print those GUCs with their correct
values.  This also means that it is already possible to check if a
data folder is compatible with a set of binaries with this facility,
as any postgres -C command with a runtime GUC would trigger this
check.  Using any of the existing runtime GUCs may be confusing, but
that would work.  And I am not really convinced that we have any need
to add a specific GUC for this purpose, be it a sort of
is_controlfile_valid or controlfile_checksum (CRC32 of the control
file).

>> Also, I think that we should be careful with the read of
>> the control file to avoid false positives?   We can rely on an atomic
>> read/write thanks to its maximum size of 512 bytes, but this looks
>> like a lot what has been done recently with postgres -C for runtime
>> GUCs, that *require* a read of the control file before grabbing the
>> values we are interested in.
> 
> Sorry, I'm not following this one.  In my proposed patch, the control
> file (if one exists) is read after CreateDataDirLockFile(), just like
> PostmasterMain().

While looking at the patch, I was thinking about the fact that we may
want to support the case even if a server is running.  If we don't, my
worries about the concurrent control file activities are moot. 
--
Michael


signature.asc
Description: PGP signature


Re: Catalog version access

2022-01-24 Thread Bossart, Nathan
Thanks for taking a look!

On 1/23/22, 7:31 PM, "Michael Paquier"  wrote:
> On Mon, Aug 16, 2021 at 06:12:54PM +, Bossart, Nathan wrote:
>> I was looking at the --check switch for the postgres binary recently
>> [0], and this sounds like something that might fit in nicely there.
>> In the attached patch, --check will also check the control file if one
>> exists.
>
> This would not work on a running postmaster as CreateDataDirLockFile()
> is called beforehand, but we want this capability, no?

I was not under the impression this was a requirement, based on the
use-case discussed upthread [0].  

> Abusing a bootstrap option for this purpose does not look like a good
> idea, to be honest, especially for something only used internally now
> and undocumented, but we want to use something aimed at an external
> use with some documentation.  Using a separate switch would be more
> adapted IMO.

This is the opposite of what Magnus proposed earlier [1].  Do we need
a new pg_ctl option for this?  It seems like it is really only
intended for use by PostgreSQL developers, but perhaps there are other
use-cases I am not thinking of.  In any case, the pg_ctl option would
probably end up using --check (or another similar mode) behind the
scenes.

> Also, I think that we should be careful with the read of
> the control file to avoid false positives?   We can rely on an atomic
> read/write thanks to its maximum size of 512 bytes, but this looks
> like a lot what has been done recently with postgres -C for runtime
> GUCs, that *require* a read of the control file before grabbing the
> values we are interested in.

Sorry, I'm not following this one.  In my proposed patch, the control
file (if one exists) is read after CreateDataDirLockFile(), just like
PostmasterMain().

Nathan

[0] https://postgr.es/m/20210222022407.ecaygvx2ise6uwyz%40alap3.anarazel.de
[1] 
https://postgr.es/m/CABUevEySovaEDci7c0DXOrV6c7JzWqYzfVwOiRUJxMao%3D9seEw%40mail.gmail.com



Re: Catalog version access

2022-01-23 Thread Michael Paquier
On Mon, Aug 16, 2021 at 06:12:54PM +, Bossart, Nathan wrote:
> I was looking at the --check switch for the postgres binary recently
> [0], and this sounds like something that might fit in nicely there.
> In the attached patch, --check will also check the control file if one
> exists.

This would not work on a running postmaster as CreateDataDirLockFile()
is called beforehand, but we want this capability, no?

Abusing a bootstrap option for this purpose does not look like a good
idea, to be honest, especially for something only used internally now
and undocumented, but we want to use something aimed at an external
use with some documentation.  Using a separate switch would be more
adapted IMO.  Also, I think that we should be careful with the read of
the control file to avoid false positives?   We can rely on an atomic
read/write thanks to its maximum size of 512 bytes, but this looks
like a lot what has been done recently with postgres -C for runtime
GUCs, that *require* a read of the control file before grabbing the
values we are interested in.
--
Michael


signature.asc
Description: PGP signature


Re: Catalog version access

2021-04-08 Thread Magnus Hagander
On Wed, Mar 3, 2021 at 8:16 PM Tom Lane  wrote:
>
> Vik Fearing  writes:
> > On 3/3/21 6:35 PM, Peter Eisentraut wrote:
> >> If what you want to know is whether a given binary can run against a
> >> given data directory then CATALOG_VERSION_NO isn't the only thing you
> >> need to check.  The full truth of this is in ReadControlFile().  The
> >> best way to get that answer is to start a server and see if it
> >> complains.  You can even grep the log for "It looks like you need to
> >> initdb.".
>
> > In that case, what would everyone think about a  `pg_ctl check`  option?
>
> The trouble with Peter's recipe is that it doesn't work if there is
> already a server instance running there (or at least I think we'll
> bitch about the existing postmaster first, maybe I'm wrong).  Now,
> that's not such a big problem for the use-case you were describing.
> But I bet if we expose this method as an apparently-general-purpose
> pg_ctl option, there'll be complaints.

Another option could be to provide a switch to the postmaster binary.
Using pg_config as originally suggested is risky because you might
pick up the wrong postmaster, but if you put it on the actual
postmaster binary you certainly know which one you're on... As this is
something that's primarily of interest to developers, it's also a bit
lower weight than having a "heavy" solution like an entire mode for
pg_ctl.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: Catalog version access

2021-03-03 Thread Tom Lane
Vik Fearing  writes:
> On 3/3/21 6:35 PM, Peter Eisentraut wrote:
>> If what you want to know is whether a given binary can run against a
>> given data directory then CATALOG_VERSION_NO isn't the only thing you
>> need to check.  The full truth of this is in ReadControlFile().  The
>> best way to get that answer is to start a server and see if it
>> complains.  You can even grep the log for "It looks like you need to
>> initdb.".

> In that case, what would everyone think about a  `pg_ctl check`  option?

The trouble with Peter's recipe is that it doesn't work if there is
already a server instance running there (or at least I think we'll
bitch about the existing postmaster first, maybe I'm wrong).  Now,
that's not such a big problem for the use-case you were describing.
But I bet if we expose this method as an apparently-general-purpose
pg_ctl option, there'll be complaints.

regards, tom lane




Re: Catalog version access

2021-03-03 Thread Vik Fearing
On 3/3/21 6:35 PM, Peter Eisentraut wrote:
> On 22.02.21 08:00, Vik Fearing wrote:
>> On 2/22/21 3:24 AM, Andres Freund wrote:
>>> Imagine trying to run regular tests of HEAD, where the tests require a
>>> large database to be loaded. Re-loading the data for every [few] commits
>>> is prohibitively time consuming, and even just running pg_upgrade is
>>> painful. So you'd like to re-use a "template" data directory with the
>>> data loaded if possible (i.e. no catversion / WAL / ... version bumps),
>>> and a pg_upgrade otherwise.
>>
>> This is exactly what I am doing.
> 
> If what you want to know is whether a given binary can run against a
> given data directory then CATALOG_VERSION_NO isn't the only thing you
> need to check.  The full truth of this is in ReadControlFile().  The
> best way to get that answer is to start a server and see if it
> complains.  You can even grep the log for "It looks like you need to
> initdb.".

In that case, what would everyone think about a  `pg_ctl check`  option?
-- 
Vik Fearing




Re: Catalog version access

2021-03-03 Thread Peter Eisentraut

On 22.02.21 08:00, Vik Fearing wrote:

On 2/22/21 3:24 AM, Andres Freund wrote:

Imagine trying to run regular tests of HEAD, where the tests require a
large database to be loaded. Re-loading the data for every [few] commits
is prohibitively time consuming, and even just running pg_upgrade is
painful. So you'd like to re-use a "template" data directory with the
data loaded if possible (i.e. no catversion / WAL / ... version bumps),
and a pg_upgrade otherwise.


This is exactly what I am doing.


If what you want to know is whether a given binary can run against a 
given data directory then CATALOG_VERSION_NO isn't the only thing you 
need to check.  The full truth of this is in ReadControlFile().  The 
best way to get that answer is to start a server and see if it 
complains.  You can even grep the log for "It looks like you need to 
initdb.".





Re: Catalog version access

2021-02-21 Thread Vik Fearing
On 2/22/21 3:24 AM, Andres Freund wrote:
> Imagine trying to run regular tests of HEAD, where the tests require a
> large database to be loaded. Re-loading the data for every [few] commits
> is prohibitively time consuming, and even just running pg_upgrade is
> painful. So you'd like to re-use a "template" data directory with the
> data loaded if possible (i.e. no catversion / WAL / ... version bumps),
> and a pg_upgrade otherwise.

This is exactly what I am doing.
-- 
Vik Fearing




Re: Catalog version access

2021-02-21 Thread Andres Freund
Hi,

On 2021-02-21 20:53:52 -0500, Tom Lane wrote:
> Andres Freund  writes:
> >> If we're going to bother with providing a way
> >> to get this info, we should make it possible to ask the running server.
> 
> > In Vik's case there is no running server to ask, IIUC.
> 
> Hm.  If you're about to initdb or start the server then there's more
> reason to think you can find a matching pg_config.  Still, pg_config
> is not going to tell you what is actually in the data directory, so
> it's not clear to me how it helps with "do I need to initdb?".

Imagine trying to run regular tests of HEAD, where the tests require a
large database to be loaded. Re-loading the data for every [few] commits
is prohibitively time consuming, and even just running pg_upgrade is
painful. So you'd like to re-use a "template" data directory with the
data loaded if possible (i.e. no catversion / WAL / ... version bumps),
and a pg_upgrade otherwise.

In such a situation it's easy to access the catalog version for the
existing data directory (pg_controldata, or pg_control_system()) - but
there's no convenient way to figure out what the catversion of the
to-be-tested version will be.  Vik's approach to figuring that out was
initdb'ing a throw-away data directory, using pg_controldata, and
discarding that data directory - not pretty.  There's no version skew
issue here as far as I can tell, given he's initdbing freshly anyway.

The only argument I see against such an option is that arguably just
grepping for the information in the headers isn't too hard. But that's
then something everybody has to do, there's the issue of plain unix
commands not working on windows, etc.

Greetings,

Andres Freund




Re: Catalog version access

2021-02-21 Thread Tom Lane
Andres Freund  writes:
> On 2021-02-21 19:54:01 -0500, Tom Lane wrote:
>> FWIW, I think asking pg_config about this is a guaranteed way of having
>> version-skew-like bugs.

> Could you elaborate a bit?

How do you know that the pg_config you found has anything to do with the
server you're connected to?

>> If we're going to bother with providing a way
>> to get this info, we should make it possible to ask the running server.

> In Vik's case there is no running server to ask, IIUC.

Hm.  If you're about to initdb or start the server then there's more
reason to think you can find a matching pg_config.  Still, pg_config
is not going to tell you what is actually in the data directory, so
it's not clear to me how it helps with "do I need to initdb?".

> He has an
> existing cluster running an "old" set of binaries, and a set of
> binaries. He wants to know if he needs to pg_upgrade, or can start from
> a basebackup. The old version he can get from pg_controldata, or the
> catalog. But except for initdb'ing a throw-away cluster, there's no way
> to get that for the new cluster that doesn't involve grepping headers...

For production cases it'd be sufficient to compare pg_config --version
output.  I suppose if you want to automate this for development versions
you'd wish for something more fine-grained ... but I think you can
already use configure's --with-extra-version to stick catversion or
whatever into the --version string.

regards, tom lane




Re: Catalog version access

2021-02-21 Thread Andres Freund
Hi,

On 2021-02-21 19:54:01 -0500, Tom Lane wrote:
> Vik Fearing  writes:
> > On 2/22/21 12:48 AM, Andres Freund wrote:
> >> Seems roughly reasonable. Although I wonder if we rather should make it
> >> something more generic than just catversion? E.g. a wal page magic bump
> >> will also require a dump/restore or pg_upgrade, but won't be detected by
> >> just doing this. So perhaps we should instead add a pg_config option
> >> showing all the different versions that influence on-disk compatibility?
> 
> > Do you mean one single thing somehow lumped together, or one for each
> > version number?
> 
> FWIW, I think asking pg_config about this is a guaranteed way of having
> version-skew-like bugs.

Could you elaborate a bit?


> If we're going to bother with providing a way
> to get this info, we should make it possible to ask the running server.

In Vik's case there is no running server to ask, IIUC. He has an
existing cluster running an "old" set of binaries, and a set of
binaries. He wants to know if he needs to pg_upgrade, or can start from
a basebackup. The old version he can get from pg_controldata, or the
catalog. But except for initdb'ing a throw-away cluster, there's no way
to get that for the new cluster that doesn't involve grepping headers...


> (That would open up some security questions: do we want to let
> unprivileged users know this info?  I guess if version() is not
> protected then this needn't be either.)

I don't see a reason it'd need to be protected. Furthermore, the ship
has sailed:
SELECT catalog_version_no FROM pg_control_system();

Greetings,

Andres Freund




Re: Catalog version access

2021-02-21 Thread Tom Lane
Vik Fearing  writes:
> On 2/22/21 12:48 AM, Andres Freund wrote:
>> Seems roughly reasonable. Although I wonder if we rather should make it
>> something more generic than just catversion? E.g. a wal page magic bump
>> will also require a dump/restore or pg_upgrade, but won't be detected by
>> just doing this. So perhaps we should instead add a pg_config option
>> showing all the different versions that influence on-disk compatibility?

> Do you mean one single thing somehow lumped together, or one for each
> version number?

FWIW, I think asking pg_config about this is a guaranteed way of having
version-skew-like bugs.  If we're going to bother with providing a way
to get this info, we should make it possible to ask the running server.

(That would open up some security questions: do we want to let
unprivileged users know this info?  I guess if version() is not
protected then this needn't be either.)

regards, tom lane




Re: Catalog version access

2021-02-21 Thread Euler Taveira
On Sun, Feb 21, 2021, at 8:15 PM, Vik Fearing wrote:
> and a second patch that adds a read-only guc to get at it on the SQL
> level using SHOW catalog_version; or similar.  I need that because I
> also do a dump of pg_settings and I would like for it to appear there.
The catalog version number is already available in pg_control_system().

postgres=# select * from pg_control_system();
-[ RECORD 1 ]+---
pg_control_version   | 1300
catalog_version_no   | 202007201
system_identifier| 6931867587550812316
pg_control_last_modified | 2021-02-21 20:59:06-03


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Catalog version access

2021-02-21 Thread Vik Fearing
On 2/22/21 12:48 AM, Andres Freund wrote:
> Hi,
> 
> On 2021-02-22 00:15:20 +0100, Vik Fearing wrote:
>> I do some very regular testing on HEAD and my scripts need to know if
>> the catalog version has changed to determine if it needs to pg_restore
>> or if a basebackup is okay.  In order to get it, I have to do this:
>>
>>
>> # Get the catalog version (there is no better way to do this)
>> tmp=$(mktemp --directory)
>> $bin/initdb --pgdata=$tmp
>> catversion=$($bin/pg_controldata $tmp | grep "Catalog version" \
>>  | cut --delimiter=: --fields=2 | xargs)
>> rm --recursive --force $tmp
> 
> That's a pretty heavy way to do it.

That's what I thought, too!

> If you have access to pg_config, you
> could just do
> grep '^#define CATALOG_VER' $(pg_config 
> --includedir)/server/catalog/catversion.h|awk '{print $3}'

Oh thanks.  That's much better.

>> I find this less than attractive, especially since the catalog version
>> is a property of the binaries and not the data directory.  Attached is a
>> patchset so that the above can become simply:
>>
>> catversion=$($bin/pg_config --catversion)
> 
> Seems roughly reasonable. Although I wonder if we rather should make it
> something more generic than just catversion? E.g. a wal page magic bump
> will also require a dump/restore or pg_upgrade, but won't be detected by
> just doing this. So perhaps we should instead add a pg_config option
> showing all the different versions that influence on-disk compatibility?

Do you mean one single thing somehow lumped together, or one for each
version number?
-- 
Vik Fearing




Re: Catalog version access

2021-02-21 Thread Andres Freund
Hi,

On 2021-02-22 00:15:20 +0100, Vik Fearing wrote:
> I do some very regular testing on HEAD and my scripts need to know if
> the catalog version has changed to determine if it needs to pg_restore
> or if a basebackup is okay.  In order to get it, I have to do this:
> 
> 
> # Get the catalog version (there is no better way to do this)
> tmp=$(mktemp --directory)
> $bin/initdb --pgdata=$tmp
> catversion=$($bin/pg_controldata $tmp | grep "Catalog version" \
>  | cut --delimiter=: --fields=2 | xargs)
> rm --recursive --force $tmp

That's a pretty heavy way to do it. If you have access to pg_config, you
could just do
grep '^#define CATALOG_VER' $(pg_config 
--includedir)/server/catalog/catversion.h|awk '{print $3}'


> I find this less than attractive, especially since the catalog version
> is a property of the binaries and not the data directory.  Attached is a
> patchset so that the above can become simply:
> 
> catversion=$($bin/pg_config --catversion)

Seems roughly reasonable. Although I wonder if we rather should make it
something more generic than just catversion? E.g. a wal page magic bump
will also require a dump/restore or pg_upgrade, but won't be detected by
just doing this. So perhaps we should instead add a pg_config option
showing all the different versions that influence on-disk compatibility?

Greetings,

Andres Freund