Re: Can we let extensions change their dumped catalog schemas?

2023-03-08 Thread Jacob Champion
On Tue, Feb 7, 2023 at 10:16 AM Jacob Champion  wrote:
> Even more concretely, here's a draft patch. There's no nuance --
> setting dump_version affects all past versions of the extension, and
> it can't be turned off at restore time. So extensions opting in would
> have to be written to be side-by-side installable. (Ours is, in its
> own way, but the PGDG installers don't allow it -- which maybe
> highlights a weakness in this approach.) I'm still not sure if this
> addresses Tom's concerns, or just adds new ones.

Any general thoughts on this approach? I don't think it's baked enough
for registration yet, but I also don't know what approach would be
better.

Given the recent chatter around extension versions in other threads
[1, 2], I feel like there is a big gap between the Postgres core
expectations and what extension authors are actually doing when it
comes to handling version upgrades. I'd like to chip away at that,
somehow.

Thanks,
--Jacob

[1] https://www.postgresql.org/message-id/212074.1678301349%40sss.pgh.pa.us
[2] 
https://www.postgresql.org/message-id/CA%2BTgmoYqK6nfP15SjuyO6t5jOmymG%3DqO7JOOVJdTOj96L0XJ1Q%40mail.gmail.com




Re: Can we let extensions change their dumped catalog schemas?

2023-02-07 Thread Jacob Champion
On Tue, Jan 17, 2023 at 3:18 PM Jacob Champion  wrote:
> As a concrete example, Timescale's extension control file could look
> like this:
>
> default_version = '2.x.y'
> module_pathname = '$libdir/timescaledb-2.x.y'
> ...
> dump_version = true
>
> which would then cause pg_dump to issue a VERSION for its CREATE
> EXTENSION line. Other extensions would remain with the default
> (dump_version = false), so they'd be dumped without an explicit VERSION.

Even more concretely, here's a draft patch. There's no nuance --
setting dump_version affects all past versions of the extension, and
it can't be turned off at restore time. So extensions opting in would
have to be written to be side-by-side installable. (Ours is, in its
own way, but the PGDG installers don't allow it -- which maybe
highlights a weakness in this approach.) I'm still not sure if this
addresses Tom's concerns, or just adds new ones.

We could maybe give the user more control by overriding the default
version for an extension in a different TOC entry, and then add
options to ignore or include those version numbers during restore.
That doesn't address the side-by-side problem directly but gives an
escape hatch.

Earlier I wrote:

> I'm curious about your
> opinion on option 3, since it would naively seem to make pg_dump do
> _less_ work for a given extension.

This was definitely naive :D We can't just make use of the
binary-upgrade machinery to dump extension internals, because it pins
OIDs. So that might still be a valid approach, but it's not "less
work."

Thanks,
--Jacob
From 1a79eacbc6bdad8bb5e05418756d6ab35e9bab9e Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Fri, 27 Jan 2023 10:42:43 -0800
Subject: [PATCH] WIP: implement dump_version control option

The intent is to allow extensions to tie their internal catalog version
to a dump. This allows any dumped catalog tables to be upgraded after a
restore. The downside is that both the old and new extension versions
must be available for installation on the target.

This passes manual sanity checks, but there is no test yet.
---
 src/backend/commands/extension.c | 29 +
 src/bin/pg_dump/pg_dump.c| 25 ++---
 src/bin/pg_dump/pg_dump.h|  1 +
 src/include/catalog/pg_proc.dat  |  4 
 4 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index b1509cc505..b58c1d27a1 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -88,6 +88,7 @@ typedef struct ExtensionControlFile
 	bool		relocatable;	/* is ALTER EXTENSION SET SCHEMA supported? */
 	bool		superuser;		/* must be superuser to install? */
 	bool		trusted;		/* allow becoming superuser on the fly? */
+	bool		dump_version;	/* record the VERSION during pg_dump? */
 	int			encoding;		/* encoding of the script file, or -1 */
 	List	   *requires;		/* names of prerequisite extensions */
 } ExtensionControlFile;
@@ -582,6 +583,14 @@ parse_extension_control_file(ExtensionControlFile *control,
 		 errmsg("parameter \"%s\" requires a Boolean value",
 item->name)));
 		}
+		else if (strcmp(item->name, "dump_version") == 0)
+		{
+			if (!parse_bool(item->value, >dump_version))
+ereport(ERROR,
+		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("parameter \"%s\" requires a Boolean value",
+item->name)));
+		}
 		else if (strcmp(item->name, "encoding") == 0)
 		{
 			control->encoding = pg_valid_server_encoding(item->value);
@@ -639,6 +648,7 @@ read_extension_control_file(const char *extname)
 	control->relocatable = false;
 	control->superuser = true;
 	control->trusted = false;
+	control->dump_version = false;
 	control->encoding = -1;
 
 	/*
@@ -2532,6 +2542,25 @@ pg_extension_config_dump(PG_FUNCTION_ARGS)
 	PG_RETURN_VOID();
 }
 
+/*
+ * This function reports whether an extension's version should be written
+ * explicitly by pg_dump.
+ */
+Datum
+pg_extension_version_is_dumpable(PG_FUNCTION_ARGS)
+{
+	Name		extname = PG_GETARG_NAME(0);
+	ExtensionControlFile *control;
+
+	/* Check extension name validity before any filesystem access */
+	check_valid_extension_name(NameStr(*extname));
+
+	/* Read the extension's control file */
+	control = read_extension_control_file(NameStr(*extname));
+
+	PG_RETURN_BOOL(control->dump_version);
+}
+
 /*
  * extension_config_remove
  *
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 527c7651ab..177ffbede5 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -5209,11 +5209,21 @@ getExtensions(Archive *fout, int *numExtensions)
 	int			i_extversion;
 	int			i_extconfig;
 	int			i_extcondition;
+	int			i_dump_version;
 
 	query = createPQExpBuffer();
 
 	appendPQExpBufferStr(query, "SELECT x.tableoid, x.oid, "
-		 "x.extname, n.nspname, x.extrelocatable, x.extversion, x.extconfig, x.extcondition "
+		 "x.extname, n.nspname, x.extrelocatable, 

Re: Can we let extensions change their dumped catalog schemas?

2023-01-17 Thread Jacob Champion
On 1/12/23 11:04, Jacob Champion wrote:
> On Wed, Jan 11, 2023 at 1:03 PM Tom Lane  wrote:
>> Jacob Champion  writes:
>>> Right, I think it would have to be opt-in. Say, a new control file
>>> option dump_version or some such.
>>
>> That would require all the installed extensions to cope with this
>> the same way, which does not seem like a great assumption.
> 
> How so? Most installed extensions would not opt into a version dump,
> I'd imagine.

As a concrete example, Timescale's extension control file could look
like this:

default_version = '2.x.y'
module_pathname = '$libdir/timescaledb-2.x.y'
...
dump_version = true

which would then cause pg_dump to issue a VERSION for its CREATE
EXTENSION line. Other extensions would remain with the default
(dump_version = false), so they'd be dumped without an explicit VERSION.

(And in the case of option 3, the name of the control file option
changes -- dump_internals, maybe? -- but it still doesn't affect other
installed extensions.)

--Jacob




Re: Can we let extensions change their dumped catalog schemas?

2023-01-12 Thread Jacob Champion
On Wed, Jan 11, 2023 at 1:03 PM Tom Lane  wrote:
> Jacob Champion  writes:
> > Right, I think it would have to be opt-in. Say, a new control file
> > option dump_version or some such.
>
> That would require all the installed extensions to cope with this
> the same way, which does not seem like a great assumption.

How so? Most installed extensions would not opt into a version dump,
I'd imagine.

Or do you mean that the version dump would apply retroactively to
older versions of the extension, even if it wasn't needed in the past?

--Jacob




Re: Can we let extensions change their dumped catalog schemas?

2023-01-11 Thread Tom Lane
Jacob Champion  writes:
> On Tue, Jan 10, 2023 at 7:53 PM Tom Lane  wrote:
>> I also fear that it will break
>> a bunch of use-cases that work fine today, which are exactly the
>> ones for which we originally defined pg_dump as *not* committing
>> to a particular extension version.

> Right, I think it would have to be opt-in. Say, a new control file
> option dump_version or some such.

That would require all the installed extensions to cope with this
the same way, which does not seem like a great assumption.

regards, tom lane




Re: Can we let extensions change their dumped catalog schemas?

2023-01-11 Thread Jacob Champion
On Tue, Jan 10, 2023 at 7:53 PM Tom Lane  wrote:
> Jacob Champion  writes:
> > Unless I'm missing something obvious (please, let it be that) there's no
> > way to do this safely. Once you've marked an internal table as dumpable,
> > its schema is effectively frozen if you want your dumps to work across
> > versions, because otherwise you'll try to restore that "catalog" data
> > into a table that has different columns. And while sometimes you can
> > make that work, it doesn't in the general case.
>
> I agree that's a problem, but it's not that we're arbitrarily prohibiting
> something that would work.  What, concretely, do you think could be
> done to improve the situation?

Concretely, I think extensions should be able to invoke their update
scripts at some point after a dump/restore cycle, whether
automatically or manually.

> > 2) Provide a way to record the exact version of an extension in a dump.
>
> Don't really see how that helps?

If pg_dump recorded our extension using

CREATE EXTENSION timescaledb VERSION 

then we'd be able to migrate the changed catalog post-restore, using a
standard ALTER EXTENSION ... UPDATE.

> I also fear that it will break
> a bunch of use-cases that work fine today, which are exactly the
> ones for which we originally defined pg_dump as *not* committing
> to a particular extension version.

Right, I think it would have to be opt-in. Say, a new control file
option dump_version or some such.

> It feels like what we really need here is some way to mutate the
> old format of an extension config table into the new format.

Agreed. We already provide mutation functions via the update scripts,
so I think both proposal 2 and 3 do that. I'm curious about your
opinion on option 3, since it would naively seem to make pg_dump do
_less_ work for a given extension.

> Simple addition of new columns shouldn't be a problem (in fact,
> I think that works already, or could easily be made to).  If you
> want some ETL processing then it's harder :-(.

One sharp edge for the add-a-new-column case is where you give the new
column a default, and you want all of the old migrated rows to have
some non-default value to handle backwards compatibility. (But that
case is handled trivially if you _know_ that you're performing a
migration.)

> Could an ON INSERT
> trigger on an old config table transpose converted data into a
> newer config table?

You mean something like, introduce table catalog_v2, and have all
INSERTs to catalog_v1 migrate and redirect the rows? That seems like
it could work today, though it would mean maintaining two different
upgrade paths for the same table, migrating all users of the catalog
to the new name, and needing to drop the old table at... some point
after the restore? I don't know if there would be performance concerns
with larger catalogs (in fact I'm not sure how big these catalogs
get).

> Another point that ought to be made here is that pg_dump is not
> the only outside consumer of extension config data.  You're likely
> to break some applications if you change a config table too much.

Such as? We don't really want applications to be coupled against our
internals by accident, but we have to dump the internals to be able to
reproduce the state of the system.

> That's not an argument that we shouldn't try to make pg_dump more
> forgiving, but I'm not sure that we need to move heaven and earth.

Agreed. Hopefully we can find something that just moves a little earth. :D

Thanks!
--Jacob




Re: Can we let extensions change their dumped catalog schemas?

2023-01-10 Thread Tom Lane
Jacob Champion  writes:
> We'd like to be allowed to change the schema for a table that's been
> marked in the past with pg_extension_config_dump().

> Unless I'm missing something obvious (please, let it be that) there's no
> way to do this safely. Once you've marked an internal table as dumpable,
> its schema is effectively frozen if you want your dumps to work across
> versions, because otherwise you'll try to restore that "catalog" data
> into a table that has different columns. And while sometimes you can
> make that work, it doesn't in the general case.

I agree that's a problem, but it's not that we're arbitrarily prohibiting
something that would work.  What, concretely, do you think could be
done to improve the situation?

> 2) Provide a way to record the exact version of an extension in a dump.

Don't really see how that helps?  I also fear that it will break
a bunch of use-cases that work fine today, which are exactly the
ones for which we originally defined pg_dump as *not* committing
to a particular extension version.

It feels like what we really need here is some way to mutate the
old format of an extension config table into the new format.
Simple addition of new columns shouldn't be a problem (in fact,
I think that works already, or could easily be made to).  If you
want some ETL processing then it's harder :-(.  Could an ON INSERT
trigger on an old config table transpose converted data into a
newer config table?

Another point that ought to be made here is that pg_dump is not
the only outside consumer of extension config data.  You're likely
to break some applications if you change a config table too much.
That's not an argument that we shouldn't try to make pg_dump more
forgiving, but I'm not sure that we need to move heaven and earth.

regards, tom lane