Re: Avoid useless retrieval of defaults and check constraints in pg_dump -a
On Wed, Sep 16, 2020 at 3:06 PM Michael Paquier wrote: > > On Tue, Sep 15, 2020 at 10:46:56AM +0200, Julien Rouhaud wrote: > > FWIW I'm fine with those news comments! > > Okay, I got again on this one today and finished by committing the > patch as of 5423853. Thanks Michael!
Re: Avoid useless retrieval of defaults and check constraints in pg_dump -a
On Tue, Sep 15, 2020 at 10:46:56AM +0200, Julien Rouhaud wrote: > FWIW I'm fine with those news comments! Okay, I got again on this one today and finished by committing the patch as of 5423853. -- Michael signature.asc Description: PGP signature
Re: Avoid useless retrieval of defaults and check constraints in pg_dump -a
On Tue, Sep 15, 2020 at 4:48 AM Michael Paquier wrote: > > On Thu, Sep 10, 2020 at 02:31:32PM +0200, Daniel Gustafsson wrote: > > Given how unintrusive this optimization is, +1 from me to go ahead with this > > patch. pg_dump tests pass. Personally I would've updated the nearby > > comments > > to reflect why the check for dataOnly is there, but MMV there. I'm moving > > this > > patch to Ready for Committer. > > We need two comments here. I would suggest something like: > "Skip default/check for a data-only dump, as this is only needed for > dumps of the table schema." > > Better wording is of course welcome. FWIW I'm fine with those news comments!
Re: Avoid useless retrieval of defaults and check constraints in pg_dump -a
On Mon, Sep 14, 2020 at 10:56:01PM -0400, Tom Lane wrote: > (Note that we disallow sub-queries in CHECK constraints, and also > disclaim responsibility for what happens if you cheat by hiding > the subquery in a function. So while it's certainly possible to > build CHECK constraints that only work if table X is loaded before > table Y, pg_dump already doesn't guarantee that'll work, --data-only > or otherwise.) Yep, exactly what I was thinking upthread by cheating with a schema having cross-table references in a check constraint. -- Michael signature.asc Description: PGP signature
Re: Avoid useless retrieval of defaults and check constraints in pg_dump -a
Michael Paquier writes: > On Tue, Jul 14, 2020 at 11:14:50AM +0200, Julien Rouhaud wrote: >> Note that those extraneous queries were found while trying to dump >> data out of a corrupted database. The issue wasn't an excessive >> runtime but corrupted catalog entries, so bypassing this code (since I >> was only interested in the data anyway) was simpler than trying to >> recover yet other corrupted rows. > Yeah, I don't see actually why this argument can prevent us from doing > a micro optimization if it proves to work correctly. The main thing I'm wondering about is whether not fetching these objects could lead to failing to detect an important dependency chain. IIRC, pg_dump simply ignores pg_depend entries that mention objects it has not loaded, so there is a possible mechanism for that. However, it's hard to see how a --data-only dump could end up choosing an invalid dump order on that basis. It doesn't seem like safe load orders for the table data objects could depend on what is referenced by defaults or CHECK constraints. But ... I've only spent a few minutes thinking about it, so maybe I'm missing something. (Note that we disallow sub-queries in CHECK constraints, and also disclaim responsibility for what happens if you cheat by hiding the subquery in a function. So while it's certainly possible to build CHECK constraints that only work if table X is loaded before table Y, pg_dump already doesn't guarantee that'll work, --data-only or otherwise.) regards, tom lane
Re: Avoid useless retrieval of defaults and check constraints in pg_dump -a
On Thu, Sep 10, 2020 at 02:31:32PM +0200, Daniel Gustafsson wrote: > Given how unintrusive this optimization is, +1 from me to go ahead with this > patch. pg_dump tests pass. Personally I would've updated the nearby comments > to reflect why the check for dataOnly is there, but MMV there. I'm moving > this > patch to Ready for Committer. We need two comments here. I would suggest something like: "Skip default/check for a data-only dump, as this is only needed for dumps of the table schema." Better wording is of course welcome. > I'm fairly sure there is a lot more we can do to improve the performance of > data-only dumps, but this nicely chips away at the problem. I was looking at that, and wondered about cases like the following, artistic, thing: CREATE FUNCTION check_data_zz() RETURNS boolean LANGUAGE sql STABLE STRICT AS $$select count(a) > 0 from zz$$; CREATE TABLE public.yy ( a integer, CONSTRAINT yy_check CHECK (check_data_zz()) ); CREATE TABLE zz (a integer); INSERT INTO zz VALUES (1); INSERT INTO yy VALUES (1); Even on HEAD, this causes the data load to fail because yy's data is inserted before zz, so keeping track of the CHECK dependency could have made sense for --data-only if we were to make a better work at detecting the dependency between both tables and made sure that zz data needs to appear before yy, but it is not like this would happen easily in pg_dump, and we document it this way (see the warning about dump/reload in ddl.sgml for check constraints). In short, I think that this patch looks like a good thing to have. -- Michael signature.asc Description: PGP signature
Re: Avoid useless retrieval of defaults and check constraints in pg_dump -a
On Tue, Jul 14, 2020 at 11:14:50AM +0200, Julien Rouhaud wrote: > Note that those extraneous queries were found while trying to dump > data out of a corrupted database. The issue wasn't an excessive > runtime but corrupted catalog entries, so bypassing this code (since I > was only interested in the data anyway) was simpler than trying to > recover yet other corrupted rows. Yeah, I don't see actually why this argument can prevent us from doing a micro optimization if it proves to work correctly. -- Michael signature.asc Description: PGP signature
Re: Avoid useless retrieval of defaults and check constraints in pg_dump -a
> On 12 Jul 2020, at 07:48, Julien Rouhaud wrote: > Currently, getTableAttrs() always retrieves info about columns defaults and > check constraints, while this will never be used if --data-only option if > used. > This seems like a waste of resources, so here's a patch to skip those parts > when the DDL won't be generated. Given how unintrusive this optimization is, +1 from me to go ahead with this patch. pg_dump tests pass. Personally I would've updated the nearby comments to reflect why the check for dataOnly is there, but MMV there. I'm moving this patch to Ready for Committer. I'm fairly sure there is a lot more we can do to improve the performance of data-only dumps, but this nicely chips away at the problem. cheers ./daniel
Re: Avoid useless retrieval of defaults and check constraints in pg_dump -a
On Sun, Jul 12, 2020 at 4:29 PM Justin Pryzby wrote: > > On Sun, Jul 12, 2020 at 07:48:50AM +0200, Julien Rouhaud wrote: > > Currently, getTableAttrs() always retrieves info about columns defaults and > > check constraints, while this will never be used if --data-only option if > > used. > > This seems like a waste of resources, so here's a patch to skip those parts > > when the DDL won't be generated. > > Note that the speed of default and constraint handling has come up before: > https://www.postgresql.org/message-id/flat/CAMkU%3D1xPqHP%3D7YPeChq6n1v_qd4WGf%2BZvtnR-b%2BgyzFqtJqMMQ%40mail.gmail.com > https://www.postgresql.org/message-id/CAMkU=1x-e+maqefhm1ymesij8j9z+sjhgw7c9bqo3e3jmg4...@mail.gmail.com Oh, I wasn't aware of that. > I'd pointed out that a significant fraction of our pg_upgrade time was in > pg_dump, due to having wide tables with many child tables, and "default 0" on > every column. (I've since dropped our defaults so this is no longer an issue > here). > > It appears your patch would avoid doing unnecessary work in the --data-only > case, but it wouldn't help the pg_upgrade case. Indeed. Making the schema part faster would probably require a bigger refactoring. I'm wondering if we could introduce some facility to temporarily deny any DDL change, so that the initial pg_dump -s done by pg_upgrade can be performed before shutting down the instance. Note that those extraneous queries were found while trying to dump data out of a corrupted database. The issue wasn't an excessive runtime but corrupted catalog entries, so bypassing this code (since I was only interested in the data anyway) was simpler than trying to recover yet other corrupted rows.
Re: Avoid useless retrieval of defaults and check constraints in pg_dump -a
On Sun, Jul 12, 2020 at 07:48:50AM +0200, Julien Rouhaud wrote: > Currently, getTableAttrs() always retrieves info about columns defaults and > check constraints, while this will never be used if --data-only option if > used. > This seems like a waste of resources, so here's a patch to skip those parts > when the DDL won't be generated. Note that the speed of default and constraint handling has come up before: https://www.postgresql.org/message-id/flat/CAMkU%3D1xPqHP%3D7YPeChq6n1v_qd4WGf%2BZvtnR-b%2BgyzFqtJqMMQ%40mail.gmail.com https://www.postgresql.org/message-id/CAMkU=1x-e+maqefhm1ymesij8j9z+sjhgw7c9bqo3e3jmg4...@mail.gmail.com I'd pointed out that a significant fraction of our pg_upgrade time was in pg_dump, due to having wide tables with many child tables, and "default 0" on every column. (I've since dropped our defaults so this is no longer an issue here). It appears your patch would avoid doing unnecessary work in the --data-only case, but it wouldn't help the pg_upgrade case. -- Justin
Avoid useless retrieval of defaults and check constraints in pg_dump -a
Hi, Currently, getTableAttrs() always retrieves info about columns defaults and check constraints, while this will never be used if --data-only option if used. This seems like a waste of resources, so here's a patch to skip those parts when the DDL won't be generated. diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index e758b5c50a..e4ea604dfe 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -8590,7 +8590,7 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables) /* * Get info about column defaults */ - if (hasdefaults) + if (!dopt->dataOnly && hasdefaults) { AttrDefInfo *attrdefs; int numDefaults; @@ -8677,7 +8677,7 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables) /* * Get info about table CHECK constraints */ - if (tbinfo->ncheck > 0) + if (!dopt->dataOnly && tbinfo->ncheck > 0) { ConstraintInfo *constrs; int numConstrs;