Re: Fix pg_upgrade to preserve datdba

2021-03-22 Thread Jan Wieck

On 3/21/21 3:56 PM, Tom Lane wrote:

Jan Wieck  writes:
So let's focus on the actual problem of running out of XIDs and memory 
while doing an upgrade involving millions of small large objects.


Right.  So as far as --single-transaction vs. --create goes, that's
mostly a definitional problem.  As long as the contents of a DB are
restored in one transaction, it's not gonna matter if we eat one or
two more XIDs while creating the DB itself.  So we could either
relax pg_restore's complaint, or invent a different switch that's
named to acknowledge that it's not really only one transaction.

That still leaves us with the lots-o-locks problem.  However, once
we've crossed the Rubicon of "it's not really only one transaction",
you could imagine that the switch is "--fewer-transactions", and the
idea is for pg_restore to commit after every (say) 10 operations.
That would both bound its lock requirements and greatly cut its XID
consumption.


It leaves us with three things:

1) a tremendous number of locks,
2) a tremendous amount of memory, and
3) taking forever because it is single-threaded.

I created a pathological case here on a VM with 24GB of RAM and 80GB of 
swap sitting on NVMe. The database has 20 million large objects, each of 
which has two GRANTs, one COMMENT, and one (dummy) SECURITY LABEL. Each LO 
only contains the string "large object ", so the whole database under 9.5 
is about 15GB in size.
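
For reference, a minimal sketch of a generator for such a database (not 
the script actually used; the database name, the role names, the payload 
format, and the use of the dummy_seclabel module as label provider 
"dummy" are all assumptions):

#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>
#include <libpq/libpq-fs.h>

int
main(void)
{
    /* assumes database "lotest" and roles "alice" and "bob" exist */
    PGconn *conn = PQconnectdb("dbname=lotest");

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }

    for (long i = 0; i < 20000000L; i++)
    {
        char sql[512];
        char data[64];
        Oid loid;
        int fd;

        PQclear(PQexec(conn, "BEGIN"));

        /* create the LO and give it a tiny payload */
        loid = lo_create(conn, InvalidOid);
        fd = lo_open(conn, loid, INV_WRITE);
        snprintf(data, sizeof(data), "large object %ld", i);
        lo_write(conn, fd, data, strlen(data));
        lo_close(conn, fd);

        /* attach the metadata that later bloats the dump TOC */
        snprintf(sql, sizeof(sql),
                 "GRANT SELECT ON LARGE OBJECT %u TO alice; "
                 "GRANT SELECT ON LARGE OBJECT %u TO bob; "
                 "COMMENT ON LARGE OBJECT %u IS 'lo %ld'; "
                 "SECURITY LABEL FOR dummy ON LARGE OBJECT %u IS 'unclassified';",
                 loid, loid, loid, i, loid);
        PQclear(PQexec(conn, sql));

        PQclear(PQexec(conn, "COMMIT"));
    }

    PQfinish(conn);
    return 0;
}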


A stock pg_upgrade to version 14devel using --link takes about 15 hours. 
This is partly because pg_dump and pg_restore each grow to something like 
50GB+ just to hold the TOC, which seems out of proportion considering 
that the entire system catalog on disk is less than 15GB. But aside from 
the ridiculous amount of swapping, the whole thing also suffers from 
consuming about 80 million transactions (presumably four per large 
object: one each for the blob itself, its ACL, its comment, and its 
security label) and apparently just as many network round trips with a 
single client.




The work you described sounded like it could fit into that paradigm,
with the additional ability to run some parallel restore tasks
that are each consuming a bounded number of locks.


I have attached a POC patch that implements two new options for pg_upgrade.

  --restore-jobs=NUM             --jobs parameter passed to pg_restore
  --restore-blob-batch-size=NUM  number of blobs restored in one xact

It does a bit more than just that. It rearranges the way large objects 
are dumped so that most of the commands are combined into one TOC entry, 
and that entry is emitted into SECTION_DATA when in binary upgrade mode 
(which guarantees that there isn't any actual BLOB data in the dump). 
This greatly reduces the number of network round trips and, when using 8 
parallel restore jobs, almost saturates the 4-core VM. Reducing the 
number of TOC entries also reduces the total virtual memory need of 
pg_restore to 15GB, so there is a lot less swapping going on.
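
To illustrate the batching mechanism behind the CommitBlobTransaction() 
hook visible in the attached patch, here is a simplified sketch (the 
helper names other than CommitBlobTransaction are invented; the patch 
itself is authoritative):

/*
 * Sketch of the blob-batching idea: a restore worker opens a
 * transaction lazily, counts restored blobs, and commits after every
 * blobBatchSize of them, so each transaction holds a bounded number
 * of locks and the restore burns one XID per batch instead of one
 * per blob.
 */
static bool blobXactOpen = false;   /* batch transaction open? */
static int  blobsInXact = 0;        /* blobs restored in that xact */

static void
StartBlobBatchIfNeeded(ArchiveHandle *AH)       /* hypothetical */
{
    if (AH->public.blobBatchSize > 0 && !blobXactOpen)
    {
        ahprintf(AH, "BEGIN;\n");
        blobXactOpen = true;
        blobsInXact = 0;
    }
}

static void
BlobRestored(ArchiveHandle *AH)                 /* hypothetical */
{
    if (blobXactOpen && ++blobsInXact >= AH->public.blobBatchSize)
        CommitBlobTransaction(&AH->public);
}

void
CommitBlobTransaction(Archive *AHX)
{
    ArchiveHandle *AH = (ArchiveHandle *) AHX;

    if (blobXactOpen)
    {
        ahprintf(AH, "COMMIT;\n");
        blobXactOpen = false;
        blobsInXact = 0;
    }
}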


It cuts the pg_upgrade time from 15 hours down to 1.5 hours. In that run 
I used --restore-jobs=8 and --restore-blob-batch-size=1 (with 
max_locks_per_transaction=12000).


As said, this isn't a one-size-fits-all solution. The right values for 
the pg_upgrade options --jobs and --restore-jobs will really depend on 
the situation: hundreds of small databases want --jobs, while one 
database with millions of large objects wants --restore-jobs.



Regards, Jan

--
Jan Wieck
Principal Database Engineer
Amazon Web Services
diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index c7351a4..4a611d0 100644
--- a/src/bin/pg_dump/parallel.c
+++ b/src/bin/pg_dump/parallel.c
@@ -864,6 +864,11 @@ RunWorker(ArchiveHandle *AH, ParallelSlot *slot)
 	WaitForCommands(AH, pipefd);
 
 	/*
+	 * Close any still-open BLOB batch transaction.
+	 */
+	CommitBlobTransaction((Archive *)AH);
+
+	/*
 	 * Disconnect from database and clean up.
 	 */
 	set_cancel_slot_archive(slot, NULL);
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 0296b9b..cd8a590 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -203,6 +203,8 @@ typedef struct Archive
 	int			numWorkers;		/* number of parallel processes */
 	char	   *sync_snapshot_id;	/* sync snapshot id for parallel operation */
 
+	int			blobBatchSize;	/* # of blobs to restore per transaction */
+
 	/* info needed for string escaping */
 	int			encoding;		/* libpq code for client_encoding */
 	bool		std_strings;	/* standard_conforming_strings */
@@ -269,6 +271,7 @@ extern void WriteData(Archive *AH, const void *data, size_t dLen);
 extern int	StartBlob(Archive *AH, Oid oid);
 extern int	EndBlob(Archive *AH, Oid oid);
 
+extern void	CommitBlobTransaction(Archive *AH);
 extern void CloseArchive(Archive *AH);
 
 extern void SetArchiveOptions(Archive *AH, DumpOptions *dopt, RestoreOptions *ropt);
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 1f82c64..51a862a 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -68,6 +68,8 @@ typedef struct _

Re: Fix pg_upgrade to preserve datdba

2021-03-21 Thread Tom Lane
Jan Wieck  writes:
> So let's focus on the actual problem of running out of XIDs and memory 
> while doing an upgrade involving millions of small large objects.

Right.  So as far as --single-transaction vs. --create goes, that's
mostly a definitional problem.  As long as the contents of a DB are
restored in one transaction, it's not gonna matter if we eat one or
two more XIDs while creating the DB itself.  So we could either
relax pg_restore's complaint, or invent a different switch that's
named to acknowledge that it's not really only one transaction.

That still leaves us with the lots-o-locks problem.  However, once
we've crossed the Rubicon of "it's not really only one transaction",
you could imagine that the switch is "--fewer-transactions", and the
idea is for pg_restore to commit after every (say) 10 operations.
That would both bound its lock requirements and greatly cut its XID
consumption.
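
For illustration, a minimal sketch of that commit-every-K idea 
(hypothetical: no --fewer-transactions option exists, and the names here 
are invented):

/*
 * Committing after every K restored TOC entries bounds the locks held
 * per transaction and cuts XID consumption to roughly one per K
 * operations.
 */
#define OPS_PER_XACT 10

static void
restore_entries_batched(ArchiveHandle *AH)
{
    TocEntry *te;
    int ops = 0;

    ahprintf(AH, "BEGIN;\n");
    for (te = AH->toc->next; te != AH->toc; te = te->next)
    {
        restore_toc_entry(AH, te, false);
        if (++ops % OPS_PER_XACT == 0)
            ahprintf(AH, "COMMIT;\nBEGIN;\n");  /* drop locks, new XID */
    }
    ahprintf(AH, "COMMIT;\n");
}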

The work you described sounded like it could fit into that paradigm,
with the additional ability to run some parallel restore tasks
that are each consuming a bounded number of locks.

regards, tom lane




Re: Fix pg_upgrade to preserve datdba

2021-03-21 Thread Jan Wieck

On 3/21/21 2:34 PM, Tom Lane wrote:

and I see

--
-- Name: joe; Type: DATABASE; Schema: -; Owner: joe
--

CREATE DATABASE joe WITH TEMPLATE = template0 ENCODING = 'SQL_ASCII' LOCALE = 'C';


ALTER DATABASE joe OWNER TO joe;

so at least in this case it's doing the right thing.  We need a bit
more detail about the context in which it's doing the wrong thing
for you.


After moving all of this to a pristine postgresql.org-based repo I see 
the same. My best guess at this point is that the permission hoops RDS 
and Aurora PostgreSQL jump through were messing with this. But that has 
nothing to do with the actual topic.


So let's focus on the actual problem of running out of XIDs and memory 
while doing an upgrade involving millions of small large objects.



Regards, Jan


--
Jan Wieck
Principal Database Engineer
Amazon Web Services




Re: Fix pg_upgrade to preserve datdba

2021-03-21 Thread Tom Lane
I wrote:
> ... so at least in this case it's doing the right thing.  We need a bit
> more detail about the context in which it's doing the wrong thing
> for you.

Just to cross-check, I tried modifying pg_upgrade's regression test
as attached, and it still passes.  (And inspection of the leftover
dump2.sql file verifies that the database ownership was correct.)
So I'm not sure what's up here.

regards, tom lane

diff --git a/src/bin/pg_upgrade/test.sh b/src/bin/pg_upgrade/test.sh
index 9c6deae294..436646b5ba 100644
--- a/src/bin/pg_upgrade/test.sh
+++ b/src/bin/pg_upgrade/test.sh
@@ -150,6 +150,9 @@ export EXTRA_REGRESS_OPTS
 standard_initdb "$oldbindir"/initdb
 "$oldbindir"/pg_ctl start -l "$logdir/postmaster1.log" -o "$POSTMASTER_OPTS" -w
 
+# Create another user (just to exercise database ownership restoration).
+createuser regression_dbowner || createdb_status=$?
+
 # Create databases with names covering the ASCII bytes other than NUL, BEL,
 # LF, or CR.  BEL would ring the terminal bell in the course of this test, and
 # it is not otherwise a special case.  PostgreSQL doesn't support the rest.
@@ -160,7 +163,7 @@ dbname1='\"\'$dbname1'\\"\\\'
 dbname2=`awk 'BEGIN { for (i = 46; i <  91; i++) printf "%c", i }' 

Re: Fix pg_upgrade to preserve datdba

2021-03-21 Thread Tom Lane
I wrote:
> Needs a little more work than that --- we should allow it to respond
> to the --no-owner switch, for example.  But I think likely we can do
> it where other object ownership is handled.  I'll look in a bit.

Actually ... said code already DOES do that, so now I'm confused.
I tried

regression=# create user joe;
CREATE ROLE
regression=# create database joe owner joe;
CREATE DATABASE
regression=# \q
$ pg_dump -Fc joe >joe.dump
$ pg_restore --create -f - joe.dump | more

and I see

--
-- Name: joe; Type: DATABASE; Schema: -; Owner: joe
--

CREATE DATABASE joe WITH TEMPLATE = template0 ENCODING = 'SQL_ASCII' LOCALE = 'C';


ALTER DATABASE joe OWNER TO joe;

so at least in this case it's doing the right thing.  We need a bit
more detail about the context in which it's doing the wrong thing
for you.

regards, tom lane




Re: Fix pg_upgrade to preserve datdba

2021-03-21 Thread Tom Lane
Jan Wieck  writes:
>> On 3/21/21 12:57 PM, Tom Lane wrote:
>>> I think maybe what we have here is a bug in pg_restore, its
>>> --create switch ought to be trying to update the database's
>>> ownership.

> Thanks for that. I like this patch a lot better.

Needs a little more work than that --- we should allow it to respond
to the --no-owner switch, for example.  But I think likely we can do
it where other object ownership is handled.  I'll look in a bit.
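
A sketch of where that could land (simplified and hypothetical; the 
eventual fix may differ): in _printTocEntry() in pg_backup_archiver.c, 
next to the code that already emits ALTER ... OWNER TO for other object 
types:

    RestoreOptions *ropt = AH->public.ropt;

    if (strcmp(te->desc, "DATABASE") == 0 &&
        !ropt->noOwner &&
        te->owner && strlen(te->owner) > 0)
    {
        PQExpBuffer q = createPQExpBuffer();

        /* fmtId() reuses a static buffer, so call it once per append */
        appendPQExpBuffer(q, "ALTER DATABASE %s ", fmtId(te->tag));
        appendPQExpBuffer(q, "OWNER TO %s;\n", fmtId(te->owner));
        ahprintf(AH, "%s", q->data);
        destroyPQExpBuffer(q);
    }

That way the DATABASE entry would go through the same --no-owner gate as 
everything else.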

regards, tom lane




Re: Fix pg_upgrade to preserve datdba

2021-03-21 Thread Jan Wieck

On 3/21/21 1:15 PM, Jan Wieck wrote:

On 3/21/21 12:57 PM, Tom Lane wrote:

Jan Wieck  writes:

On 3/20/21 12:39 AM, Jan Wieck wrote:

Along the way, pg_upgrade also mangles pg_database.datdba
(all databases are owned by postgres after an upgrade; I will submit a
separate patch for that, as I consider it a bug by itself).



Patch attached.


Hmm, doesn't this lose all *other* database-level properties?

I think maybe what we have here is a bug in pg_restore, its
--create switch ought to be trying to update the database's
ownership.


Possibly. I didn't look into that route.


Thanks for that. I like this patch a lot better.


Regards, Jan

--
Jan Wieck
Principal Database Engineer
Amazon Web Services
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index f8bec3f..19c1e71 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -3030,6 +3030,8 @@ dumpDatabase(Archive *fout)
 	resetPQExpBuffer(creaQry);
 	resetPQExpBuffer(delQry);
 
+	appendPQExpBuffer(creaQry, "ALTER DATABASE %s OWNER TO %s;\n", qdatname, dba);
+
 	if (strlen(datconnlimit) > 0 && strcmp(datconnlimit, "-1") != 0)
 		appendPQExpBuffer(creaQry, "ALTER DATABASE %s CONNECTION LIMIT = %s;\n",
 		  qdatname, datconnlimit);


Re: Fix pg_upgrade to preserve datdba

2021-03-21 Thread Jan Wieck

On 3/21/21 12:57 PM, Tom Lane wrote:

Jan Wieck  writes:

On 3/20/21 12:39 AM, Jan Wieck wrote:

Along the way, pg_upgrade also mangles pg_database.datdba
(all databases are owned by postgres after an upgrade; I will submit a
separate patch for that, as I consider it a bug by itself).



Patch attached.


Hmm, doesn't this lose all *other* database-level properties?

I think maybe what we have here is a bug in pg_restore, its
--create switch ought to be trying to update the database's
ownership.


Possibly. I didn't look into that route.


Regards, Jan

--
Jan Wieck
Principal Database Engineer
Amazon Web Services




Re: Fix pg_upgrade to preserve datdba (was: Re: pg_upgrade failing for 200+ million Large Objects)

2021-03-21 Thread Tom Lane
Jan Wieck  writes:
> On 3/20/21 12:39 AM, Jan Wieck wrote:
>> Along the way, pg_upgrade also mangles pg_database.datdba
>> (all databases are owned by postgres after an upgrade; I will submit a
>> separate patch for that, as I consider it a bug by itself).

> Patch attached.

Hmm, doesn't this lose all *other* database-level properties?

I think maybe what we have here is a bug in pg_restore, its
--create switch ought to be trying to update the database's
ownership.

regards, tom lane