Re: [HACKERS] Pg_upgrade speed for many tables

Jeff Janes Mon, 12 Nov 2012 10:30:06 -0800

On Mon, Nov 5, 2012 at 12:08 PM, Bruce Momjian <[email protected]> wrote:
> Magnus reported that a customer with a million tables was finding
> pg_upgrade slow.  I had never considered many table to be a problem, but
> decided to test it.  I created a database with 2k tables like this:
>
>         CREATE TABLE test1990 (x SERIAL);
>
> Running the git version of pg_upgrade on that took 203 seconds.  Using
> synchronous_commit=off dropped the time to 78 seconds.  This was tested
> on magnetic disks with a write-through cache.  (No change on an SSD with
> a super-capacitor.)
>
> I don't see anything unsafe about having pg_upgrade use
> synchronous_commit=off.  I could set it just for the pg_dump reload, but
> it seems safe to just use it always.  We don't write to the old cluster,
> and if pg_upgrade fails, you have to re-initdb the new cluster anyway.
>
> Patch attached.  I think it should be applied to 9.2 as well.


Is turning off synchronous_commit enough?  What about turning off fsync?

When I'm doing a pg_upgrade with thousands of tables, the shutdown
checkpoint after restoring the dump to the new cluster takes a very
long time, as the writer drains its operation table by opening and
individually fsync-ing thousands of files.  This takes about 40 ms per
file, which I assume is a combination of slow lap-top disk drive, and
a strange deal with ext4 which makes fsyncing a recently created file
very slow.   But even with faster hdd, this would still be a problem
if it works the same way, with every file needing 4 rotations to be
fsynced and this happens in serial.

Worse, the shutdown only waits for the default of 60 seconds for the
shutdown to take place before it throws an error and the entire
pg_upgrade gives up.  It seems to me that either the -t setting should
be increased, or should be an option to pg_upgrade.

My work around was to invoke a system-wide "sync" a couple seconds
after the 'pg_ctl stop' is initiated.  Flushing the files wholesale
seems to work to make the checkpoint writer rapidly find it has
nothing to do when it tries to flush them retail.

Anyway, the reason I think turning fsync off might be reasonable is
that as soon as the new cluster is shut down, pg_upgrade starts
overwriting most of those just-fsynced file with other files from the
old cluster, and AFAICT makes no effort to fsync them.  So until there
is a system-wide sync after the pg_upgrade finishes, your new cluster
is already in mortal danger anyway.

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Pg_upgrade speed for many tables

Reply via email to