Re: [HACKERS] pg_dump additional options for performance

Tom Lane Tue, 26 Feb 2008 07:05:20 -0800

Magnus Hagander <[EMAIL PROTECTED]> writes:
> On Tue, Feb 26, 2008 at 12:39:29AM -0500, Tom Lane wrote:
>> BTW, what exactly was the use-case for this?


> One use-case would be when you have to make some small change to the schema
> while reloading it, that's still compatible with the data format. Then
> you'd dump schema-no-indexes-and-stuff, then *edit* that file, before
> reloading things. It's a lot easier to edit the file if it's not hundreds
> of gigabytes..

This is a use-case for having switches that *extract* convenient subsets
of a dump archive.  It does not mandate having pg_dump emit multiple
files.  You could extract, say, the pre-data schema into a text SQL
script, edit it, load it, then extract the data and remaining script
directly into the database from the dump file.
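
A hedged sketch of that workflow follows. It assumes the `--section` switch (values `pre-data`, `data`, `post-data`) that pg_dump and pg_restore gained later, in PostgreSQL 9.2, after this thread. The helper `restore_plan`, the archive name `db.dump` and the database name `newdb` are hypothetical placeholders. The sketch only builds the command lines and does not run them:

```python
from typing import List


def restore_plan(archive: str, dbname: str,
                 script: str = "pre-data.sql") -> List[List[str]]:
    """Build the command sequence for an edit-the-schema reload.

    Hypothetical helper: `archive` is a custom-format dump (pg_dump -Fc),
    `dbname` is the target database, `script` is the editable SQL file.
    Assumes pg_restore's --section switch (PostgreSQL 9.2 and later).
    """
    return [
        # 1. Extract only the pre-data schema (tables, types, functions)
        #    from the archive into a plain SQL script that can be edited.
        ["pg_restore", "--section=pre-data", "-f", script, archive],
        # 2. After editing the script by hand, load it into the target.
        ["psql", "-d", dbname, "-f", script],
        # 3. Restore the data and the remaining post-data objects
        #    (indexes, constraints, triggers) straight from the archive.
        ["pg_restore", "--section=data", "--section=post-data",
         "-d", dbname, archive],
    ]


if __name__ == "__main__":
    for cmd in restore_plan("db.dump", "newdb"):
        print(" ".join(cmd))
```

Each step could be executed with `subprocess.run(cmd, check=True)`, pausing between steps 1 and 2 for the manual edit. Note that the single archive file serves both extraction passes, which is the point being made above.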

In short, what I think we need here is just some more conveniently
defined extraction filter switches than --schema-only and --data-only.
There's no need for any fundamental change to pg_dump's architecture.
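
To illustrate why such filter switches need no change to the archive format, here is a toy model. It is not pg_dump's actual data structure: the `TocEntry` type, its section tags and the `extract` function are assumptions for illustration. The archive is one ordered list of table-of-contents entries, and each extraction switch is just a predicate over that list:

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class TocEntry:
    # Hypothetical stand-in for one archive TOC entry.
    name: str
    section: str  # assumed tags: "pre-data", "data" or "post-data"


def extract(toc: Iterable[TocEntry], sections: Set[str]) -> List[TocEntry]:
    """Return the entries in the requested sections, keeping archive order."""
    return [e for e in toc if e.section in sections]


ARCHIVE = [
    TocEntry("TABLE orders", "pre-data"),
    TocEntry("TABLE DATA orders", "data"),
    TocEntry("CONSTRAINT orders_pkey", "post-data"),
    TocEntry("FK CONSTRAINT orders_customer_fkey", "post-data"),
]

# In this model --schema-only is {"pre-data", "post-data"} and --data-only
# is {"data"}; neither can isolate pre-data, which the use-case needs.
schema_for_editing = extract(ARCHIVE, {"pre-data"})
rest = extract(ARCHIVE, {"data", "post-data"})
```

The design point is that adding a new switch only adds a new predicate; the archive itself, and anything that reads it, stays unchanged.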

Yes, I've read the subsequent discussion about a "directory" output
format.  I think it's pointless complication --- or at least, that it's
a performance hack rather than a functionality one, with no chance of
any actual performance gain until we've parallelized pg_restore, and
with zero existing evidence that any gain would be had even then.

BTW, if we avoid fooling with the definition of the archive format,
that also means that the extraction-switch patch should be relatively
independent of parallelization work, so the work could proceed
concurrently.

                        regards, tom lane
