"Heikki Linnakangas" <[EMAIL PROTECTED]> writes: > At some point, I think we have to bite the bullet and find a way to use > multiple CPUs for a single load. I don't have any good ideas or plans > for that, but hopefully someone does.
As already mentioned upthread, we could do that today, with zero backend changes, by making pg_restore drive multiple sessions. Now there are scenarios where this wouldn't help too much --- eg, a database with only one enormous table. We couldn't parallelize the loading of that table, although we could parallelize creation of its indexes. But for an example such as JD is complaining about, we ought to have no trouble thrashing his disks into the ground ;-)

What you would need for this is an -Fc or -Ft dump, because a plain script output from pg_dump doesn't carry any dependency information, much less any index of where in the file different bits of data are.

Just armwaving, I envision a multiprocess pg_restore like this:

* one controller process that tracks the not-yet-loaded TOC items in the dump

* N child processes that just transmit a selected TOC item to a connected backend, and then ask the controller what to do next

The controller would need to use the dependency information to avoid, eg, handing out a CREATE INDEX command before the parent table was created and loaded.

One issue is that this couldn't use "-1" single-transaction restoring, since obviously each child would need its own transaction, and furthermore would have to commit before going back to the controller for more work (since dependent TOC items might well get loaded by a different child later). That defeats a couple of optimizations that Simon put in recently. The one for no XLOG during COPY is not too hard to see how to re-enable, but I'm not sure what else there was.

Most likely, the bottleneck with this sort of thing would be multiple parallel reads from the pg_dump archive file. Possibly the controller process could be taught to schedule COPY and CREATE INDEX operations so that not too many processes are trying to read lots of archive data at the same time.

			regards, tom lane
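To make the armwaving a bit more concrete, here is a minimal sketch of that controller loop in Python. It is purely illustrative, not pg_restore code: threads stand in for the N child processes, `run_item` stands in for a child sending one TOC item's SQL to its backend and committing, and the `parallel_restore`/`toc`/`deps` names and shapes are all invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def parallel_restore(toc, deps, n_workers, run_item):
    """Dispatch TOC items in dependency order across n_workers sessions.

    toc      -- list of TOC item ids from the archive (hypothetical)
    deps     -- {item: set of items that must commit first}
    run_item -- stands in for a child session restoring one item and
                committing before it asks the controller for more work
    """
    pending = {item: set(deps.get(item, ())) for item in toc}
    done = set()
    futures = {}  # in-flight future -> TOC item

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        def launch_ready():
            # Hand out every item whose prerequisites have all committed,
            # e.g. never a CREATE INDEX before its table's COPY is done.
            for item in [i for i, d in pending.items() if d <= done]:
                del pending[item]
                futures[pool.submit(run_item, item)] = item

        launch_ready()
        while futures:
            finished, _ = wait(futures, return_when=FIRST_COMPLETED)
            for fut in finished:
                fut.result()            # propagate any restore error
                done.add(futures.pop(fut))
            launch_ready()              # hand out newly unblocked items
    return done
```

The same loop is also the natural place to throttle archive I/O: before handing out a COPY or CREATE INDEX, the controller could check how many children are currently doing bulk reads and hold the item back until one finishes.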