On Tuesday, February 26, 2008, Tom Dunstan wrote:
> On Tue, Feb 26, 2008 at 5:35 PM, Simon Riggs <[EMAIL PROTECTED]> wrote:
> > On Tuesday, February 26, 2008, Dimitri Fontaine wrote:
> >> We could even support some option for the user to tell us which
> >> disk arrays to use for parallel dumping.
> >>
> >> pg_dump -j2 --dumpto=/mount/sda:/mount/sdb ... > mydb.dump
> >> pg_restore -j4 ... mydb.dump
> >
> > If it's in a single file then it won't perform as well as if it's
> > separate files. We can put separate files on separate drives. We can
> > begin reloading one table while another is still unloading. The OS
> > will perform readahead for us on separate files, whereas on one file
> > it will look like random I/O, etc.
>
> Yeah, writing multiple unknown-length streams to a single file in
> parallel is going to be all kinds of painful [...]
> While it's a bit fiddly, putting data on separate drives would then
> involve something like symlinking the tablename inside the dump dir
> off to an appropriate mount point, but that's probably not much worse
> than running n different pg_dump commands specifying different files.
> Heck, if you've got lots of data and want very particular behavior,
> you've got to specify it somehow. :)
What I meant with the --dumpto=/mount/sda:/mount/sdb idea is that pg_dump would unload data to those directories (filesystems, disk arrays, whatever), then prepare the final zip file from there.

We could even have the --dumpto option associate each entry with a worker process, or define a special TOC syntax that allows for complex setups: pg_dump would first dump a TOC that you edit, then use the edited version to control the parallel unloading, which disks to use for which tables, and so on. A hypothetical session is sketched below.

Those are exactly your ideas; I'm just trying to make them look clear and simple from the user's point of view, which means some more work for the tools.
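Just to make it concrete, a session could look something like this. All of the pg_dump options here are hypothetical (none of them exist today), loosely modelled on the existing pg_restore -l/-L TOC-editing workflow, and the worker=/spool= TOC syntax is made up for the sake of the example:

  # 1. Have pg_dump emit an editable TOC first (hypothetical --list
  #    option, analogous to pg_restore -l on an existing archive)
  pg_dump --list mydb > mydb.toc

  # 2. Edit the TOC to assign tables to workers and spool areas,
  #    e.g. lines like:
  #      2105; TABLE public.orders    worker=1 spool=/mount/sda
  #      2106; TABLE public.lineitem  worker=2 spool=/mount/sdb
  $EDITOR mydb.toc

  # 3. Run the parallel dump under control of the edited TOC
  #    (hypothetical -j and --use-toc options): each table is unloaded
  #    to its spool directory, then the final archive is assembled
  pg_dump -j2 --use-toc=mydb.toc --dumpto=/mount/sda:/mount/sdb \
          -Fc -f mydb.dump mydb

  # 4. And a parallel restore to go with it
  pg_restore -j4 -d mydb mydb.dump

The point being that the user only ever deals with one dump file and one TOC, and the tools take care of spreading the I/O across the spool areas.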
--
dim