On Tue, 2008-02-26 at 12:27 -0500, Greg Smith wrote:
> On Tue, 26 Feb 2008, Simon Riggs wrote:
>
> > Splitting up the dump is the enabler for splitting up the load.
>
> While the pg_dump split train seems to be leaving the station, I feel
> compelled to point out that focus does nothing to help people who are
> bulk-loading data that came from somewhere else. If my data is already in
> PostgreSQL, and I'm doing a dump/load, I can usually split the data easily
> enough with existing tools to handle that right now via COPY (SELECT...)
> TO. Some tools within pg_dump would be nice, but I don't need them that
> much. It's gigantic files that came from some other DB I don't even have
> access to that I struggle with loading efficiently.
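[Editor's sketch: the "split the data with existing tools via COPY (SELECT...) TO" approach mentioned above can be illustrated by generating one ranged COPY statement per output chunk. Everything here (the helper name, table, key column, and paths) is hypothetical, not from any real tool.]

```python
def split_copy_statements(table, key, max_id, chunks, outdir="/tmp"):
    """Generate COPY (SELECT ...) TO statements that split one big table
    into `chunks` files by non-overlapping ranges of a numeric key.
    Hypothetical helper for illustration only."""
    step = (max_id + chunks - 1) // chunks  # ceiling division
    stmts = []
    for i in range(chunks):
        lo, hi = i * step, min((i + 1) * step, max_id)
        stmts.append(
            f"COPY (SELECT * FROM {table} WHERE {key} > {lo} AND {key} <= {hi}) "
            f"TO '{outdir}/{table}_{i}.dat'"
        )
    return stmts

if __name__ == "__main__":
    for s in split_copy_statements("bigtable", "id", 100, 4):
        print(s)
```

Each statement can then be run in its own session, so the chunk files are produced (and later loaded) independently.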
Don't write them as gigantic files... :-)

> The work Dimitri is doing is wandering in that direction and that may be
> enough. I note that something that addresses loading big files regardless
> of source could also work on PostgreSQL dumps, while a pg_dump-focused
> effort helps nothing but that specific workflow. I wonder if doing too
> much work on the pg_dump path is the best use of someone's time when the
> more general case will need to be addressed one day anyway.

I take your points. I've spent as much time discussing this as coding it,
which is OK, because coding the wrong thing is also a waste of time.

If we have individual load files then we will be able to use a parallel
load utility on them, whether they come from pg_dump or otherwise. Neither
of the current pg_dump output formats would allow that. --data-only does
allow it, but there's no way to dump a sensible schema to go with that,
which is what this thread is about.

Dimitri is working on a parallel loader that allows error handling. I
don't think it's worth duplicating that effort by attempting to make COPY
itself work in parallel. Even if we did, the single client could easily
become a bottleneck. So an external utility that spawns multiple processes
and runs multiple concurrent COPY statements sounds quite good to me.
He's asked for help, if anybody can assist.

-- 
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
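[Editor's sketch: the "external utility that spawns multiple workers and runs multiple concurrent COPY statements" described above might look like the following. The names (`load_file`, `parallel_load`, `mytable`) are invented for illustration; a real loader would open one database session per worker and stream each chunk file with COPY, rather than returning the statement text.]

```python
from concurrent.futures import ThreadPoolExecutor

def load_file(path):
    # In a real loader each worker would open its own connection and
    # execute a COPY for this chunk file. Here we only return the
    # statement we would issue, so the sketch runs without a database.
    return f"COPY mytable FROM '{path}'"

def parallel_load(paths, workers=4):
    # One session per worker: the COPYs run concurrently, so no single
    # client connection becomes the bottleneck.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(load_file, paths))

if __name__ == "__main__":
    for cmd in parallel_load([f"chunk_{i}.dat" for i in range(4)], workers=4):
        print(cmd)
```

The key design point from the thread is preserved: parallelism lives in an external driver with independent sessions, not inside COPY itself.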