On 16.12.2010 20:33, Joachim Wieland wrote:
> On Thu, Dec 16, 2010 at 12:48 PM, Heikki Linnakangas
> <heikki.linnakan...@enterprisedb.com> wrote:
>> As soon as we have parallel pg_dump, the next big thing is going to be
>> parallel dump of the same table using multiple processes. Perhaps we should
>> prepare for that in the directory archive format, by allowing the data of a
>> single table to be split into multiple files. That way parallel pg_dump is
>> simple: you just split the table into chunks of roughly the same size, say
>> 10GB each, and launch a process for each chunk, writing to a separate file.

> How exactly would you "just split the table into chunks of roughly the
> same size"?

Check pg_class.relpages, and divide that evenly across the processes. That should be good enough.
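
As a rough sketch (the table name "big_table" and a worker count of 4 are
invented for the example), the master could derive per-worker block ranges
with something like:

  -- Page count from the planner statistics; integer division gives
  -- the number of blocks each of the 4 workers would handle.
  SELECT relpages,
         (relpages + 3) / 4 AS blocks_per_worker
  FROM pg_class
  WHERE oid = 'big_table'::regclass;

relpages is only an estimate maintained by VACUUM/ANALYZE, but for balancing
dump work across processes that should be accurate enough.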

> Which queries should pg_dump send to the backend? If it
> just sends a bunch of WHERE queries, the server would still scan the
> same data several times since each pg_dump client would result in a
> seqscan over the full table.

Hmm, I was thinking of "SELECT * FROM table WHERE ctid BETWEEN ? AND ?", but we don't support TidScans for ranges. Perhaps we could add that.
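
Just to make that concrete, each worker's query might look something like the
sketch below, with an invented table name and block boundaries. As noted, this
shape would only work once tid range comparisons and range TidScans are
actually supported:

  -- Hypothetical per-worker chunk query covering blocks 0..99999.
  -- Depends on tid range comparison / range TidScan support that
  -- does not exist yet.
  SELECT * FROM big_table
  WHERE ctid BETWEEN '(0,1)'::tid AND '(99999,65535)'::tid;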

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

