On 04/24/2015 06:41 PM, Tom Lane wrote:
> Andrew Dunstan <and...@dunslane.net> writes:
>> On 04/23/2015 04:04 PM, Andrew Gierth wrote:
>>> The relevant code is getBlobs in pg_dump.c, which queries the whole of
>>> pg_largeobject_metadata without using a cursor (so the PGresult is
>>> already huge thanks to having >100 million rows), and then mallocs a
>>> BlobInfo array and populates it from the PGresult, also using pg_strdup
>>> for the oid string, owner name, and ACL if any.
>> I'm surprised this hasn't come up before. I have a client that I
>> persuaded to convert all their LOs to bytea fields because of problems
>> with pg_dump handling millions of LOs, and kept them on an older
>> postgres version until they made that change.
> Yeah, this was brought up when we added per-large-object metadata; it was
> obvious that that patch would cause pg_dump to choke on large numbers of
> large objects.  The (perhaps rather lame) argument was that you wouldn't
> have that many of them.
>
> Given that large objects don't have any individual dependencies,
> one could envision fixing this by replacing the individual large-object
> DumpableObjects by a single placeholder to participate in the sort phase,
> and then when it's time to dump that, scan the large objects using a
> cursor and create/print/delete the information separately for each one.
> This would likely involve some rather painful refactoring in pg_dump
> however.
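
Just to make the idea concrete, here's a rough libpq-level sketch of what
that cursor-based scan could look like (illustration only, not the real
getBlobs code; the cursor name, the 1000-row batch size, and the minimal
error handling are all placeholders):

#include <stdio.h>
#include <libpq-fe.h>

/* Sketch: stream pg_largeobject_metadata in batches instead of
 * materializing one huge PGresult plus a BlobInfo array. */
static void
scan_blobs(PGconn *conn)
{
    PQclear(PQexec(conn, "BEGIN"));
    PQclear(PQexec(conn, "DECLARE blobcur CURSOR FOR "
                         "SELECT oid, pg_get_userbyid(lomowner), lomacl "
                         "FROM pg_largeobject_metadata"));

    for (;;)
    {
        PGresult *res = PQexec(conn, "FETCH 1000 FROM blobcur");

        if (PQresultStatus(res) != PGRES_TUPLES_OK || PQntuples(res) == 0)
        {
            PQclear(res);
            break;              /* done, or an error we don't handle here */
        }

        for (int i = 0; i < PQntuples(res); i++)
        {
            /* emit one blob's metadata, then forget it rather than
             * keeping an entry for every row in memory */
            printf("blob %s owner %s\n",
                   PQgetvalue(res, i, 0),
                   PQgetvalue(res, i, 1));
        }
        PQclear(res);           /* at most ~1000 rows live at a time */
    }

    PQclear(PQexec(conn, "CLOSE blobcur"));
    PQclear(PQexec(conn, "COMMIT"));
}

int
main(void)
{
    PGconn *conn = PQconnectdb("");   /* real code would pass connection options */

    if (PQstatus(conn) == CONNECTION_OK)
        scan_blobs(conn);
    PQfinish(conn);
    return 0;
}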


I think we need to think about this some more. TBH, I'm not convinced that the changes made back in 9.0 were well conceived. Having a separate TOC entry for each LO seems wrong in principle, although I understand why it was done. For now, my advice would be to avoid using pg_dump/pg_restore if you have large numbers of LOs. The good news is that these days there are alternative methods of doing backup / restore, albeit not 100% equivalent to pg_dump / pg_restore.

One useful thing might be to provide pg_dump with --no-blobs/--blobs-only switches so you could at least easily segregate the blobs into their own dump file. That would be in addition to dealing with the memory problems pg_dump has with millions of LOs, of course.
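Something along these lines, say (the switches don't exist today; the names are just the ones floated above):

    pg_dump --no-blobs   -Fc -f mydb-data.dump  mydb
    pg_dump --blobs-only -Fc -f mydb-blobs.dump mydb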


cheers

andrew

