Re: pg_upgrade failing for 200+ million Large Objects

Jan Wieck Sat, 20 Mar 2021 09:55:52 -0700

On 3/20/21 11:23 AM, Tom Lane wrote:

Jan Wieck <[email protected]> writes:

All that aside, the entire approach doesn't scale.


Yeah, agreed.  When we gave large objects individual ownership and ACL
info, it was argued that pg_dump could afford to treat each one as a
separate TOC entry because "you wouldn't have that many of them, if
they're large".  The limits of that approach were obvious even at the
time, and I think now we're starting to see people for whom it really
doesn't work.

It actually looks more like some users have millions of "small objects".I am still wondering where that is coming from and why they are abusingLOs in that way, but that is more out of curiosity. Fact is that theyare out there and that they cannot upgrade from their 9.5 databases,which are now past EOL.


I wonder if pg_dump could improve matters cheaply by aggregating the
large objects by owner and ACL contents.  That is, do

select distinct lomowner, lomacl from pg_largeobject_metadata;

and make just *one* BLOB TOC entry for each result.  Then dump out
all the matching blobs under that heading.

What I am currently experimenting with is moving the BLOB TOC entriesinto the parallel data phase of pg_restore "when doing binary upgrade".It seems to scale nicely with the number of cores in the system. Inaddition to that have options for pg_upgrade and pg_restore that causethe restore to batch them into transactions, like 10,000 objects at atime. There was a separate thread for that but I guess it is better tokeep it all together here now.


A possible objection is that it'd reduce the ability to restore blobs
selectively, so maybe we'd need to make it optional.

I fully intend to make all this into new "options". I am afraid thatthere is no one-size-fits-all solution here.


Of course, that just reduces the memory consumption on the client
side; it does nothing for the locks.  Can we get away with releasing the
lock immediately after doing an ALTER OWNER or GRANT/REVOKE on a blob?

I'm not very fond of the idea going lockless when at the same timetrying to parallelize the restore phase. That can lead to really nastyrace conditions. For now I'm aiming at batches in transactions.



Regards, Jan

--
Jan Wieck
Principle Database Engineer
Amazon Web Services

Re: pg_upgrade failing for 200+ million Large Objects

Reply via email to