Here, I am attaching an updated patch. I fixed some bugs in the v01 patch and also did some code cleanup.
TODO WIP 1: after excluding databases, we have the paths of all the databases that need to be restored, so we can launch a parallel worker for each database. I am still studying this part (a rough sketch is below).

TODO WIP 2: for pg_restore's --exclude-database=NAME, I am using a plain NAME as of now; I will try to make it a PATTERN. The PATTERN should be matched against the database names in the map.dat file (a rough sketch is below).

Please have a look at the patch and let me know your feedback.
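For TODO WIP 1, the rough shape I have in mind is sketched below. This is only an illustration, not code from the patch: the db_entry struct, restore_one_db(), and the hard-coded map.dat entries are made up, and connection options are elided. The idea is to fork one worker per database left after exclusion, each exec'ing pg_restore on its databases/<dboid> subdirectory, then wait for all workers.

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

typedef struct db_entry
{
	char		dboid[16];		/* database oid as text, from map.dat */
	char		dbname[64];		/* database name, from map.dat */
} db_entry;

static void
restore_one_db(const char *dumpdir, const db_entry *db)
{
	char		subdir[1024];

	snprintf(subdir, sizeof(subdir), "%s/databases/%s", dumpdir, db->dboid);
	execlp("pg_restore", "pg_restore",
		   "--format=directory", "-d", db->dbname, subdir, (char *) NULL);
	perror("pg_restore");		/* reached only if exec fails */
	_exit(1);
}

int
main(int argc, char **argv)
{
	const char *dumpdir = (argc > 1) ? argv[1] : "dumpDirName";
	db_entry	dbs[] = {{"5", "postgres"}, {"16384", "mydb"}};	/* stand-in for map.dat */
	int			ndbs = 2;

	for (int i = 0; i < ndbs; i++)
	{
		pid_t		pid = fork();

		if (pid == 0)
			restore_one_db(dumpdir, &dbs[i]);	/* child: run pg_restore */
		else if (pid < 0)
			perror("fork");
	}

	while (wait(NULL) > 0)		/* parent: wait for all workers */
		;
	return 0;
}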
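For TODO WIP 2, one possible shape of the PATTERN matching is below, again only an illustration: it assumes shell-style globbing via fnmatch(3), whereas the final patch may instead reuse the psql-style pattern machinery. It parses "dboid dbname" lines (stand-ins for lines read from map.dat) and skips names matching the exclude pattern.

#include <fnmatch.h>
#include <stdio.h>

int
main(void)
{
	const char *pattern = "test*";	/* e.g. --exclude-database=test* */
	const char *map_lines[] = {		/* stand-ins for lines read from map.dat */
		"5 postgres",
		"16384 testdb",
		"16385 sales",
	};

	for (int i = 0; i < 3; i++)
	{
		char		dboid[16];
		char		dbname[64];

		if (sscanf(map_lines[i], "%15s %63s", dboid, dbname) != 2)
			continue;			/* malformed map.dat line */

		if (fnmatch(pattern, dbname, 0) == 0)
			printf("excluding %s (dboid %s)\n", dbname, dboid);
		else
			printf("will restore %s from databases/%s\n", dbname, dboid);
	}
	return 0;
}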
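To recap the dispatch rule from my earlier mail quoted below, in rough form (illustration only, not the patch's actual code): a directory containing toc.dat is treated as a single-database archive as today; a directory containing both global.dat and map.dat is treated as a whole-cluster dump.

#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

static bool
file_exists(const char *dir, const char *name)
{
	char		path[1024];
	struct stat st;

	snprintf(path, sizeof(path), "%s/%s", dir, name);
	return stat(path, &st) == 0;
}

int
main(int argc, char **argv)
{
	const char *dir = (argc > 1) ? argv[1] : "dumpDirName";

	if (file_exists(dir, "toc.dat"))
		printf("%s: single-database archive, restore as today\n", dir);
	else if (file_exists(dir, "global.dat") && file_exists(dir, "map.dat"))
		printf("%s: cluster dump, restore globals then each database\n", dir);
	else
		fprintf(stderr, "%s: not a recognized dump directory\n", dir);
	return 0;
}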
On Tue, 31 Dec 2024 at 23:53, Mahendra Singh Thalor <mahi6...@gmail.com> wrote:

> Hi all,
>
> With the help of Andrew and Dilip Kumar, I made a poc patch to dump all
> the databases in archive format and then restore them using pg_restore.
>
> Brief about the patch:
>
> new option to pg_dumpall:
> -F, --format=d|p (directory|plain)   output file format (directory,
>                                      plain text (default))
>
> Ex: ./pg_dumpall --format=directory --file=dumpDirName
>
> dumps are as:
> global.dat ::: global sql commands in simple plain format
> map.dat    ::: dboid dbname --- entries for all databases in simple
>                text form
> databases  :::
>     subdir dboid1 -> toc.dat and data files in archive format
>     subdir dboid2 -> toc.dat and data files in archive format
>     etc
> ---------------------------------------------------------------------------
>
> new options to pg_restore:
> -g, --globals-only           restore only global objects, no databases
> --exclude-database=PATTERN   exclude databases whose name matches PATTERN
>
> When we give the -g/--globals-only option, we restore only globals, no
> databases.
>
> *Design*:
> When --format=directory is specified and there is no toc.dat file in the
> main directory, then check for global.dat and map.dat to restore all
> databases. If both files exist in the directory, then first restore all
> globals from global.dat and then restore all databases one by one from
> the map.dat list. While restoring, skip the databases that are given
> with --exclude-database.
>
> ---------------------------------------------------------------------------
> NOTE:
> if needed, restore a single db by its particular subdir
>
> Ex: ./pg_restore --format=directory -d postgres dumpDirName/databases/5
> -- here, 5 is the dboid of the postgres db
> -- to get a dboid, refer to the dbname in the map.dat file
> --------------------------------------------------------------------------
>
> Please let me know your feedback on the attached patch.
>
> On Tue, 11 Jun 2024 at 01:06, Magnus Hagander <mag...@hagander.net> wrote:
>
>> On Mon, Jun 10, 2024 at 6:21 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
>>
>>> Magnus Hagander <mag...@hagander.net> writes:
>>> > On Mon, Jun 10, 2024 at 5:03 PM Nathan Bossart
>>> > <nathandboss...@gmail.com> wrote:
>>> >> Is there a particular advantage to that approach as opposed to just
>>> >> using "directory" mode for everything?
>>>
>>> > A gazillion files to deal with? Much easier to work with individual
>>> > custom files if you're moving databases around and things like that.
>>> > Much easier to monitor eg sizes/dates if you're using it for backups.
>>>
>>> You can always tar up the directory tree after-the-fact if you want
>>> one file. Sure, that step's not parallelized, but I think we'd need
>>> some non-parallelized copying to create such a file anyway.
>>
>> That would require double the disk space.
>>
>> But you can also just run pg_dump manually on each database and a
>> pg_dumpall -g like people are doing today -- I thought this whole thing
>> was about making it more convenient :)
>>
>> --
>> Magnus Hagander
>> Me: https://www.hagander.net/
>> Work: https://www.redpill-linpro.com/
>
> --
> Thanks and Regards
> Mahendra Singh Thalor
> EnterpriseDB: http://www.enterprisedb.com

--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com
v02_poc_pg_dumpall_with_directory_2nd_jan.patch