Here, I am attaching an updated patch. It fixes some bugs in the v01
patch and also includes some code cleanup.

TODO/WIP 1: after excluding databases, we have the paths of all the
databases that need to be restored, so we can launch parallel workers,
one per database. I am still studying this part.
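
As a rough illustration of the idea (a shell sketch only; the real
implementation would run worker processes inside pg_restore, and the
exact flags here are assumptions):

    # restore each database subdirectory in the background, one job per db
    while read dboid dbname; do
        pg_restore --format=directory --create -d postgres \
            "dumpDirName/databases/$dboid" &
    done < dumpDirName/map.dat
    wait    # block until all background restores finish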

TODO/WIP 2: --exclude-database=NAME: for pg_restore, I am matching an
exact NAME as of now; I will try to support PATTERN instead. The
PATTERN should be matched against the entries in the map.dat file.
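
For example, the matching could behave roughly like this shell sketch
(the patch would do this in C; the glob-style pattern syntax is an
assumption on my side):

    pattern='test*'
    while read dboid dbname; do
        case "$dbname" in
            $pattern) echo "excluding $dbname (dboid $dboid)" ;;
            *)        echo "restoring $dbname (dboid $dboid)" ;;
        esac
    done < dumpDirName/map.dat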

Please have a look at the patch and let me know your feedback.


On Tue, 31 Dec 2024 at 23:53, Mahendra Singh Thalor <mahi6...@gmail.com>
wrote:

> Hi all,
> With the help of Andrew and Dilip Kumar, I made a poc patch to dump all
> the databases in archive format and then restore them using pg_restore.
>
> Brief summary of the patch:
> A new option for pg_dumpall:
> -F, --format=d|p             output file format: d (directory) or
>                              p (plain text, the default)
>
> Ex: ./pg_dumpall --format=directory --file=dumpDirName
>
> The dump is laid out as follows:
> global.dat ::: global SQL commands in plain-text format
> map.dat    ::: "dboid dbname" entries for all databases, in plain text
> databases  :::
>       subdir dboid1 -> toc.dat and data files in archive format
>       subdir dboid2 -> toc.dat and data files in archive format
>       etc.
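>
> For example, map.dat might contain entries like this (the dboids are
> illustrative):
>
>       5 postgres
>       16384 testdb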
> ---------------------------------------------------------------------------
>
> new options to pg_restore:
> -g, --globals-only           restore only global objects, no databases
> --exclude-database=PATTERN   exclude databases whose name matches PATTERN
>
> When the -g/--globals-only option is given, only globals are restored;
> no databases are restored.
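>
> Ex (illustrative invocations; testdb is a made-up database name):
>
> ./pg_restore --format=directory --globals-only -d postgres dumpDirName
> ./pg_restore --format=directory --exclude-database=testdb -d postgres dumpDirName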
>
> *Design*:
> When --format=directory is specified and there is no toc.dat file in the
> main directory, pg_restore checks for global.dat and map.dat. If both
> files exist in the directory, it first restores all globals from
> global.dat and then restores the databases one by one, in the order
> listed in map.dat.
> While restoring, databases given with --exclude-database are skipped.
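>
> Roughly, this is equivalent to doing the following by hand (a sketch
> only; the exact pg_restore flags here are assumptions):
>
>     psql -d postgres -f dumpDirName/global.dat
>     while read dboid dbname; do
>         pg_restore --format=directory --create -d postgres \
>             "dumpDirName/databases/$dboid"
>     done < dumpDirName/map.dat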
>
> ---------------------------------------------------------------------------
> NOTE:
> If needed, a single database can be restored from its particular
> subdirectory.
>
> Ex: ./pg_restore --format=directory -d postgres dumpDirName/databases/5
>    -- here, 5 is the dboid of the postgres db
>    -- to get the dboid, look up the dbname in map.dat
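>
> For example, to look up the dboid of the postgres database (assuming
> the "dboid dbname" format described above):
>
>    grep ' postgres$' dumpDirName/map.dat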
> --------------------------------------------------------------------------
>
> Please let me know your feedback on the attached patch.
>
> On Tue, 11 Jun 2024 at 01:06, Magnus Hagander <mag...@hagander.net> wrote:
>
>> On Mon, Jun 10, 2024 at 6:21 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
>>
>>> Magnus Hagander <mag...@hagander.net> writes:
>>> > On Mon, Jun 10, 2024 at 5:03 PM Nathan Bossart <
>>> nathandboss...@gmail.com>
>>> > wrote:
>>> >> Is there a particular advantage to that approach as opposed to just
>>> using
>>> >> "directory" mode for everything?
>>>
>>> > A gazillion files to deal with? Much easier to work with individual
>>> custom
>>> > files if you're moving databases around and things like that.
>>> > Much easier to monitor eg sizes/dates if you're using it for backups.
>>>
>>> You can always tar up the directory tree after-the-fact if you want
>>> one file.  Sure, that step's not parallelized, but I think we'd need
>>> some non-parallelized copying to create such a file anyway.
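>>>
>>> For instance, something like:
>>>
>>>     tar -cf dumpall.tar -C dumpDirName .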
>>>
>>
>> That would require double the disk space.
>>
>> But you can also just run pg_dump manually on each database and a
>> pg_dumpall -g like people are doing today -- I thought this whole thing was
>> about making it more convenient :)
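>>
>> Roughly (a sketch of that manual workaround, skipping template
>> databases):
>>
>>     pg_dumpall -g > globals.sql
>>     for db in $(psql -At -c "select datname from pg_database where not datistemplate"); do
>>         pg_dump -Fd -j 4 -f "dump_$db" "$db"
>>     done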
>>
>> --
>>  Magnus Hagander
>>  Me: https://www.hagander.net/
>>  Work: https://www.redpill-linpro.com/
>>
>
>
> --
> Thanks and Regards
> Mahendra Singh Thalor
> EnterpriseDB: http://www.enterprisedb.com
>


-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com

Attachment: v02_poc_pg_dumpall_with_directory_2nd_jan.patch