Greetings,

* PG Doc comments form (nore...@postgresql.org) wrote:
> Hello,
> in
> https://www.postgresql.org/docs/16/continuous-archiving.html#BACKUP-ARCHIVING-WAL
> the example given could be improved for Linux / Unix environment.
>
> As cp command is buffered it means it could return success although the data
> didn't reach the disk
> which puts backups at risk.
Yup.

> I propose to use dd command with its fsync option.
>
> So the actual equivalent example would be :
>
> archive_command = 'dd if=%p of=/mnt/server/archivedir/%f bs=1M
> conv=fsync,excl status=none' # Unix
>
> What do you think ?

This doesn't fsync the directory though, for one thing (a sketch of what that would look like is below), and there are other considerations beyond that when having archive_command run, and more generally when doing backups with PG.

In short, the example in the documentation is not meant to ever be used as-is; it is intended to show how the parameter replacement is done when the command is called, so that backup tool authors know how it works. In reality, to write backup software for PG, you really do need to know PG in much more detail than the documentation provides, which means reading the source. For example, backup software should be checking the pg_control file's CRC, as it's possible to read that file just as it's being written and end up with an invalid pg_control file in the backup, making the backup invalid. There's been some discussion about how to improve this situation, but nothing exists today from PG, so backup authors have to handle it. This is just one example; there are lots of others: unlogged table handling, temporary file handling, etc, etc.

I'd strongly recommend using one of the existing well-maintained backup tools which have been written specifically for PG for your backups. Writing a new backup tool for PG is a good bit of work and isn't really reasonable to do with shell scripts or simple unix commands.

I do feel that we could improve the documentation around this by dropping comments like "using any convenient file-system-backup tool such as tar or cpio", as those don't, for example, support any reasonable way to deal with unlogged tables by themselves. Technically you could scan the data directory and provide an exclude file, or leave unlogged table files out of the list of files to include, but then you're starting to get into things like how to tell whether a file is associated with an unlogged table or not, and while that information is deep in the documentation, we don't make any mention of or reference to unlogged tables in the backup documentation.

Perhaps an addition to the low-level documentation under 'Backing Up The Data Directory' along these lines would be helpful:

#########
You should omit any unlogged relation files (other than the 'init' fork) as they will be reset to be empty upon recovery, and backing them up will simply increase your backup size (potentially significantly) and slow down the restore process. Unlogged tables have an init fork (link to storage-init.html), which is a file with the same filenode number as the relation but with an added suffix of '_init' (link to storage-file-layout.html). When an '_init' fork exists for a given relation, the '_init' file should be included in the backup, but all other files for that relation (the 'main' fork, which has no suffix, and any other forks which exist, other than the 'init' fork) should be excluded from the backup.
#########

(A rough sketch of identifying those files is below.)

There's also no way for tar or cpio to directly validate that the copy of pg_control they made is valid. Worse, on a restore, they'll restore pg_control more-or-less whenever, and then if the restore doesn't complete for whatever reason, you might end up with a cluster that can be started, run for a while, but be missing whole tables.
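Coming back to the dd example above for a moment: even if you were going to go that route (and, again, I wouldn't), you'd need at least something like this sketch to cover the directory fsync. This is for illustration only, not a recommended archive_command, and it assumes a GNU coreutils 'sync' that accepts a directory argument (8.24 or later); the script name is made up, to be invoked as archive_command = '/path/to/archive.sh %p %f':

#!/bin/sh
# Hypothetical archive script, illustration only; %p and %f are passed
# in by archive_command as the two arguments.
set -e
wal_path="$1"
wal_name="$2"
archive_dir="/mnt/server/archivedir"

# Copy the WAL segment; conv=fsync flushes the file's data before dd
# exits, and conv=excl fails if the target already exists.
dd if="$wal_path" of="$archive_dir/$wal_name" bs=1M conv=fsync,excl status=none

# dd only fsyncs the file itself.  The new directory entry also has to
# be made durable, or a crash could lose the file even though dd
# reported success.  GNU coreutils 'sync' will fsync a directory named
# as an argument:
sync "$archive_dir"

Even with that, you're still left with everything else discussed in this message, which is why I'd point people at the existing tools rather than at recipes like this.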
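And as a rough illustration of the unlogged-relation filtering described above (a sketch only; a real tool would also need to handle tablespaces under pg_tblspc, among other things):

#!/bin/sh
# Hypothetical sketch: build an exclude list of unlogged-relation files.
# For every '<filenode>_init' fork found, every other fork of that
# relation (the main fork '<filenode>', its extra segments
# '<filenode>.1' etc., and the '_fsm'/'_vm' forks) should be left out
# of the backup; the '_init' file itself must be kept.
find "$PGDATA/base" -type f -name '*_init' | while read -r init; do
    dir=$(dirname "$init")
    filenode=$(basename "$init" _init)
    find "$dir" -maxdepth 1 \( -name "$filenode" -o -name "$filenode.*" \
        -o -name "${filenode}_fsm*" -o -name "${filenode}_vm*" \) -print
done > unlogged-exclude.list

This is exactly the kind of detail that ends up subtly wrong in hand-rolled scripts, which is rather the point.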
While it was only demonstrated relatively recently that the pg_control file can, in fact, be invalid when read during a backup, it's a real issue that has probably been around forever, and there isn't really a good way to address it today. Still, perhaps we should include in the documentation, again under 'Backing Up The Data Directory', something like:

#########
You must be sure to check that the copy of pg_control which was made is valid. The pg_controldata tool can be used for this purpose: after making the copy of the 'data_dir/global/pg_control' file which will be included in the backup, restore it into a new directory 'test_data_dir/global' and then run 'pg_controldata' on 'test_data_dir' and ensure that no error is returned. If an error is returned, attempt to re-copy the 'data_dir/global/pg_control' file and test again (during the backup), or fail the backup entirely. A backup without a valid pg_control file cannot be restored.

Further, on restore, it is strongly recommended to restore the pg_control file last, and only after successfully restoring the rest of the database, to prevent the database from being started either while the restore is happening or in the event that not all of the files which were part of the backup could be restored.
#########

(A rough command-line sketch of that check is below.)

Of course, there's more, such as the lack of any discussion about keeping a separate manifest of all of the files that were copied as part of the backup, to allow you to make sure that, for example, the tar file created for the backup didn't just get truncated somewhere along the way. The above alludes to this idea, at least.

I will note that while we don't talk about all of these things in the documentation, pg_basebackup and pg_receivewal do handle them: they do an fsync() after the backup has completed or when receiving WAL (including of the directories), skip unlogged relations and temporary files, include a manifest with a per-file checksum which the pg_verifybackup tool can be used to check (example below), etc. While pg_basebackup doesn't yet handle the issue of an invalid read of pg_control, there's ongoing discussion about how to address that, and patches have been proposed to deal with it. Hopefully something will be done there soon, as pg_basebackup seems to be at particular risk in certain cases, particularly when run against a replica, where the pg_control file is rewritten much more frequently. This isn't to say it's very likely - it's not - just that it is a thing that could happen and could lead to an invalid backup.

Really, rewriting the entire "how to perform a low-level backup" section and re-focusing it to be for backup tool authors, rather than having it seem like it's a unix administrator or general DBA's documentation for performing backups, would go a long way towards improving the situation, for my ... slightly more than 2 cents (or your favorite currency), anyway.
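For what the pg_control check proposed above might look like in practice, here's a rough sketch; the backup path is made up, and a real tool would do this internally rather than via shell. Note that, depending on version, pg_controldata may report a CRC mismatch as a warning rather than a non-zero exit, so the sketch checks for both:

#!/bin/sh
# Hypothetical sketch of validating the copied pg_control.
set -e
backup_dir="/backups/today"        # wherever the backup copy lives
test_dir=$(mktemp -d)

# pg_controldata expects a data-directory layout, so stage the copied
# file under global/ in a scratch directory.
mkdir -p "$test_dir/global"
cp "$backup_dir/global/pg_control" "$test_dir/global/pg_control"

# Treat either a non-zero exit or any warning output (e.g. a CRC
# mismatch) as an invalid copy: re-copy during the backup, or fail.
if ! out=$(pg_controldata "$test_dir" 2>&1) || \
   printf '%s\n' "$out" | grep -qi 'warning'; then
    echo "copied pg_control appears invalid; re-copy or fail the backup" >&2
    exit 1
fi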
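And for completeness, here is the sort of thing you get for free with the in-core tools (paths made up; see the pg_basebackup and pg_verifybackup documentation for the details):

# Take a base backup with a per-file checksummed manifest (sha256
# here; the default manifest checksum is CRC32C).
pg_basebackup -D /backups/today --manifest-checksums=sha256

# Later, verify that every file listed in the manifest is present and
# matches its checksum.
pg_verifybackup /backups/today

pg_verifybackup also checks, by default, that the WAL needed to reach consistency can be read and parsed, which a hand-rolled tar-based approach has no equivalent for.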
Thanks!

Stephen