Greetings,

* PG Doc comments form (nore...@postgresql.org) wrote:
> Hello,
> in
> https://www.postgresql.org/docs/16/continuous-archiving.html#BACKUP-ARCHIVING-WAL
> the example given could be improved for Linux / Unix environment.
>
> As cp command is buffered it means it could return success although the data
> didn't reach the disk
> which puts backups at risk.
Yup.

> I propose to use dd command with its fsync option.
>
> So the actual equivalent example would be :
>
> archive_command = 'dd if=%p of=/mnt/server/archivedir/%f bs=1M
> conv=fsync,excl status=none' # Unix
>
> What do you think ?

This doesn't fsync the directory though, for one thing (a sketch of what that would look like is below), and there are other considerations beyond that when having archive_command run, and more generally when doing backups with PG.

In short, the example in the documentation is not meant to ever be used as-is; it is intended to show how the parameter replacement is done when the command is called, so that backup tool authors know how it works. In reality, to write backup software for PG, you really do need to know PG in much more detail than the documentation provides, which means reading the source. For example, backup software should be checking the pg_control file's CRC, as it's possible to read that file just as it's being written and end up with an invalid pg_control file in the backup, making the backup invalid. There's been some discussion about how to improve this situation, but nothing exists today from PG, so backup authors have to handle it. This is just one example; there are lots of others: unlogged table handling, temporary file handling, etc, etc.

I'd strongly recommend using one of the existing well-maintained backup tools which have been written specifically for PG for your backups. Writing a new backup tool for PG is a good bit of work and isn't really reasonable to do with shell scripts or simple unix commands.

I do feel that we could improve the documentation around this by dropping comments like "using any convenient file-system-backup tool such as tar or cpio", as those don't, for example, support any reasonable way to deal with unlogged tables by themselves. Technically you could scan the data directory and provide an exclude file, or leave unlogged table files out of the list of files to include, but then you're starting to get into things like how to tell whether a file is associated with an unlogged table or not, and while that information is deep in the documentation, we don't make any mention of or reference to unlogged tables in the backup documentation.

Perhaps an addition to the low-level documentation under 'Backing Up The Data Directory' along these lines would be helpful:

#########
You should omit any unlogged relation files (other than the 'init' fork) as they will be reset to be empty upon recovery, and backing them up will simply increase your backup size (potentially significantly) and slow down the restore process. Unlogged tables have an init fork (link to storage-init.html), which is a file with the same filenode number as the relation but with an added suffix of '_init' (link to storage-file-layout.html). When an '_init' fork exists for a given relation, the '_init' file should be included in the backup, but all other files for that relation (the 'main' fork, which has no suffix, and any other forks which exist, other than the 'init' fork) should be excluded from the backup.
#########

(A rough sketch of identifying those files is below.)

There's also no way for tar or cpio to directly validate that the copy of pg_control they made is valid. Worse, on a restore, they'll restore pg_control more-or-less whenever, and then if the restore doesn't complete for whatever reason, you might end up with a cluster that can be started, run for a while, but be missing whole tables.
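Coming back to the dd example above for a moment: even if you were going to go that route (and, again, I wouldn't), you'd need at least something like this sketch to cover the directory fsync. This is for illustration only, not a recommended archive_command, and it assumes a GNU coreutils 'sync' that accepts a directory argument (8.24 or later); the script name is made up, to be invoked as archive_command = '/path/to/archive.sh %p %f':

#!/bin/sh
# Hypothetical archive script, illustration only; %p and %f are passed
# in by archive_command as the two arguments.
set -e
wal_path="$1"
wal_name="$2"
archive_dir="/mnt/server/archivedir"

# Copy the WAL segment; conv=fsync flushes the file's data before dd
# exits, and conv=excl fails if the target already exists.
dd if="$wal_path" of="$archive_dir/$wal_name" bs=1M conv=fsync,excl status=none

# dd only fsyncs the file itself.  The new directory entry also has to
# be made durable, or a crash could lose the file even though dd
# reported success.  GNU coreutils 'sync' will fsync a directory named
# as an argument:
sync "$archive_dir"

Even with that, you're still left with everything else discussed in this message, which is why I'd point people at the existing tools rather than at recipes like this.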
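And as a rough illustration of the unlogged-relation filtering described above (a sketch only; a real tool would also need to handle tablespaces under pg_tblspc, among other things):

#!/bin/sh
# Hypothetical sketch: build an exclude list of unlogged-relation files.
# For every '<filenode>_init' fork found, every other fork of that
# relation (the main fork '<filenode>', its extra segments
# '<filenode>.1' etc., and the '_fsm'/'_vm' forks) should be left out
# of the backup; the '_init' file itself must be kept.
find "$PGDATA/base" -type f -name '*_init' | while read -r init; do
    dir=$(dirname "$init")
    filenode=$(basename "$init" _init)
    find "$dir" -maxdepth 1 \( -name "$filenode" -o -name "$filenode.*" \
        -o -name "${filenode}_fsm*" -o -name "${filenode}_vm*" \) -print
done > unlogged-exclude.list

This is exactly the kind of detail that ends up subtly wrong in hand-rolled scripts, which is rather the point.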
While it was only demonstrated relatively recently that the pg_control file can, in fact, be invalid when read during a backup, it's a real issue that has probably been around forever, and there isn't really a good way to address it today. Still, perhaps we should include in the documentation, again under 'Backing Up The Data Directory', something like:

#########
You must be sure to check that the copy of pg_control which was made is valid. The pg_controldata tool can be used for this purpose: after making the copy of the 'data_dir/global/pg_control' file which will be included in the backup, restore it into a new directory 'test_data_dir/global' and then run 'pg_controldata' on 'test_data_dir' and ensure that no error is returned. If an error is returned, attempt to re-copy the 'data_dir/global/pg_control' file and test again (during the backup), or fail the backup entirely. A backup without a valid pg_control file cannot be restored.

Further, on restore, it is strongly recommended to restore the pg_control file last, and only after successfully restoring the rest of the database, to prevent the database from being started either while the restore is happening or in the event that not all of the files which were part of the backup could be restored.
#########

(A rough command-line sketch of that check is below.)

Of course, there's more, such as the lack of any discussion about keeping a separate manifest of all of the files that were copied as part of the backup, to allow you to make sure that, for example, the tar file created for the backup didn't just get truncated somewhere along the way. The above alludes to this idea, at least.

I will note that while we don't talk about all of these things in the documentation, pg_basebackup and pg_receivewal do handle them: they do an fsync() after the backup has completed or when receiving WAL (including of the directories), skip unlogged relations and temporary files, include a manifest with a per-file checksum which the pg_verifybackup tool can be used to check (example below), etc. While pg_basebackup doesn't yet handle the issue of an invalid read of pg_control, there's ongoing discussion about how to address that, and patches have been proposed to deal with it. Hopefully something will be done there soon, as pg_basebackup seems to be at particular risk in certain cases, particularly when run against a replica, where the pg_control file is rewritten much more frequently. This isn't to say it's very likely - it's not - just that it is a thing that could happen and could lead to an invalid backup.

Really, rewriting the entire "how to perform a low-level backup" section and re-focusing it to be for backup tool authors, rather than having it seem like it's a unix administrator or general DBA's documentation for performing backups, would go a long way towards improving the situation, for my ... slightly more than 2 cents (or your favorite currency), anyway.
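For what the pg_control check proposed above might look like in practice, here's a rough sketch; the backup path is made up, and a real tool would do this internally rather than via shell. Note that, depending on version, pg_controldata may report a CRC mismatch as a warning rather than a non-zero exit, so the sketch checks for both:

#!/bin/sh
# Hypothetical sketch of validating the copied pg_control.
set -e
backup_dir="/backups/today"        # wherever the backup copy lives
test_dir=$(mktemp -d)

# pg_controldata expects a data-directory layout, so stage the copied
# file under global/ in a scratch directory.
mkdir -p "$test_dir/global"
cp "$backup_dir/global/pg_control" "$test_dir/global/pg_control"

# Treat either a non-zero exit or any warning output (e.g. a CRC
# mismatch) as an invalid copy: re-copy during the backup, or fail.
if ! out=$(pg_controldata "$test_dir" 2>&1) || \
   printf '%s\n' "$out" | grep -qi 'warning'; then
    echo "copied pg_control appears invalid; re-copy or fail the backup" >&2
    exit 1
fi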
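And for completeness, here is the sort of thing you get for free with the in-core tools (paths made up; see the pg_basebackup and pg_verifybackup documentation for the details):

# Take a base backup with a per-file checksummed manifest (sha256
# here; the default manifest checksum is CRC32C).
pg_basebackup -D /backups/today --manifest-checksums=sha256

# Later, verify that every file listed in the manifest is present and
# matches its checksum.
pg_verifybackup /backups/today

pg_verifybackup also checks, by default, that the WAL needed to reach consistency can be read and parsed, which a hand-rolled tar-based approach has no equivalent for.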
Thanks!

Stephen