On 4/25/24 00:05, Robert Haas wrote:
On Tue, Apr 23, 2024 at 7:23 PM David Steele <da...@pgmasters.net> wrote:
I don't understand what you mean here. I thought we were in agreement
that verifying contents would cost a lot more. The verification that
we can actually do without much cost can only check for missing files
in the most recent backup, which is quite weak. pg_verifybackup is
available if you want more comprehensive verification and you're
willing to pay the cost of it.

I simply meant that it is *possible* to verify the output of
pg_combinebackup without explicitly verifying all the backups. There
would be overhead, yes, but it would be less than verifying each backup
individually. For my 2c that efficiency would make it worth doing
verification in pg_combinebackup, with perhaps a switch to turn it off
if the user is confident in their sources.

Hmm, can you outline the algorithm that you have in mind? I feel we've
misunderstood each other a time or two already on this topic, and I'd
like to avoid more of that. Unless you just mean what the patch I
posted does (check if anything from the final manifest is missing from
the corresponding directory), but that doesn't seem like verifying the
output.
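For reference, the weak check described above (every file listed in the final backup's manifest should exist in the combined output) can be sketched roughly as follows. This is a hypothetical Python illustration, not the code in the posted patch. The function name `missing_from_output` is made up, and the sketch assumes the JSON `Files`/`Path` layout of a PostgreSQL backup_manifest. It only checks that each file is present, not what it contains, which is why it says little about the correctness of the output.

```python
import json
import os
import tempfile


def missing_from_output(manifest_text: str, output_dir: str) -> list[str]:
    """Return manifest paths that are not regular files under output_dir.

    Assumes a JSON manifest with a "Files" array whose entries carry a
    "Path" key. Entries that only have "Encoded-Path" are skipped in this
    simplified sketch.
    """
    manifest = json.loads(manifest_text)
    missing = []
    for entry in manifest.get("Files", []):
        path = entry.get("Path")
        if path is None:
            continue  # hypothetical simplification: ignore encoded paths
        if not os.path.isfile(os.path.join(output_dir, path)):
            missing.append(path)
    return missing


with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "PG_VERSION"), "w").close()
    manifest = json.dumps({"Files": [{"Path": "PG_VERSION"}, {"Path": "base/1/1234"}]})
    print(missing_from_output(manifest, d))  # ['base/1/1234']
```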

Yeah, it seems you are right that it is not possible to verify the output in all cases.

However, I think allowing the user to optionally validate the input would be a good feature. Running pg_verifybackup as a separate step is going to be more expensive than verifying and copying at the same time. Even with storage tricks to copy ranges of data, pg_combinebackup is going to be aware of files that do not need to be verified for the current operation, e.g. old copies of free space maps.
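The cost argument is that a combine step that reads a file in order to copy it can checksum those same bytes in that pass: this adds hashing work but no extra I/O. A separate pg_verifybackup run, by contrast, reads everything again. Below is a minimal Python sketch of copy-with-verification. It assumes the expected checksum is a SHA-256 hex digest taken from a manifest (pg_basebackup can emit SHA-256 manifest checksums, though CRC32C is the default). `copy_and_verify` is a hypothetical name, and the sketch only covers files copied whole.

```python
import hashlib
import io


def copy_and_verify(src, dst, expected_sha256: str, chunk_size: int = 64 * 1024) -> bool:
    """Copy binary stream src to dst in one pass, hashing bytes as they go by.

    Returns True if the SHA-256 of the copied data matches expected_sha256.
    """
    digest = hashlib.sha256()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)  # verification piggybacks on the read the copy needs anyway
        dst.write(chunk)
    return digest.hexdigest() == expected_sha256


data = b"x" * 100_000
expected = hashlib.sha256(data).hexdigest()
out = io.BytesIO()
print(copy_and_verify(io.BytesIO(data), out, expected))  # True
print(out.getvalue() == data)  # True
```

The trade-off is that range-copy or clone tricks avoid reading data at all, so verifying a file means giving up that shortcut for it. This is why skipping files that the current operation does not need matters.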

Additionally, if pg_combinebackup is updated to work against tar.gz, which I believe will be important going forward, then there would be little penalty to verification, since all the required data would be in memory at some point anyway. Though if the file is compressed, verification might be redundant, since compression formats generally include checksums.
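On the point that compressed formats carry their own checksums: gzip, for example, stores a CRC-32 of the uncompressed data in its trailer, and Python's `gzip` module checks it on decompression. The small demonstration below uses Python's standard library, not anything in pg_combinebackup. It corrupts the stored CRC and shows that decompression fails.

```python
import gzip
import zlib

payload = b"base/1/1234 page data " * 1000
blob = bytearray(gzip.compress(payload))

# The gzip trailer is a CRC-32 of the uncompressed data (bytes -8..-5),
# followed by the uncompressed length modulo 2**32 (bytes -4..-1).
blob[-8] ^= 0xFF  # corrupt the stored CRC

try:
    gzip.decompress(bytes(blob))
    detected = False
except (OSError, zlib.error):  # gzip.BadGzipFile is a subclass of OSError
    detected = True

print(detected)  # True
```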

One more thing occurs to me -- if data checksums are enabled then a rough and ready output verification would be to test the checksums during combine. Data checksums aren't very good but something should be triggered if a bunch of pages go wrong, especially since the block offset is part of the checksum. This would be helpful for catching combine bugs.
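Including the block offset helps catch combine bugs for a simple reason. Suppose an intact page is written into the wrong block position. Its stored checksum no longer matches the value recomputed for that position, even though the page bytes themselves are fine. The toy Python sketch below illustrates this. It follows the general shape of PostgreSQL's page checksum: hash the page with the checksum field zeroed, XOR in the block number, and reduce the result into 1..65535. However, the inner hash here is CRC-32 rather than PostgreSQL's actual algorithm. The checksum offset and all function names are assumptions for illustration.

```python
import struct
import zlib

BLCKSZ = 8192
CHECKSUM_OFFSET = 8  # assumed: a 16-bit checksum field right after an 8-byte LSN


def toy_page_checksum(page: bytes, blkno: int) -> int:
    """Toy block-number-aware checksum (CRC-32 inner hash, NOT PostgreSQL's)."""
    zeroed = page[:CHECKSUM_OFFSET] + b"\x00\x00" + page[CHECKSUM_OFFSET + 2:]
    h = zlib.crc32(zeroed) ^ blkno  # mix in where the page is supposed to live
    return (h % 65535) + 1  # fold into 1..65535 so it fits in 16 bits


def stamp(page: bytearray, blkno: int) -> None:
    """Store the toy checksum for blkno in the page's checksum field."""
    struct.pack_into("<H", page, CHECKSUM_OFFSET, toy_page_checksum(bytes(page), blkno))


def verify(page: bytes, blkno: int) -> bool:
    """Recompute the checksum for blkno and compare it to the stored value."""
    (stored,) = struct.unpack_from("<H", page, CHECKSUM_OFFSET)
    return stored == toy_page_checksum(page, blkno)


page = bytearray(BLCKSZ)
page[100:110] = b"tuple data"
stamp(page, 5)
print(verify(bytes(page), 5))  # True: right page in the right place
print(verify(bytes(page), 6))  # False: same bytes at the wrong block offset
```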

Regards,
-David

