On Thu, Feb 1, 2024 at 3:06 AM Robert Haas <robertmh...@gmail.com> wrote:
> On Thu, Jan 25, 2024 at 2:52 AM Amul Sul <sula...@gmail.com> wrote: > > Thank you for the review-comments, updated version attached. > > I generally agree with 0001. I spent a long time thinking about your > decision to make verifier_context contain a pointer to manifest_data > instead of, as it does currently, a pointer to manifest_files_hash. I > don't think that's a horrible idea, but it also doesn't seem to be > used anywhere currently. One advantage of the current approach is that > we know that none of the code downstream of verify_backup_directory() > or verify_backup_checksums() actually cares about anything other than > the manifest_files_hash. That's kind of nice. If we didn't change this > as you have done here, then we would need to continue passing the WAL > ranges to parse_required_walI() and the system identifier would have > to be passed explicitly to the code that checks the system identifier, > but that's not such a bad thing, either. It makes it clear which > functions are using which information. > I intended to minimize the out param of parse_manifest_file(), which currently returns manifest_files_hash and manifest_wal_range, and I need two more -- manifest versions and the system identifier. But before you go change anything there, exactly when should 0002 be > checking the system identifier in the control file? What happens now > is that we first walk over the directory tree and make sure we have > the files (verify_backup_directory) and then go through and verify > checksums in a second pass (verify_backup_checksums). We do this > because it lets us report problems that can be detected cheaply -- > like missing files -- relatively quickly, and problems that are more > expensive to detect -- like mismatching checksums -- only after we've > reported all the cheap-to-detect problems. At what stage should we > verify the control file? I don't really like verifying it first, as > you've done, because I think the error message change in > 004_options.pl is a clear regression. When the whole directory is > missing, it's much more pleasant to complain about the directory being > missing than some file inside the directory being missing. > > What I'd be inclined to suggest is that you have verify_backup_file() > notice when the file it's being asked to verify is the control file, > and have it check the system identifier at that stage. I think if you > do that, then the error message change in 004_options.pl goes away. > Now, to do that, you'd need to have the whole manifest_data available > from the context, not just the manifest_files_hash, so that you can > see the expected system identifier. And, interestingly, if you take > this approach, then it appears to me that 0001 is correct as-is and > doesn't need any changes. > Yeah, we can do that, but I think it is a bit inefficient to have strcmp() check for the pg_control file on each verify_backup_file() call, despite, we know that path. Also, I think, we need additional handling to ensure that the system identifier has been verified in verify_backup_file(), what if the pg_control file itself missing from the backup -- might be a rare case, but possible. For now, we can do the system identifier validation after verify_backup_directory(). Regards, Amul