On Thu, Oct 23, 2025 at 08:40:14AM -0400, Robert Haas wrote:
> While I'm not against cross-checking against the control file, this
> sounds like an imaginary scenario to me. That is, it would only happen
> if somebody maliciously modified the contents of the data directory by
> hand with the express goal of breaking the tool. But we fundamentally
> cannot defend against a malicious user whose express goal is to break
> the tool, and I do not see any compelling reason to expend energy on
> it even in cases like this where we could theoretically detect it
> without much effort. If we go down that path, we'll end up not only
> complicating the code, but also obscuring our own goals: it will look
> like we've either done too much sanity checking (because we will have
> added checks that are unnecessary with a non-malicious user) or too
> little (because we will not have caught all the things a malicious
> user might do).

I was thinking about this argument over the weekend, and I am wondering
if we could not do better here at detecting whether a file should be
copied or not. What if we computed a checksum of each file that exists
on both the target and the source, and skipped the copy when the
checksums match? You cannot do that for relation files while the source
is online, of course, but for files like the oldest segments before the
divergence point that would be more reliable than comparing only the
sizes, though more expensive due to the cost of the checksum
computation. And there is sha256() available at the SQL level.

Just throwing one idea into the bucket of ideas. It may not be worth
the extra cost here, of course, but attaching a checksum to
file_entry_t is not what I would qualify as an invasive change.
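
For what it's worth, a query along these lines could retrieve such a
checksum from a live source, at least for non-relation files (the
segment name below is only a placeholder, and pg_read_binary_file()
requires the appropriate privileges):

  SELECT sha256(pg_read_binary_file('pg_wal/000000010000000000000002'));

--
Michael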