On Thu, Oct 23, 2025 at 08:40:14AM -0400, Robert Haas wrote:
> While I'm not against cross-checking against the control file, this
> sounds like an imaginary scenario to me. That is, it would only happen
> if somebody maliciously modified the contents of the data directory by
> hand with the express goal of breaking the tool. But we fundamentally
> cannot defend against a malicious user whose express goal is to break
> the tool, and I do not see any compelling reason to expend energy on
> it even in cases like this where we could theoretically detect it
> without much effort. If we go down that path, we'll end up not only
> complicating the code, but also obscuring our own goals: it will look
> like we've either done too much sanity checking (because we will have
> added checks that are unnecessary with a non-malicious user) or too
> little (because we will not have caught all the things a malicious
> user might do).

I was thinking about this argument over the weekend, and I am wondering
if we could not do better here at detecting whether a file should be
copied or not. What if we computed a checksum of each file that exists
on both the target and the source, and skipped the copy when the
checksums match? You cannot do that for relation files while the source
is online, of course, but for files like the oldest segments before the
divergence point that would be more reliable than comparing only the
sizes, though more expensive due to the cost of the checksum
computation. And there is sha256() available at the SQL level.

Just throwing one idea into the bucket of ideas. It may not be worth
the extra cost here, of course, but attaching a checksum to
file_entry_t is not what I would qualify as an invasive change.
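
For what it's worth, a query along these lines could retrieve such a
checksum from a live source, at least for non-relation files (the
segment name below is only a placeholder, and pg_read_binary_file()
requires the appropriate privileges):

  SELECT sha256(pg_read_binary_file('pg_wal/000000010000000000000002'));

--
Michael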