Hi Robert,

On 8/30/23 10:49, Robert Haas wrote:
> In the limited time that I've had to work on this project lately, I've
> been trying to come up with a test case for this feature -- and since
> I've gotten completely stuck, I thought it might be time to post and
> see if anyone else has a better idea. I thought a reasonable test case
> would be: Do a full backup. Change some stuff. Do an incremental
> backup. Restore both backups and perform replay to the same LSN. Then
> compare the files on disk. But I cannot make this work. The first
> problem I ran into was that replay of the full backup does a
> restartpoint, while the replay of the incremental backup does not.
> That results in, for example, pg_subtrans having different contents.

pg_subtrans, at least, can be ignored since it is excluded from the backup and not required for recovery.
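
For anyone who wants to poke at this, here is roughly how I understand the flow Robert describes, written as a TAP-style sketch. The --incremental option and pg_combinebackup are taken from the patch set, so the exact names/arguments may differ, and the restore/replay/compare steps are only stubbed out in comments since that is where things go wrong:

use strict;
use warnings;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;

my $tempdir = PostgreSQL::Test::Utils::tempdir;

my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(has_archiving => 1, allows_streaming => 1);
$primary->start;

# Do a full backup.
$primary->command_ok(
    [ 'pg_basebackup', '-D', "$tempdir/full", '--no-sync' ],
    'full backup');

# Change some stuff.
$primary->safe_psql('postgres',
    'CREATE TABLE t AS SELECT generate_series(1, 10000) AS i');
$primary->safe_psql('postgres', 'UPDATE t SET i = i + 1 WHERE i % 100 = 0');

# Do an incremental backup (option name per the patch set).
$primary->command_ok(
    [ 'pg_basebackup', '-D', "$tempdir/incr", '--no-sync',
      '--incremental', "$tempdir/full/backup_manifest" ],
    'incremental backup');

# Reconstruct a full backup from full + incremental (tool from the patch set).
command_ok(
    [ 'pg_combinebackup', "$tempdir/full", "$tempdir/incr",
      '-o', "$tempdir/combined" ],
    'combine backups');

# The hard part: restore both backups, replay each to the same LSN, then
# compare the resulting data directories.  As discussed in this thread,
# restartpoints and block "holes" make a byte-for-byte comparison fail.

done_testing();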

> I'm not sure whether it can also result in data files having different
> contents: are changes that we replayed following the last restartpoint
> guaranteed to end up on disk when the server is shut down? It wasn't
> clear to me that this is the case. I thought maybe I could get both
> servers to perform a restartpoint at the same location by shutting
> down the primary and then replaying through the shutdown checkpoint,
> but that doesn't work because the primary doesn't finish archiving
> before shutting down. After some more fiddling I settled (at least for
> research purposes) on having the restored backups PITR and promote,
> instead of PITR and pause, so that we're guaranteed a checkpoint. But
> that just caused me to run into a far worse problem: replay on the
> standby doesn't actually create a state that is byte-for-byte
> identical to the one that exists on the primary. I quickly discovered
> that in my test case, I was ending up with different contents in the
> "hole" of a block wherein a tuple got updated. Replay doesn't think
> it's important to make the hole end up with the same contents on all
> machines that replay the WAL, so I end up with one server that has
> more junk in there than the other one and the tests fail.

This is pretty much what I discovered when investigating backup from standby back in 2016. My (ultimately unsuccessful) efforts to find a clean delta resulted in [1] as I systematically excluded directories that are not required for recovery and will not be synced between a primary and standby.

FWIW Heikki also made similar attempts at this before me (back then I found the thread but I doubt I could find it again) and arrived at similar results. We discussed this in person and figured out that we had come to more or less the same conclusion. Welcome to the club!

> Unless someone has a brilliant idea that I lack, this suggests to me
> that this whole line of testing is a dead end. I can, of course, write
> tests that compare clusters *logically* -- do the correct relations
> exist, are they accessible, do they have the right contents? But I
> feel like it would be easy to have bugs that escape detection in such
> a test but would be detected by a physical comparison of the clusters.

Agreed, though a matching logical result is still very compelling.
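
Something along these lines is what I have in mind for the logical check, continuing the sketch above. The $restored_full and $restored_incr variables are hypothetical nodes standing in for the two restored-and-recovered clusters:

# Very rough idea of a "logical" comparison: same relations, same contents.
my $rel_sql =
    "SELECT relname, relkind FROM pg_class"
  . " WHERE relnamespace = 'public'::regnamespace ORDER BY 1, 2";

is($restored_full->safe_psql('postgres', $rel_sql),
   $restored_incr->safe_psql('postgres', $rel_sql),
   'same relations exist in both clusters');

is($restored_full->safe_psql('postgres', 'SELECT * FROM t ORDER BY i'),
   $restored_incr->safe_psql('postgres', 'SELECT * FROM t ORDER BY i'),
   'table t has the same contents in both clusters');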

> However, such a comparison can only be conducted if either (a) there's
> some way to set up the test so that byte-for-byte identical clusters
> can be expected or (b) there's some way to perform the comparison that
> can distinguish between expected, harmless differences and unexpected,
> problematic differences. And at the moment my conclusion is that
> neither (a) nor (b) exists. Does anyone think otherwise?

I do not. My conclusion back then was that validating a physical comparison would be nearly impossible without changes to Postgres to make the primary and standby match byte-for-byte via replication. I still think that would be a great idea, in principle at least, but replay is already a major bottleneck and anything that makes it slower will likely not be very popular.

This would also be great for WAL. Last time I tested, the same WAL segment could differ between the primary and standby because the unused (recycled) portion at the end of the segment is not zeroed on the standby the way it is on the primary, even though the two segments match logically. I would be very happy if somebody told me that my info is out of date here and this has been fixed, but when I looked at the code, making them match was incredibly tricky because of how WAL is replicated.
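
So any physical comparison of WAL would have to stop at the end of the used portion of each segment, with the cutoff taken from something like the end of the last record that pg_waldump reports. A hypothetical helper, just to show the shape of it:

use strict;
use warnings;

# Compare two WAL segments only up to $used_len bytes, ignoring the
# recycled/unzeroed tail.  $used_len would have to come from elsewhere,
# e.g. the end LSN of the last record reported by pg_waldump.
sub wal_used_portion_matches
{
    my ($seg_a, $seg_b, $used_len) = @_;

    my $read_prefix = sub {
        my ($path) = @_;
        open my $fh, '<:raw', $path or die "could not open $path: $!";
        my $nread = read($fh, my $buf, $used_len);
        close $fh;
        die "short read on $path" if $nread != $used_len;
        return $buf;
    };

    return $read_prefix->($seg_a) eq $read_prefix->($seg_b);
}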

> Meanwhile, here's a rebased set of patches. The somewhat-primitive
> attempts at writing tests are in 0009, but they don't work, for the
> reasons explained above. I think I'd probably like to go ahead and
> commit 0001 and 0002 soon if there are no objections, since I think
> those are good refactorings independently of the rest of this.

No objections to 0001/0002.

Regards,
-David

[1] http://git.postgresql.org/pg/commitdiff/6ad8ac6026287e3ccbc4d606b6ab6116ccc0eec8

