On 10/23/23 11:44, Robert Haas wrote:
On Fri, Oct 20, 2023 at 11:30 AM David Steele <da...@pgmasters.net> wrote:

I don't plan to stand in your way on this feature. I'm reviewing what
patches I can out of courtesy and to be sure that nothing adjacent to
your work is being affected. My apologies if my reviews are not meeting
your expectations, but I am contributing as my time constraints allow.

Sorry, I realize reading this response that I probably didn't do a
very good job writing that email and came across sounding like a jerk.
Possibly, I actually am a jerk. Whether it just sounded like it or I
actually am, I apologize.

That was the way it came across, though I prefer to think it was unintentional. I certainly understand how frustrating dealing with a large and uncertain patch can be. Either way, I appreciate the apology.

Now onward...

But your last paragraph here gets at my real
question, which is whether you were going to try to block the feature.
I recognize that we have different priorities when it comes to what
would make most sense to implement in PostgreSQL, and that's OK, or at
least, it's OK with me.

This seem perfectly natural to me.

I also don't have any particular expectation
about how much you should review the patch or in what level of detail,
and I have sincerely appreciated your feedback thus far. If you are
able to continue to provide more, that's great, and if that's not,
well, you're not obligated. What I was concerned about was whether
that review was a precursor to a vigorous attempt to keep the main
patch from getting committed, because if that was going to be the
case, then I'd like to surface that conflict sooner rather than later.
It sounds like that's not an issue, which is great.

Overall I would say I'm not strongly for or against the patch. I think it will be very difficult to use in a manual fashion, but automation is they way to go in general so that's not necessarily and argument against.

However, this is an area of great interest to me so I do want to at least make sure nothing is being impacted adjacent to the main goal of this patch. So far I have seen no sign of that, but that has been a primary goal of my reviews.

At the risk of drifting into the fraught question of what I *should*
be implementing rather than the hopefully-less-fraught question of
whether what I am actually implementing is any good, I see incremental
backup as a way of removing some of the use cases for the low-level
backup API. If you said "but people still will have lots of reasons to
use it," I would agree; and if you said "people can still screw things
up with pg_basebackup," I would also agree. Nonetheless, most of the
disasters I've personally seen have stemmed from the use of the
low-level API rather than from the use of pg_basebackup, though there
are exceptions.

This all makes sense to me.

I also think a lot of the use of the low-level API is
driven by it being just too darn slow to copy the whole database, and
incremental backup can help with that in some circumstances.

I would argue that restore performance is *more* important than backup performance and this patch is a step backward in that regard. Backups will be faster and less space will be used in the repository, but restore performance is going to suffer. If the deltas are very small the difference will probably be negligible, but as the deltas get large (and especially if there are a lot of them) the penalty will be more noticeable.

Also, I
have worked fairly hard to try to make sure that if you misuse
pg_combinebackup, or fail to use it altogether, you'll get an error
rather than silent data corruption. I would be interested to hear
about scenarios where the checks that I've implemented can be defeated
by something that is plausibly described as stupidity rather than
malice. I'm not sure we can fix all such cases, but I'm very alert to
the horror that will befall me if user error looks exactly like a bug
in the code. For my own sanity, we have to be able to distinguish
those cases.

I was concerned with the difficulty of trying to stage the correct backups for pg_combinebackup, not whether it would recognize that the needed data was not available and then error appropriately. The latter is surmountable within pg_combinebackup but the former is left up to the user.

Moreover, we also need to be able to distinguish
backup-time bugs from reassembly-time bugs, which is why I've got the
pg_walsummary tool, and why pg_combinebackup has the ability to emit
fairly detailed debugging output. I anticipate those things being
useful in investigating bug reports when they show up. I won't be too
surprised if it turns out that more work on sanity-checking and/or
debugging tools is needed, but I think your concern about people
misusing stuff is bang on target and I really want to do whatever we
can to avoid that when possible and detect it when it happens.

The ability of users to misuse tools is, of course, legendary, so that all sounds good to me.

One note regarding the patches. I feel like v5-0005-Prototype-patch-for-incremental-backup should be split to have the WAL summarizer as one patch and the changes to base backup as a separate patch.

It might not be useful to commit one without the other, but it would make for an easier read. Just my 2c.

Regards,
-David


Reply via email to