On 4/12/24 12:15, Robert Haas wrote:
On Thu, Apr 11, 2024 at 5:48 PM David Steele <da...@pgmasters.net> wrote:
But they'll try because it is a new pg_basebackup feature and they'll
assume it is there to be used. Maybe it would be a good idea to make it
clear in the documentation that significant tooling will be required to
make it work.

I don't agree with that idea. LOTS of what we ship takes a significant
amount of effort to make it work. You may well need a connection
pooler. You may well need a failover manager which may or may not be
separate from your connection pooler. You need a backup tool. You need
a replication management tool which may or may not be separate from
your backup tool and may or may not be separate from your failover
tool. You probably need various out-of-core connections for the
programming languages you need. You may need a management tool, and
you probably need a monitoring tool. Some of the tools you might
choose to do all that stuff themselves have a whole bunch of complex
dependencies. It's a mess.

The difference here is you *can* use Postgres without a connection pooler (I have many times) or failover (if downtime is acceptable), but most people would agree that you really *need* backup.

The backup tool should be clear and easy to use or misery will inevitably result. pg_basebackup is difficult enough to use and automate because it has no notion of a repository, no expiration, and no WAL handling, just to name a few things. Now there is an even more advanced feature that is even harder to use. So, no, I really don't think this feature is practically usable by the vast majority of end users.

Now, if someone were to say that we ought to talk about these issues
in our documentation and maybe give people some ideas about how to get
started, I would likely be in favor of that, modulo the small
political problem that various people would want their solution to be
the canonical one to which everyone gets referred. But I think it's
wrong to pretend like this feature is somehow special, that it's
somehow more raw or unfinished than tons of other things. I actually
think it's significantly *better* than a lot of other things. If we
add a disclaimer to the documentation saying "hey, this new
incremental backup feature is half-finished garbage!", and meanwhile
the documentation still says "hey, you can use cp as your
archive_command," then we have completely lost our minds.

Fair point on cp, but that just points to an overall lack in our documentation and built-in backup/recovery tools in general.

I also think that you're being more negative about this than the facts
justify. As I said to several colleagues today, I *fully* acknowledge
that you have a lot more practical experience in this area than I do,
and a bunch of good ideas. I was really pleased to see you talking
about how it would be good if these tools worked on tar files - and I
completely agree, and I hope that will happen, and I hope to help in
making that happen. I think there are a bunch of other problems too,
only some of which I can guess at. However, I think saying that this
feature is not realistically intended to be used by end-users or that
they will not be able to do so is over the top, and is actually kind
of insulting.

It is not meant to be insulting, but I still believe it to be true. After years of working with users on backup problems I think I have a pretty good bead on what the vast majority of admins are capable of and/or willing to do. Making this feature work is pretty high above that bar.

If the primary motivation is to provide a feature that can be integrated with third party tools, as Tomas suggests, then I guess usability is somewhat moot. But you are insisting that is not the case and I just don't see it that way.

There has been more enthusiasm for this feature on this
mailing list and elsewhere than I've gotten for anything I've
developed in years. And I don't think that's because all of the people
who have expressed enthusiasm are silly geese who don't understand how
terrible it is.

No doubt there is enthusiasm. It's a great feature to have. In particular I think the WAL summarizer is cool. But I do think the shortcomings are significant and that will become very apparent when people start to implement. The last minute effort to add COW support is an indication of problems that people will see in the field.

Further, I do think some less than ideal design decisions were made. In particular, I think sidelining manifests, i.e. making them optional, is not a good choice. This has led directly to the issue we see in [1]. If we require a manifest to make an incremental backup, why make it optional for combine?

This same design decision has led us to have "marker files" for zero-length files and unchanged files, which just seems extremely wasteful when these could be noted in the manifest. There are good reasons for writing everything out in a full backup, but for an incremental that can only be reconstructed using our tool the manifest should be sufficient.

Maybe all of this can be improved in a future release, along with tar reading, but none of those potential future improvements help me to believe that this is a user-friendly feature in this release.

Regards,
-David

---

[1] https://www.postgresql.org/message-id/flat/9badd24d-5bd9-4c35-ba85-4c38a2feb73e%40pgmasters.net

