On Thu, Feb 9, 2023 at 10:51 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> I'm fairly concerned about the idea of making it common for people
> to write their own main loop for the archiver. That means that, if
> we have a bug fix that requires the archiver to do X, we will not
> just be patching our own code but trying to get an indeterminate
> set of third parties to add the fix to their code.

I don't know what kind of bug we could really have in the main loop
that would be common to every implementation. They're probably all
going to check for interrupts, do some work, and then wait for I/O on
some things by calling select() or some equivalent. But the work, and
the wait for the I/O, would be different for every implementation. I
would anticipate that the amount of common code would be nearly zero.

Imagine two archive modules, one of which archives files via HTTP and
the other of which archives them via SSH. They need to do a lot of the
same things, but the code is going to be totally different. When the
HTTP archiver module needs to open a new connection, it's going to
call some libcurl function. When the SSH archiver module needs to do
the same thing, it's going to call some libssh function. It seems
quite likely that the HTTP implementation would want to juggle
multiple connections in parallel, but the SSH implementation might not
want to do that, or its logic for determining how many connections to
open might be completely different based on the behavior of that
protocol vs. the other protocol.

Once either implementation has sent as much data as it can over the
connections it has open, it needs to wait for those sockets to become
write-ready or, possibly, read-ready. There again, each one will be
calling into a different library to do that. It could be that in this
particular case, both would be waiting for a set of file descriptors,
and we could provide some framework for waiting on a set of file
descriptors provided by the module. But you could also have some other
archiver implementation that is, say, waiting for a process to
terminate rather than for a file descriptor to become ready for I/O.

> If we think we need primitives to let the archiver hooks get all
> the pending files, or whatever, by all means add those. But don't
> cede fundamental control of the archiver.
> The hooks need to be
> decoration on a framework we provide, not the framework themselves.

I don't quite see how you can make asynchronous and parallel archiving
work if the archiver process only calls into the archive module at
times that it chooses. That would mean that the module has to return
control to the archiver when it's in the middle of archiving one or
more files -- and then I don't see how it can get control back at the
appropriate time. Do you have a thought about that?

-- 
Robert Haas
EDB: http://www.enterprisedb.com
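
For illustration only, here is a minimal sketch of the loop shape discussed in this message: check for interrupts, do some work, then wait for I/O on whatever descriptors the module currently cares about. Everything in it is an assumption made up for the example: ArchiveLoopOps, run_archive_loop, DemoState and the demo_* callbacks are hypothetical names and are not part of PostgreSQL or its archive module API. It uses poll() as the "equivalent" of select(), and the toy module simply writes a few bytes to a descriptor instead of talking HTTP or SSH.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_WAIT_FDS 16

/* Hypothetical per-module callbacks; these names do not exist in PostgreSQL. */
typedef struct ArchiveLoopOps
{
    /* Push transfers forward; return false once nothing is left to do. */
    bool (*do_work) (void *state);
    /* Fill fds[] with descriptors to wait on; return how many were filled. */
    int  (*wait_set) (void *state, struct pollfd *fds, int max_fds);
} ArchiveLoopOps;

/* Stand-in for an interrupt check; set from a signal handler. */
static volatile sig_atomic_t shutdown_requested = 0;

void
request_shutdown(int signo)
{
    (void) signo;
    shutdown_requested = 1;
}

/*
 * Generic loop: check for interrupts, do work, wait for I/O.
 * Returns the number of iterations run, or -1 if poll() fails.
 */
int
run_archive_loop(const ArchiveLoopOps *ops, void *state, int timeout_ms)
{
    struct pollfd fds[MAX_WAIT_FDS];
    int         iterations = 0;

    assert(ops != NULL);

    while (!shutdown_requested)
    {
        iterations++;

        if (!ops->do_work(state))
            break;              /* module reports it has finished */

        int nfds = ops->wait_set(state, fds, MAX_WAIT_FDS);

        /* Sleep until a descriptor is ready or the timeout expires. */
        if (poll(fds, (nfds_t) nfds, timeout_ms) < 0 && errno != EINTR)
            return -1;
    }
    return iterations;
}

/* Toy module: writes one byte per call until 'remaining' reaches zero. */
typedef struct DemoState
{
    int         fd;
    int         remaining;
} DemoState;

bool
demo_do_work(void *state)
{
    DemoState  *s = state;

    if (s->remaining <= 0 || write(s->fd, "x", 1) != 1)
        return false;
    s->remaining--;
    return s->remaining > 0;
}

int
demo_wait_set(void *state, struct pollfd *fds, int max_fds)
{
    DemoState  *s = state;

    if (max_fds < 1)
        return 0;
    fds[0].fd = s->fd;
    fds[0].events = POLLOUT;    /* wait until the descriptor is write-ready */
    fds[0].revents = 0;
    return 1;
}
```

A caller would fill a DemoState with the write end of a pipe, build an ArchiveLoopOps from the two demo_* callbacks, and pass both to run_archive_loop(). The sketch only covers waiting on file descriptors; a module that waits for a process to terminate, as mentioned above, would not fit the wait_set callback as written.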