On Fri, Sep 10, 2021 at 11:49 AM Julien Rouhaud <rjuju...@gmail.com> wrote:
> I totally agree that batching as many file as possible in a single
> command is probably what's gonna achieve the best performance. But if
> the archiver only gets an answer from the archive_command once it
> tried to process all of the file, it also means that postgres won't be
> able to remove any WAL file until all of them could be processed. It
> means that users will likely have to limit the batch size and
> therefore pay more startup overhead than they would like. In case of
> archiving on server with high latency / connection overhead it may be
> better to be able to run multiple commands in parallel. I may be
> overthinking here and definitely having feedback from people with more
> experience around that would be welcome.
That's a fair point. I'm not sure how much it matters, though. I think you want to imagine a system where, let's say, 10 WAL files are being archived per second. Using fork() + exec() to spawn a shell command 10 times per second is a bit expensive, whether you do it serially or in parallel, and even if the command is something with a less-insane startup overhead than scp. If we start a shell command say every 3 seconds and give it 30 files each time, we can reduce the startup costs we're paying by ~97% at the price of having to wait up to 3 additional seconds to know that archiving succeeded for any particular file. That sounds like a pretty good trade-off, because the main benefit of removing old files is that it keeps us from running out of disk space, and you should not be running a busy system in such a way that it is ever within 3 seconds of running out of disk space, so whatever.

If on the other hand you imagine a system that's not very busy, say 1 WAL file being archived every 10 seconds, then using a batch size of 30 would very significantly delay removal of old files. However, on this system, batching probably isn't really needed. The rate of WAL file generation is low enough that even if you pay the startup cost of your archive_command for every file, you're probably still doing just fine.

Probably, any kind of parallelism or batching needs to take this kind of time-based thinking into account. For batching, the rate at which files are generated should affect the batch size. For parallelism, it should affect the number of processes used.

--
Robert Haas
EDB: http://www.enterprisedb.com
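To make the arithmetic above concrete, here is a small sketch (plain Python, not PostgreSQL code; the function names and the 3-second/30-file parameters are just the hypothetical numbers from this message): batching 30 files per command cuts the number of shell invocations, and hence the per-command startup overhead, by 1 - 1/30 ≈ 97%, and a time-based heuristic naturally collapses to single-file archiving on a quiet system.

```python
def startup_cost_reduction(batch_size: int) -> float:
    """Fraction of per-command fork()+exec() startup overhead saved by
    batching, relative to running one archive_command per WAL file."""
    return 1 - 1 / batch_size


def suggest_batch_size(wal_files_per_sec: float,
                       max_archive_delay_sec: float = 3.0,
                       max_batch: int = 30) -> int:
    """Time-based heuristic (an assumption, not an actual Postgres policy):
    never hold a completed WAL file longer than the delay budget, so the
    batch size scales with the rate at which WAL files are generated."""
    return max(1, min(max_batch,
                      int(wal_files_per_sec * max_archive_delay_sec)))


print(round(startup_cost_reduction(30), 3))  # 0.967, i.e. ~97% saved
print(suggest_batch_size(10.0))              # busy system (10 files/s) -> 30
print(suggest_batch_size(0.1))               # quiet system (1 file/10 s) -> 1
```

The same shape of calculation would apply to parallelism: replace the batch size with the number of concurrent archive commands, again bounded by how long you are willing to delay file removal.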