On Fri, 2021-09-10 at 23:48 +0800, Julien Rouhaud wrote:
> I totally agree that batching as many files as possible in a single
> command is probably what's gonna achieve the best performance. But if
> the archiver only gets an answer from the archive_command once it has
> tried to process all of the files, it also means that postgres won't
> be able to remove any WAL file until all of them have been processed.
> It means that users will likely have to limit the batch size and
> therefore pay more startup overhead than they would like. In case of
> archiving on a server with high latency / connection overhead it may
> be better to be able to run multiple commands in parallel.

Well, users would also have to limit the parallelism, right? If
connections are high-overhead, I wouldn't imagine that running
hundreds of them simultaneously would work very well in practice. (The
proof would be in an actual benchmark, obviously, but usually I would
rather have one process handling a hundred items than a hundred
processes handling one item each.)

For a batching scheme, would it be that big a deal to wait for all of
them to be archived before removal?
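To make sure we're picturing the same shape, here's a rough sketch of
a batching archive script (Python; the destination path and the
invocation syntax are made up, and a real command would also need to
fsync and guard against partially written files):

    #!/usr/bin/env python3
    # Hypothetical batch archiver: all WAL segment paths arrive in one
    # invocation, so per-connection overhead is paid once per batch.
    import shutil, sys

    ARCHIVE_DIR = "/mnt/archive/wal"  # made-up destination

    def main(paths):
        failed = []
        for p in paths:
            try:
                shutil.copy(p, ARCHIVE_DIR)
            except OSError:
                failed.append(p)
        # One exit status covers the whole batch: postgres can't safely
        # recycle *any* of these segments until the command reports
        # success, which is the retention concern you're describing.
        return 1 if failed else 0

    if __name__ == "__main__":
        sys.exit(main(sys.argv[1:]))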
> > That is possibly true. I think it might work to just assume that
> > you have to retry everything if it exits non-zero, but that
> > requires the archive command to be smart enough to do something
> > sensible if an identical file is already present in the archive.
>
> Yes, it could be. I think that we need more feedback on that too.

Seems like this is the sticking point. What would be the smartest
thing for the command to do? If there's a destination file already,
checksum it and make sure it matches the source before continuing?
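Something like this sketch, maybe (Python again; the hash choice and
the usage line are placeholders, not a proposal for the actual
interface):

    #!/usr/bin/env python3
    # Hypothetical idempotent archive step: if the destination already
    # exists, compare checksums instead of failing, so blindly retrying
    # a whole batch after a non-zero exit stays safe.
    import hashlib, os, shutil, sys

    def sha256(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def archive_one(src, archive_dir):
        dst = os.path.join(archive_dir, os.path.basename(src))
        if os.path.exists(dst):
            # Already archived: succeed only if it matches the source;
            # a mismatch means something is genuinely wrong.
            return sha256(dst) == sha256(src)
        shutil.copy(src, dst)
        return True

    if __name__ == "__main__":
        # Usage (made up): idempotent_archive.py <archive_dir> <wal...>
        archive_dir, paths = sys.argv[1], sys.argv[2:]
        sys.exit(0 if all(archive_one(p, archive_dir) for p in paths)
                 else 1)

--Jacob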