On 8/28/18 8:32 AM, Stephen Frost wrote:
>
> * hubert depesz lubaczewski (dep...@depesz.com) wrote:
>> I'm in a situation where we quite often generate more WAL than we can
>> archive. The thing is - archiving takes a long(ish) time, but it's
>> a multi-step process and includes talking to remote servers over the
>> network.
>>
>> I tested that simply by running archiving in parallel I can easily get
>> 2-3 times higher throughput.
>>
>> But - I'd prefer to keep postgresql knowing what is archived and what
>> is not, so I can't do the parallelization on my own.
>>
>> So, the question is: is it technically possible to have parallel
>> archiving, and would anyone be willing to work on it? (sorry, my
>> C skills are basically none, so I can't realistically hack it myself)
>
> Not entirely sure what the concern is around "postgresql knowing what is
> archived", but pgbackrest already does exactly this parallel archiving
> for environments where the WAL volume is larger than a single thread can
> handle, and we've been rewriting it in C specifically to make it fast
> enough to be able to keep PG up-to-date regarding what's been pushed
> already.
To be clear, pgBackRest uses the .ready files in archive_status to
parallelize archiving, but it still notifies PostgreSQL of completion via
the archive_command mechanism. We do not modify .ready files to .done
directly.

However, we have optimized the C code to provide ~200 notifications/second
(3.2GB/s of WAL transfer with 16MB segments), which is enough to keep up
with the workloads we have seen. The larger WAL segment sizes allowed in
PG11 (up to 1GB) would theoretically increase this to 200GB/s, though in
practice the CPU needed for compression will become a major bottleneck,
not to mention the network, etc.
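For anyone curious how the look-ahead part fits together, here is a minimal
sketch of an archive_command that pushes several .ready segments in parallel
while leaving PostgreSQL in charge of what counts as archived. This is not
pgBackRest's code; the paths, the push_segment() helper, and the worker count
are all hypothetical placeholders.

#!/usr/bin/env python3
# Hypothetical sketch only -- not pgBackRest.  Invoked by PostgreSQL as:
#   archive_command = 'parallel_archive.py %p %f'
#
# When PostgreSQL asks us to archive one segment, we also look ahead at the
# other *.ready files in archive_status and push a batch of them in parallel.
# Segments pushed ahead of time are remembered in a local state directory, so
# the later archive_command call for them can return success immediately.
# PostgreSQL itself still tracks completion only through our exit status.

import os
import sys
import shutil
from concurrent.futures import ThreadPoolExecutor

STATE_DIR = '/var/lib/parallel_archive/pushed'   # hypothetical local state
ARCHIVE_DIR = '/mnt/wal_archive'                 # hypothetical destination
WORKERS = 4                                      # hypothetical parallelism

def push_segment(wal_dir, seg):
    """Copy one WAL segment to the archive and record it as pushed."""
    shutil.copy(os.path.join(wal_dir, seg), os.path.join(ARCHIVE_DIR, seg))
    open(os.path.join(STATE_DIR, seg), 'w').close()

def main(wal_path, seg_name):
    os.makedirs(STATE_DIR, exist_ok=True)

    # Already pushed by an earlier look-ahead pass: just acknowledge it.
    if os.path.exists(os.path.join(STATE_DIR, seg_name)):
        os.unlink(os.path.join(STATE_DIR, seg_name))
        return 0

    # %p is relative to the data directory, which is the cwd when
    # PostgreSQL runs archive_command.
    wal_dir = os.path.dirname(os.path.abspath(wal_path))
    status_dir = os.path.join(wal_dir, 'archive_status')

    # The requested segment plus any other segments PostgreSQL has marked
    # .ready but not yet asked us to archive.
    ready = [f[:-len('.ready')] for f in os.listdir(status_dir)
             if f.endswith('.ready')]
    batch = [seg_name] + [s for s in ready
                          if s != seg_name
                          and not os.path.exists(os.path.join(STATE_DIR, s))]

    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        futures = {s: pool.submit(push_segment, wal_dir, s) for s in batch}

    # Only the segment PostgreSQL asked about decides our exit status; a
    # failed look-ahead push is simply retried when its own call arrives.
    futures[seg_name].result()
    os.unlink(os.path.join(STATE_DIR, seg_name))
    return 0

if __name__ == '__main__':
    sys.exit(main(sys.argv[1], sys.argv[2]))

A production archiver obviously needs more than this: durable writes (fsync),
atomic placement in the archive, compression, and careful error handling, all
of which the sketch skips.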
Regards,
-- 
-David
da...@pgmasters.net