Hi,

On 2020-04-15 11:57:29 -0400, Robert Haas wrote:
> Over at
> http://postgr.es/m/CADM=JehKgobEknb+_nab9179HzGj=9eitzwmod2mpqr_rif...@mail.gmail.com
> there's a proposal for a parallel backup patch which works in the way
> that I have always thought parallel backup would work: instead of
> having a monolithic command that returns a series of tarballs, you
> request individual files from a pool of workers.  Leaving aside the
> quality-of-implementation issues in that patch set, I'm starting to
> think that the design is fundamentally wrong and that we should take a
> whole different approach.  The problem I see is that it makes a
> parallel backup and a non-parallel backup work very differently, and
> I'm starting to realize that there are good reasons why you might want
> them to be similar.
>
> Specifically, as Andres recently pointed out[1], almost anything that
> you might want to do on the client side, you might also want to do on
> the server side.  We already have an option to let the client compress
> each tarball, but you might also want the server to, say, compress
> each tarball[2].  Similarly, you might want either the client or the
> server to be able to encrypt each tarball, or compress, but with a
> different compression algorithm than gzip.  If, as is presently the
> case, the server is always returning a set of tarballs, it's pretty
> easy to see how to make this work in the same way on either the client
> or the server, but if the server returns a set of tarballs in
> non-parallel backup cases, and a set of individual files in parallel
> backup cases, it's a lot harder to see how any sort of server-side
> processing should work, or how the same mechanism could be used on
> either the client side or the server side.
>
> So, my new idea for parallel backup is that the server will return
> tarballs, but just more of them.  Right now, you get base.tar and
> ${tablespace_oid}.tar for each tablespace.
> I propose that if you do a parallel backup, you should get
> base-${N}.tar and ${tablespace_oid}-${N}.tar for some or all values of
> N between 1 and the number of workers, with the server deciding which
> files ought to go in which tarballs.  This is more or less the naming
> convention that BART uses for its parallel backup implementation,
> which, incidentally, I did not write.  I don't really care if we pick
> something else, but it seems like a sensible choice.  The reason why I
> say "some or all" is that some workers might not get any of the data
> for a given tablespace.  In fact, it's probably desirable to have
> different workers work on different tablespaces as far as possible, to
> maximize parallel I/O, but it's quite likely that you will have more
> workers than tablespaces.  So you might end up, with pg_basebackup
> -j4, having the server send you base-1.tar and base-2.tar and
> base-4.tar, but not base-3.tar, because worker 3 spent all of its time
> on user-defined tablespaces, or was just out to lunch.
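To make the quoted "some or all values of N" point concrete, here is a small sketch of one possible server-side assignment policy. This is purely illustrative: `assign_tarballs`, its round-robin policy, and the example OIDs are assumptions for this sketch, not code from BART, pg_basebackup, or any posted patch. It keeps each tablespace on a single worker where possible, so a worker that receives no files for a tablespace produces no tarball for it:

```python
# Illustrative sketch only: map tablespaces to workers and derive tarball
# names like base-1.tar or 16385-2.tar.  With more workers than
# tablespaces, some workers stay idle, so not every N in 1..n_workers
# shows up in the output -- hence "some or all values of N".
from itertools import cycle

def assign_tarballs(files_by_tablespace, n_workers):
    """files_by_tablespace: {'base': [...], '<oid>': [...]}.
    Returns {tarball_name: [files]} with one tarball per tablespace,
    assigned to workers round-robin."""
    assignments = {}
    workers = cycle(range(1, n_workers + 1))
    for space, files in files_by_tablespace.items():
        n = next(workers)  # worker that will stream this tablespace
        assignments[f"{space}-{n}.tar"] = files
    return assignments

# With -j4 and only two tablespaces, workers 3 and 4 produce nothing:
result = assign_tarballs({"base": ["f1", "f2"], "16385": ["f3"]}, 4)
# result == {"base-1.tar": ["f1", "f2"], "16385-2.tar": ["f3"]}
```

A real implementation would balance by data volume rather than file count, and could split a large tablespace across several workers, which is where names like base-1.tar and base-2.tar for the same tablespace come from.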
One question I have not really seen answered well: Why do we want
parallelism here?  Or to be more precise: What do we hope to accelerate
by making what part of creating a base backup parallel?  There are
several potential bottlenecks, and I think it's important to know the
design priorities to evaluate a potential design.

Bottlenecks (not ordered by importance):

- compression performance (likely best solved by multiple compression
  threads and a better compression algorithm)

- unencrypted network performance (I'd like to see benchmarks showing
  in which cases multiple TCP streams help / at which bandwidth it
  starts to help)

- encrypted network performance, i.e. SSL overhead (not sure this is an
  important problem on modern hardware, given hardware-accelerated AES)

- checksumming overhead (a serious problem for cryptographic checksums,
  but presumably not for others)

- file IO (presumably multiple facets here: number of concurrent
  in-flight IOs, kernel page cache overhead when reading TBs of data)

I'm not really convinced that a design addressing the more crucial
bottlenecks really needs multiple fe/be connections.  But that seems to
have been the focus of the discussion so far.

Greetings,

Andres Freund