Hi,

On 2020-04-15 11:57:29 -0400, Robert Haas wrote:
> Over at
> http://postgr.es/m/CADM=JehKgobEknb+_nab9179HzGj=9eitzwmod2mpqr_rif...@mail.gmail.com
> there's a proposal for a parallel backup patch which works in the way
> that I have always thought parallel backup would work: instead of
> having a monolithic command that returns a series of tarballs, you
> request individual files from a pool of workers.  Leaving aside the
> quality-of-implementation issues in that patch set, I'm starting to
> think that the design is fundamentally wrong and that we should take a
> whole different approach.  The problem I see is that it makes a
> parallel backup and a non-parallel backup work very differently, and
> I'm starting to realize that there are good reasons why you might want
> them to be similar.
>
> Specifically, as Andres recently pointed out[1], almost anything that
> you might want to do on the client side, you might also want to do on
> the server side.  We already have an option to let the client compress
> each tarball, but you might also want the server to, say, compress
> each tarball[2].  Similarly, you might want either the client or the
> server to be able to encrypt each tarball, or compress, but with a
> different compression algorithm than gzip.  If, as is presently the
> case, the server is always returning a set of tarballs, it's pretty
> easy to see how to make this work in the same way on either the client
> or the server, but if the server returns a set of tarballs in
> non-parallel backup cases, and a set of individual files in parallel
> backup cases, it's a lot harder to see how any sort of server-side
> processing should work, or how the same mechanism could be used on
> either the client side or the server side.
>
> So, my new idea for parallel backup is that the server will return
> tarballs, but just more of them.  Right now, you get base.tar and
> ${tablespace_oid}.tar for each tablespace.
> I propose that if you do a parallel backup, you should get
> base-${N}.tar and ${tablespace_oid}-${N}.tar for some or all values of
> N between 1 and the number of workers, with the server deciding which
> files ought to go in which tarballs.  This is more or less the naming
> convention that BART uses for its parallel backup implementation,
> which, incidentally, I did not write.  I don't really care if we pick
> something else, but it seems like a sensible choice.  The reason why I
> say "some or all" is that some workers might not get any of the data
> for a given tablespace.  In fact, it's probably desirable to have
> different workers work on different tablespaces as far as possible, to
> maximize parallel I/O, but it's quite likely that you will have more
> workers than tablespaces.  So you might end up, with pg_basebackup
> -j4, having the server send you base-1.tar and base-2.tar and
> base-4.tar, but not base-3.tar, because worker 3 spent all of its time
> on user-defined tablespaces, or was just out to lunch.
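To make the quoted "some or all values of N" point concrete, here is a small sketch of one possible server-side assignment policy. This is purely illustrative: `assign_tarballs`, its round-robin policy, and the example OIDs are assumptions for this sketch, not code from BART, pg_basebackup, or any posted patch. It keeps each tablespace on a single worker where possible, so a worker that receives no files for a tablespace produces no tarball for it:

```python
# Illustrative sketch only: map tablespaces to workers and derive tarball
# names like base-1.tar or 16385-2.tar.  With more workers than
# tablespaces, some workers stay idle, so not every N in 1..n_workers
# shows up in the output -- hence "some or all values of N".
from itertools import cycle

def assign_tarballs(files_by_tablespace, n_workers):
    """files_by_tablespace: {'base': [...], '<oid>': [...]}.
    Returns {tarball_name: [files]} with one tarball per tablespace,
    assigned to workers round-robin."""
    assignments = {}
    workers = cycle(range(1, n_workers + 1))
    for space, files in files_by_tablespace.items():
        n = next(workers)  # worker that will stream this tablespace
        assignments[f"{space}-{n}.tar"] = files
    return assignments

# With -j4 and only two tablespaces, workers 3 and 4 produce nothing:
result = assign_tarballs({"base": ["f1", "f2"], "16385": ["f3"]}, 4)
# result == {"base-1.tar": ["f1", "f2"], "16385-2.tar": ["f3"]}
```

A real implementation would balance by data volume rather than file count, and could split a large tablespace across several workers, which is where names like base-1.tar and base-2.tar for the same tablespace come from.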
One question I have not really seen answered well: Why do we want
parallelism here?  Or to be more precise: What do we hope to accelerate
by making what part of creating a base backup parallel?  There are
several potential bottlenecks, and I think it's important to know the
design priorities to evaluate a potential design.

Bottlenecks (not ordered by importance):

- compression performance (likely best solved by multiple compression
  threads and a better compression algorithm)

- unencrypted network performance (I'd like to see benchmarks showing
  in which cases multiple TCP streams help / at which bandwidth it
  starts to help)

- encrypted network performance, i.e. SSL overhead (not sure this is an
  important problem on modern hardware, given hardware-accelerated AES)

- checksumming overhead (a serious problem for cryptographic checksums,
  but presumably not for others)

- file IO (presumably multiple facets here: number of concurrent
  in-flight IOs, kernel page cache overhead when reading TBs of data)

I'm not really convinced that a design addressing the more crucial
bottlenecks really needs multiple fe/be connections.  But that seems to
have been the focus of the discussion so far.

Greetings,

Andres Freund