On 4/4/24 12:25, Jakub Wartak wrote: > On Thu, Apr 4, 2024 at 12:56 AM Tomas Vondra > <tomas.von...@enterprisedb.com> wrote: >> >> Hi, >> >> Here's a much more polished and cleaned up version of the patches, >> fixing all the issues I've been aware of, and with various parts merged >> into much more cohesive parts (instead of keeping them separate to make >> the changes/evolution more obvious). > > OK, so three runs of incrementalbackupstests - as stated earlier - > also passed with OK for v20240403 (his time even with > --enable-casserts) > > pg_combinebackup flags tested were: > 1) --copy-file-range --manifest-checksums=CRC32C > 2) --copy-file-range --manifest-checksums=NONE > 3) default, no flags (no copy-file-range) >
Thanks! >> I changed how I think about this a bit - I don't really see the CoW copy >> methods as necessary faster than the regular copy (even though it can be >> with (5)). I think the main benefits are the space savings, enabled by >> patches (1)-(3). If (4) and (5) get it, that's a bonus, but even without >> that I don't think the performance is an issue - everything has a cost. > > I take i differently: incremental backups without CoW fs would be clearly : > - inefficient in terms of RTO (SANs are often a key thing due to > having fast ability to "restore" the clone rather than copying the > data from somewhere else) > - pg_basebackup without that would be unusuable without space savings > (e.g. imagine daily backups @ 10+TB DWHs) > Right, although this very much depends on the backup scheme. If you only take incremental backups, and then also a full backup once in a while, the CoW stuff probably does not help much. The alignment (the only thing affecting basebackups) may allow deduplication, but that's all I think. If the scheme is more complex, and involves "merging" the increments into the full backup, then this does help a lot. It'd even be possible to cheaply clone instances this way, I think. But I'm not sure how often would people do that on the same volume, to benefit from the CoW. >> On 4/3/24 15:39, Jakub Wartak wrote: >>> On Mon, Apr 1, 2024 at 9:46 PM Tomas Vondra >>> <tomas.von...@enterprisedb.com> wrote: > [..] >> Thanks. Would be great if you could run this on the attached version of >> the patches, ideally for each of them independently, so make sure it >> doesn't get broken+fixed somewhere on the way. > > Those are semi-manual test runs (~30 min? per run), the above results > are for all of them applied at once. So my take is all of them work > each one does individually too. > Cool, thanks. > FWIW, I'm also testing your other offlist incremental backup > corruption issue, but that doesnt seem to be related in any way to > copy_file_range() patches here. > Yes, that's entirely independent, happens with master too. >>>> 2) prefetch >>>> ----------- >>> [..] >>>> I think this means we may need a "--prefetch" option, that'd force >>>> prefetching, probably both before pread and copy_file_range. Otherwise >>>> people on ZFS are doomed and will have poor performance. >>> >>> Right, we could optionally cover in the docs later-on various options >>> to get the performance (on XFS use $this, but without $that and so >>> on). It's kind of madness dealing with all those performance >>> variations. >>> >> >> Yeah, something like that. I'm not sure we want to talk too much about >> individual filesystems in our docs, because those things evolve over >> time too. > > Sounds like Wiki then. > > BTW, after a quick review: could we in 05 have something like common > value then (to keep those together via some .h?) > > #define BATCH_SIZE PREFETCH_TARGET ? > Yes, that's one of the things I'd like to refine a bit more. Making it more consistent / clearer that these things are interdependent. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company