On Wed, Feb 4, 2026 at 9:53 AM Gordan Bobic <[email protected]> wrote:
> I was actually thinking about it in terms of simply using block
> cloning as a faster way to copy because it doesn't have to actually
> copy the block. I was sort of assuming that this block-cloning would
> be copied-on-write for future writes (a bit like FL-COW:
> http://www.xmailserver.org/flcow.html ).
> If you can make the copy orders of magnitude faster by skipping the
> copy and only COW-ing blocks on future writes on the original, then
> WAL roll-over is that much less likely to happen.

That would not help users of the ext4 file system, which could be a
significant portion of our users. On ext4, copy_file_range() would
physically copy the entire file, which could be significantly more
data than is actually needed. In the circular log format, we will not
know where the log ends until we have parsed all blocks and found a
discontinuity.
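
To make the difference concrete, here is a minimal sketch (not actual
mariadb-backup code) of a "clone if possible, else copy" helper on
Linux. On a file system with reflink support, such as XFS or Btrfs,
the FICLONE ioctl shares the blocks copy-on-write in O(1); on ext4 it
fails, and copy_file_range() ends up moving every byte:

#define _GNU_SOURCE
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <unistd.h>

/* Clone src into dst if the file system supports reflinks;
   otherwise fall back to an in-kernel byte copy. */
static int clone_or_copy(int src_fd, int dst_fd, off_t size)
{
  if (ioctl(dst_fd, FICLONE, src_fd) == 0)
    return 0; /* O(1): blocks are shared, COW on future writes */
  /* ext4 lands here: every byte is copied, merely avoiding a
     round trip through user space */
  off_t off_in = 0, off_out = 0;
  while (off_in < size)
  {
    ssize_t n = copy_file_range(src_fd, &off_in, dst_fd, &off_out,
                                (size_t) (size - off_in), 0);
    if (n <= 0)
      return -1;
  }
  return 0;
}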

> Just out of interest - why not do the extra WALs purely in
> mariabackup? When it detects that the redo log has come full circle,
> simply write more on the backup side, and replay those additional WALs
> during --prepare?

That is exactly how it currently works. One thread parses the server's
ib_logfile0 and appends records to a special append-only ib_logfile0
that will be part of the backup. The size of the ib_logfile0 in the
backup depends on how much log was written between the latest
checkpoint at the start of the backup and the LSN that determines the
logical snapshot at the completion of the backup.
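
In outline, the log copying works like this (a simplified sketch; all
identifiers below are hypothetical, not the actual mariadb-backup
internals):

/* Hypothetical sketch: follow the server's circular ib_logfile0 and
   append the parsed records to the backup's append-only ib_logfile0. */
lsn_t read_lsn = checkpoint_lsn; /* latest checkpoint at backup start */
while (read_lsn < snapshot_end_lsn)
{
  size_t n = read_circular_log(server_log, buf, sizeof buf, read_lsn);
  /* parse_records() stops at the first discontinuity in the
     record sequence */
  lsn_t parsed_end = parse_records(buf, n, read_lsn);
  append_to_backup_log(backup_log, buf, (size_t) (parsed_end - read_lsn));
  read_lsn = parsed_end;
}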

The parsing is very expensive, because MDEV-14425 did away with the
512-byte log block structure. When mariadb-backup finds a
discontinuity in the log, it keeps re-reading and re-parsing from
that point on, hoping to find more log records written by the server.
The circular ib_logfile0 may be overwritten several times by the
server while mariadb-backup is running. Sometimes this results in an
error suggesting that innodb_log_file_size be made larger; sometimes
it succeeds. This is why a straightforward block clone of the current
circular ib_logfile0 will not work.
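
The failure condition can be sketched in one test (again with
hypothetical names): if the server's write position laps the reader by
more than the capacity of the circular file, records that were never
copied are gone for good:

/* Hypothetical sketch: the server has overwritten log that the
   backup never managed to copy. The missing records cannot be
   re-read, hence the suggestion to enlarge innodb_log_file_size. */
if (server_write_lsn - read_lsn > circular_log_capacity)
  return BACKUP_ERR_LOG_OVERWRITTEN;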

> Right, but then you end up with multiple tars which have to mesh
> together which could be difficult, and xbstream/mbstream already does
> all this well.

Implementing multiple tar streams that cover disjoint sets of files
should be rather straightforward. Each thread would pick up the next
file in round-robin fashion, write the tar header, and then append
the contents of the file to its stream, while blocking concurrent
writes to the portion of the file that is being streamed. I think it
should scale well unless you have a single huge file, which would
make one stream finish long after the others.
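
A sketch of that scheme (the tar_stream interface is hypothetical; the
point is the disjoint round-robin assignment):

/* Worker t of n_streams handles files t, t + n_streams,
   t + 2 * n_streams, ... Each tar stream covers a disjoint set of
   files, so the workers need no coordination with each other. */
static void stream_worker(const char **files, size_t n_files,
                          tar_stream *out, size_t t, size_t n_streams)
{
  for (size_t i = t; i < n_files; i += n_streams)
  {
    tar_write_header(out, files[i]);
    /* block concurrent writes only to the portion being streamed */
    tar_append_contents(out, files[i]);
  }
}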

> Doesn't xbstream/mbstream format already implement enough stream
> interleaving for parallel processing to be "good enough"?

I have noted in https://jira.mariadb.org/browse/MDEV-38362 that the
mbstream tool uses an excessive amount of system CPU time and is
significantly slower than a straight copy. The user CPU time could be
lower if mbstream did not compute a CRC-32 on each block of payload;
there already are checksums on the input files, and any compression
program would compute its own on the output. I did not investigate
how to reduce the system CPU time. In any case, more threads are not
always merrier; frequent context switches tend to kill performance.
Some synchronization is unavoidable if everything needs to be
serialized into a single stream.
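
For reference, the per-chunk checksum itself is trivial (zlib shown
here as an illustration); the cost comes from running it over every
byte of payload that is already covered by checksums elsewhere:

#include <stddef.h>
#include <stdint.h>
#include <zlib.h>

/* CRC-32 over one payload chunk. The same bytes are already covered
   by checksums in the input files, and a compression layer on the
   output would checksum them once more. */
static uint32_t chunk_crc(const unsigned char *payload, size_t len)
{
  return (uint32_t) crc32(0L, payload, (uInt) len);
}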

Marko
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB plc