[MariaDB discuss] Re: Redo log and tablespace flushing

Gordan Bobic via discuss Sun, 19 Nov 2023 07:43:11 -0800

On Sun, Nov 19, 2023 at 5:23 PM Kristian Nielsen
<[email protected]> wrote:
>
> Gordan Bobic via discuss <[email protected]> writes:
>
> > Thanks for this. Is there a way to force replay of the entire redo log on
> > an unclean shutdown even if the checkpoint in the redo log says it was
> > flushed to tablespace?
>
> This won't help you if the part of the redo log you need was overwritten by
> new records due to the cyclic nature of the redo log.


I am working here on the assumption that the time taken to
overwrite the circular buffer (typically sized to absorb the peak
daily hour of writes) is going to be vastly greater than the window
of writes that could be lost by lying about tablespace syncs (~5 seconds).
So unless something spectacularly anomalous happens, I think there
should be plenty of margin for error there.
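To put rough numbers on that margin: if the redo log is sized to hold one peak hour of writes, a ~5 second loss window is a tiny fraction of the wrap-around time. The sketch below is a back-of-the-envelope calculation that assumes a constant peak redo generation rate (a simplification). The function name and the 10 MiB/s rate are hypothetical, chosen only for illustration.

```python
def wrap_to_loss_ratio(log_capacity_bytes: int,
                       peak_write_rate_bytes_per_s: float,
                       loss_window_s: float) -> float:
    """Return how many times longer the redo log takes to wrap around
    than the window of writes that could be lost.

    Hypothetical helper; assumes redo is generated at a constant peak rate.
    """
    if peak_write_rate_bytes_per_s <= 0 or loss_window_s <= 0:
        raise ValueError("rate and loss window must be positive")
    seconds_to_wrap = log_capacity_bytes / peak_write_rate_bytes_per_s
    return seconds_to_wrap / loss_window_s


# Hypothetical numbers: a redo log sized for exactly one peak hour at
# 10 MiB/s of redo, and ~5 seconds of unsynced tablespace writes.
rate = 10 * 1024 * 1024
log_size = rate * 3600
print(wrap_to_loss_ratio(log_size, rate, 5.0))  # 720.0
```

In other words, under these assumptions the log would have to wrap roughly 720 times faster than normal peak load before the redo needed to cover the loss window could be overwritten.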

> > I'm exploring the idea of running the datadir on storage that preserves write
> > ordering but runs with the equivalent of nobarrier. It will still flush in
> > the background every X seconds where X is configurable, so I am hoping to
> > use the redo log to keep my data crash-safe even though I am lying about
> > tablespace write flushes, because write ordering will be preserved despite
> > running with the equivalent of nobarrier.
>
> If write-ordering is preserved (but it has to be preserved between log writes
> and data writes as well), then you will be crash-safe, because the situation
> will be the same as if a fully durable system crashed X seconds ago. You will
> lose the last X seconds of commits, but data will be consistent, similar to
> --innodb-flush-log-at-trx-commit=2 (or 0).
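The write-ordering argument can be illustrated with a toy model. Suppose a device acknowledges every write immediately (like nobarrier) but persists writes strictly in submission order, across both redo and page writes. Then any crash leaves a prefix of the write history, which is the same state a fully durable system would have had if it had crashed a little earlier. The sketch below is hypothetical: the class and method names are invented, and real InnoDB page writes do not map one-to-one onto redo records. It only checks the prefix property over many random flush schedules.

```python
import random
from typing import List, Tuple


class OrderedLyingDevice:
    """Toy model of storage that acknowledges writes at once but persists
    them strictly in submission order when it flushes (hypothetical)."""

    def __init__(self) -> None:
        self.submitted: List[str] = []  # every acknowledged write, in order
        self.persisted = 0              # length of the prefix on stable storage

    def write(self, record: str) -> None:
        # Acknowledged immediately; durability is deferred.
        self.submitted.append(record)

    def background_flush(self, upto: int) -> None:
        # A flush only ever extends the persisted prefix: this is what
        # "write ordering preserved" means in this model.
        self.persisted = max(self.persisted, min(upto, len(self.submitted)))

    def crash(self) -> List[str]:
        # Only the persisted prefix survives a power loss.
        return self.submitted[: self.persisted]


def simulate(seed: int, n_txns: int = 20) -> Tuple[List[str], List[str]]:
    rng = random.Random(seed)
    dev = OrderedLyingDevice()
    for i in range(n_txns):
        dev.write(f"log:{i}")   # redo record goes first (write-ahead logging)
        dev.write(f"data:{i}")  # then the corresponding page write
        if rng.random() < 0.3:
            # A background flush persists some, not necessarily all, pending writes.
            dev.background_flush(rng.randint(dev.persisted, len(dev.submitted)))
    return dev.submitted, dev.crash()


for seed in range(100):
    submitted, survived = simulate(seed)
    # The surviving state is a prefix of the history.
    assert survived == submitted[: len(survived)]
    # Every surviving page write also has its redo record.
    for rec in survived:
        if rec.startswith("data:"):
            assert "log:" + rec.split(":")[1] in survived
print("all crash points are prefix-consistent")
```

The property only holds because log and data writes share a single ordered queue in this model. If the device reordered page writes ahead of the redo they depend on, the prefix argument would no longer apply.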

Yes, I already do this on the slaves
(innodb_flush_log_at_trx_commit=1, sync_master_info=1) with storage
that preserves write ordering but lies about having committed to
stable storage. That part of the setup is pretty bulletproof. Slaves
just restart replicating from a point a few seconds before and
everything is consistent.

> What goal are you trying to achieve here? Some performance gains, or the
> ability to use main storage with some non-standard write semantics?

The objective is to gain a bit of performance on the master node, where
being a few seconds behind after a dirty shutdown is not a palatable
option.

> You can configure InnoDB to have a huge redo log and perhaps there are also
> some options to reduce the frequency of checkpoints.

Well, the traditional rule of thumb has been to size the redo log to
absorb the daily peak hour of writes.
I prefer to tune it so that the checkpoint age never gets too
close to the limit (the log size).
Unfortunately, the latter approach is impossible with the redo log
checkpointing changes in 10.5+ (it never flushes anything at all
until it reaches the high water mark), but that's for a different
conversation thread.
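One way to watch how close the checkpoint age gets to the limit is to poll SHOW ENGINE INNODB STATUS and compare "Last checkpoint at" against "Log sequence number" in its LOG section. Below is a minimal parsing sketch. It assumes that line layout, which differs between server versions, so the patterns may need adjusting. It also uses the configured innodb_log_file_size as an approximate capacity that ignores the log header. The function names and the sample values are hypothetical.

```python
import re


def checkpoint_age(status_text: str) -> int:
    """Current LSN minus last checkpoint LSN, parsed from the LOG section
    of SHOW ENGINE INNODB STATUS output (assumed line format)."""
    lsn = re.search(r"^Log sequence number\s+(\d+)", status_text, re.M)
    ckpt = re.search(r"^Last checkpoint at\s+(\d+)", status_text, re.M)
    if lsn is None or ckpt is None:
        raise ValueError("LSN or checkpoint line not found")
    return int(lsn.group(1)) - int(ckpt.group(1))


def checkpoint_fill(status_text: str, log_file_size: int) -> float:
    """Checkpoint age as a fraction of the redo log size (approximate)."""
    return checkpoint_age(status_text) / log_file_size


# Hypothetical excerpt of the LOG section; real output has more lines.
sample = """---
LOG
---
Log sequence number 5000000000
Log flushed up to   5000000000
Pages flushed up to 4200000000
Last checkpoint at  4200000000
"""
print(checkpoint_age(sample))  # 800000000
print(round(checkpoint_fill(sample, 2 * 1024**3), 3))
```

Polling this periodically and alerting when the fraction approaches 1 gives a rough view of how much headroom the redo log has between checkpoints.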

> That should in practice
> avoid the problem with needed redo log being overwritten.

That would still leave an edge case in the few seconds after it does
eventually write a checkpoint, would it not? I am effectively
looking at a case of "never write a checkpoint".

> But it's obviously
> not something that InnoDB was designed to support.

I don't go down rabbit holes like this because it's easy and everybody
does it. :-)
_______________________________________________
discuss mailing list -- [email protected]
To unsubscribe send an email to [email protected]
