[MariaDB discuss] Re: increasing InnoDB history length and stalls on MariaDB 11.4

Gordan Bobic via discuss Tue, 03 Dec 2024 03:22:08 -0800

On Tue, Dec 3, 2024 at 12:40 PM Kazuhiko via discuss
<[email protected]> wrote:
>
> Hello,
>
> Our application has a sustained rate of DML queries in parallel with
> DQL. We can achieve good performance using
> innodb-locks-unsafe-for-binlog on MariaDB 10.4, with careful
> consideration of its 'unsafeness'.
> But innodb-locks-unsafe-for-binlog is removed in MariaDB 10.5 or later
> and our application runs much slower without it because of gap locking.
> https://mariadb.com/docs/server/ref/mdb/cli/mariadbd/innodb-locks-unsafe-for-binlog/
>
> To achieve a good performance as before, we now use READ COMMITTED
> transaction isolation for DML and use REPEATABLE READ isolation for DQL
> on MariaDB 11.4. This way guarantees the consistency for our
> application.


Have you actually verified that your application needs
repeatable-read? It's possible but in my experience it is extremely
rare for applications to re-read the rows they already read.
Not to say it never happens, but I have seen application
implementations that actually needs this maybe twice out of thousands
of applications I've worked on over the last 25 years.

> Our database contains totally 13.7G rows and uses 4.5TB storage. Here is
> the configuration :
>
> innodb_buffer_pool_size = 17179869184 (16GB)
> innodb_buffer_pool_instances = 16 (no meaning in 11.4)
> innodb_log_file_size = 134217728 (128MB)
> innodb_flush_method = O_DIRECT
>
> Please see the attached graphs drawing 'InnoDB history length (right
> axis)' and 'free redo log in % (left axis)' (i.e. 100 - Redo Log
> Occupancy) while running our reference workload.
> https://mariadb.com/kb/en/innodb-redo-log/#determining-the-redo-log-occupancy
> These tests are done with the same workload, but only MariaDB version
> and/or configuration is different.
> (Please disregard the 'warehouse history length' and 'activities history
> length' series in those plots, they are not relevant to the current
> issue.)
>
> On MariaDB 10.4, the performance and the behaviour between 'using
> innodb-locks-unsafe-for-binlog' and 'using READ COMMITTED and REPEATABLE
> READ' are similar. But on MariaDB 11.4, I found two serious issues.
>
> Issue 1: InnoDB history length keeps growing while running lots of DML
> and DQL queries in parallel.
>
> InnoDB history length is stable around 5k on 10.4, but it keeps growing
> on 11.4 up to 100k until the end. In our experience, increases in
> history list length are roughly
> correlated with poor query performance (causing queries to "step over
> tombstones" ?), and we do our best to avoid having long-running
> transactions in order to allow older snapshots to be freed ASAP. We are
> satisfied with the status on 10.4, where history list is constant.
> What is the reason behind such increases on 11.4 and is there any
> configuration to avoid such ?

Have you been able to measure the performance impact of this? 100K
isn't usually high enough to start causing a measurable performance
degradation, that usually starts to happen north of 1M history length.

> Issue 2: MariaDB stalls when free redo log is too small.
>
> When MariaDB stalls, even a quite simple query takes too long.
>
> (table definition)
> CREATE TABLE `catalog` (
>    `uid` bigint(20) unsigned NOT NULL,
>    `path` varchar(255) NOT NULL DEFAULT '',
>    ...
>    PRIMARY KEY (`uid`),
>    KEY `Path` (`path`),
>    ...
> )
>
> (from pt-query-digest output)
> # Attribute    pct   total     min     max     avg     95%  stddev
> median
> # ============ === ======= ======= ======= ======= ======= =======
> =======
> # Count          2      91
> # Exec time      2    716s      1s     17s      8s     15s      5s
> 7s
> # Lock time      0    22ms   156us   491us   238us   366us    70us
> 204us
> # Rows sent      0     663       3      10    7.29    8.91    1.09
> 6.98
> # Rows examine   0     663       3      10    7.29    8.91    1.09
> 6.98
> # Rows affecte   0       0       0       0       0       0       0
> 0
> # Bytes sent     0  52.10k     330     749  586.24  685.39   68.46
> 563.87
> # Query size     0 470.83k   3.56k   5.26k   5.17k   5.20k  225.56
> 5.20k
> SELECT uid, path FROM catalog WHERE path in ('...', /*... omitted 80
> items ...*/)\G
> (here, these SELECT are issued in READ COMMITTED isolation level)
>
> Our understanding is that the stalls are caused by the undo log cleanup
> job, and/or the innodb buffer pool writeback job.
> Increasing the undo log size a lot seems to reduce the frequency of
> those stalls.
> For our reference workload here, we still have stalls with 1GB log file
> but we don't have stalls with 2GB or more log file.
> If we need 'big enough' log file size to avoid stalls, will the ideal
> size be proportional to the amount of data or proportional to the buffer
> pool size ?
> We have never had such stall on MariaDB 10.4 even with small 128MB log
> file. How this difference happen and is there any configuration to
> improve the behaviour ?

This looks like the furious flushing regression that was introduced in
10.5 and doesn't appear to have been fixed since.
It is possible to tune it out to a large extent, though:
https://shatteredsilicon.net/mariadb-furious-flushing-fix-checkpoint-age/
_______________________________________________
discuss mailing list -- [email protected]
To unsubscribe send an email to [email protected]

[MariaDB discuss] Re: increasing InnoDB history length and stalls on MariaDB 11.4

Reply via email to