Thanks so much for the detailed response. I won't pretend to understand
all of the details, but I'm taking away a couple of big things:
1. It's a mutual deadlock that cannot resolve itself
2. It's potentially fixed (if this really is a match) in the upcoming
10.6.13 release
Am I understanding that much correctly?
In the meantime, is there any potential workaround that can be applied
to existing DBs? I.e., some configuration setting or parameter that would
either remove, or at least reduce, the potential for the issue? As a guess,
yesterday I tried turning off parallel replication on the two systems where
it has crashed so far, but as the crash does not happen regularly (I'm not
able to reproduce it on a development system), I don't know whether it helped at all.
Also, I'm concerned about my upstream DBs. So far it has only crashed
replicas, and only with that one query. But am I reading you right that
this problem is not necessarily replication-related, i.e., that it could
happen on the primary as well?
The biggest problem I have right now when it happens is that I can't
find any way to break out of it. That is, I cannot kill the thread in
question and instead end up having to kill the server, which of course
is a huge pain point. Is there anything less drastic I can do?
Thanks again,
Dan Ragle
On 4/14/2023 1:53 AM, Marko Mäkelä wrote:
Hi Dan,
I see one thread that is doing a re-entrant call to
btr_cur_pessimistic_delete() on a secondary index tree when purging
the history of a committed transaction (such as a DELETE or an UPDATE
of an indexed column). It matches the hang
https://jira.mariadb.org/browse/MDEV-29835, which was actually
introduced already in MySQL 5.7 and has been present in MariaDB Server
since 10.2.2.
The thread right below that is executing
btr_estimate_n_rows_in_range(), which was improved in
https://jira.mariadb.org/browse/MDEV-21136 in MariaDB 10.6.9. After
that change, we started to see many more InnoDB hangs. For the 10.6.12
release, I tried to fix MDEV-29835. Because I ran out of time, I fixed
only part of it, in https://jira.mariadb.org/browse/MDEV-30400. I have
the feeling that this partial fix made the hangs in the remaining
cases much more likely.
In MySQL 5.7, Yasufumi Kinoshita introduced a latch mode that sits
between exclusive (X) and shared (S). He called it SX; I call it U
(Update) in https://jira.mariadb.org/browse/MDEV-24142. At most
one X or SX lock can be granted on an object at a time. While X locks
conflict with S locks, the SX lock allows any number of S locks to be
acquired concurrently.
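To make these grant rules concrete, here is a minimal Python sketch that models only which modes may coexist on one object, assuming exactly the rules stated above. The helper names compatible() and can_grant() are hypothetical; this is not InnoDB's actual latch code, which also has to deal with waiting threads and other details not modeled here.

```python
from typing import List

S, SX, X = "S", "SX", "X"

def compatible(held: str, requested: str) -> bool:
    """Hypothetical check: can `requested` be granted while `held` is granted?"""
    if X in (held, requested):
        return False          # X conflicts with every other mode
    if held == SX and requested == SX:
        return False          # at most one SX holder at a time
    return True               # S/S, S/SX and SX/S may coexist

def can_grant(granted: List[str], requested: str) -> bool:
    """A request is grantable only if it is compatible with every current holder."""
    return all(compatible(h, requested) for h in granted)

if __name__ == "__main__":
    print(can_grant([SX, S, S], S))   # True: SX admits any number of S
    print(can_grant([SX], SX))        # False: only one SX at a time
    print(can_grant([S, S], X))       # False: X conflicts with S
```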
The problem is a lock order inversion: Yasufumi’s implementation
violates his own design constraints, described in the "High Level
Architecture" section of https://dev.mysql.com/worklog/task/?id=6326. I
helped formulate those rules back then, but I was not otherwise
involved with the design, implementation or review of the change.
Unfortunately, one section heading in that document, "(2) When holding
index->lock SX-latch:", is missing.
The purge thread that is doing the re-entrant call to
btr_cur_pessimistic_delete() is holding an index tree SX-latch and
some leaf page latches. As part of the page merge, it has to access
some non-leaf pages on which it did not acquire latches upfront.
According to the design rules, this is the wrong order of acquiring
latches. The btr_estimate_n_rows_in_range() thread is holding an index
S-latch and following the correct order for that case.
Without the output of "thread apply all backtrace full", I
cannot say for sure that this is a case of MDEV-29835, but I think
that it is extremely likely. Based on other cases that I have
analyzed, I expect that the btr_cur_pessimistic_delete() thread is holding
a page latch that btr_estimate_n_rows_in_range() is waiting for, while
itself waiting for a higher-level page latch that
btr_estimate_n_rows_in_range() is holding.
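The suspected hang can be pictured as a cycle in a wait-for graph. The sketch below is a hypothetical Python model, not a diagnostic tool for a real server: the thread and latch names are illustrative labels, each latch is assumed to be held in a mode that conflicts with the waiter's request, and each thread waits for at most one latch.

```python
from typing import Dict, List, Optional, Set

def wait_for_edges(holds: Dict[str, Set[str]], waits: Dict[str, str]) -> Dict[str, str]:
    """Build edges 'waiting thread -> thread holding the latch it wants'."""
    owner = {latch: t for t, latches in holds.items() for latch in latches}
    return {t: owner[latch] for t, latch in waits.items() if latch in owner}

def find_cycle(edges: Dict[str, str]) -> Optional[List[str]]:
    """Each thread has at most one outgoing edge, so following edges from
    any start either reaches a thread that is not waiting, or loops."""
    for start in edges:
        path: List[str] = []
        seen: Set[str] = set()
        node = start
        while node in edges and node not in seen:
            seen.add(node)
            path.append(node)
            node = edges[node]
        if node in seen:
            return path[path.index(node):]   # threads on the cycle
    return None

if __name__ == "__main__":
    # Illustrative labels: the purge thread (btr_cur_pessimistic_delete)
    # and the range-estimation thread (btr_estimate_n_rows_in_range).
    holds = {"purge": {"leaf page"}, "estimate": {"non-leaf page"}}
    waits = {"purge": "non-leaf page", "estimate": "leaf page"}
    print(find_cycle(wait_for_edges(holds, waits)))  # ['purge', 'estimate']
```

Because neither thread can make progress until the other releases its latch, such a cycle never resolves on its own.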
The simple fix to this would be to never use the index SX-lock mode,
and always escalate to exclusive locking. We actually tried that years
back in https://jira.mariadb.org/browse/MDEV-14637 but it would have
caused a significant performance regression. The upcoming quarterly
releases (due within a month or so) include a fix of MDEV-29835 that only
escalates to exclusively locking the index tree when it is really
needed. In debug builds, we have assertions that would fire if index
page latches are being acquired in the wrong order while not holding
an exclusive index latch. This fix was tested both for correctness
(no debug assertion failures) and for performance.
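The debug check described above can be sketched as follows. This is a hypothetical Python model under simplified assumptions, not the actual MariaDB assertion: page level 0 is the leaf level, the "correct order" is taken to be top-down (never latching a higher level while already holding a lower one), and ordering among pages on the same level is not checked. LatchOrderChecker and LatchOrderError are illustrative names.

```python
from typing import List

class LatchOrderError(AssertionError):
    """Raised when a page latch is requested in the wrong order."""

class LatchOrderChecker:
    """Hypothetical model: unless the index latch is held in X mode, page
    latches must be taken top-down (level 0 = leaf, higher = nearer root)."""

    def __init__(self, index_mode: str) -> None:
        self.index_mode = index_mode        # "S", "SX" or "X"
        self.held_levels: List[int] = []

    def acquire_page(self, level: int) -> None:
        if (self.index_mode != "X" and self.held_levels
                and level > min(self.held_levels)):
            raise LatchOrderError(
                f"level {level} requested while holding level "
                f"{min(self.held_levels)} under an index {self.index_mode} latch")
        self.held_levels.append(level)

if __name__ == "__main__":
    purge = LatchOrderChecker("SX")
    purge.acquire_page(0)               # leaf page first
    try:
        purge.acquire_page(1)           # then a non-leaf page: wrong order
    except LatchOrderError as e:
        print("caught:", e)

    escalated = LatchOrderChecker("X")
    escalated.acquire_page(0)
    escalated.acquire_page(1)           # allowed: index tree latched exclusively
    print(escalated.held_levels)        # [0, 1]
```

Under the SX index latch, the purge thread's leaf-then-non-leaf sequence trips the check; escalating the index latch to X, as the fix does when it is really needed, makes the same sequence permissible.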
This is not the only bug that is related to SX-locks.
https://jira.mariadb.org/browse/MDEV-29883 is another example.
Some users are successfully using a development snapshot that includes
the fix of MDEV-29835. In https://jira.mariadb.org/browse/MDEV-30481
you can find one example.
With best regards,
_______________________________________________
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net