■ Environment
●Cluster: Galera Cluster (3 nodes)
●OS: CentOS 7.4
●DBMS: MariaDB 10.6.15
●DB Uptime: 509 days

■ Issue Overview
●Time of Occurrence: Between 00:00 and 02:00
●Initial Symptom: Single-row INSERT and DELETE queries were delayed by several 
seconds and eventually stalled
●Around 00:34: A large number of UPDATE queries targeting the same primary key 
led to exclusive (X) lock waits and an increase in active sessions
●00:35: CPU usage on DB server hit 100% and stayed at critical levels; thread 
count spiked
●00:41: Galera node DB01 shut down automatically

Error log excerpt:
[ERROR][FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold for dict_sys.latch 
was exceeded.
See : https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/
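
To check whether the other nodes logged the same message, the error log can be scanned for it. The following is only a minimal sketch in Python: the pattern is built solely from the excerpt above, and the function name and the sample text are hypothetical. It joins wrapped lines first, since the excerpt here is split across a line break.

```python
import re

# Pattern derived from the excerpt above. The latch name is captured,
# so the same message for a different latch would also be reported.
FATAL_RE = re.compile(
    r"innodb_fatal_semaphore_wait_threshold for (\S+)\s+was exceeded"
)


def find_fatal_latch_waits(log_text):
    """Return the latch names named in fatal semaphore-wait messages."""
    # Collapse all whitespace so a message wrapped over two lines still matches.
    flattened = " ".join(log_text.split())
    return FATAL_RE.findall(flattened)


# Hypothetical sample: the excerpt from this post, wrapped as shown above.
sample = (
    "[ERROR][FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold for dict_sys.latch \n"
    "was exceeded.\n"
)
print(find_fatal_latch_waits(sample))
```

In practice the text would be read from the server's error log file, whose location depends on the installation.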

■ Root Cause (Internal Analysis)
●dict_sys.latch exceeded the innodb_fatal_semaphore_wait_threshold (default: 
600 seconds)
●This caused InnoDB to deliberately abort the MariaDB server process
●The dict_sys.latch is a global latch for the InnoDB data dictionary, which can 
become a severe bottleneck under high concurrency
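
One consequence of the threshold is worth spelling out. If the default of 600 seconds was in effect on DB01 and the timestamps are accurate to the minute (both are assumptions), then a thread must already have been waiting on dict_sys.latch about ten minutes before the 00:41 shutdown. A small Python sketch of that arithmetic, with a hypothetical helper name:

```python
from datetime import datetime, timedelta

# Default cited above; assumed to be unchanged on DB01.
THRESHOLD_SECONDS = 600


def latest_wait_start(shutdown_hhmm, threshold_seconds=THRESHOLD_SECONDS):
    """Latest time (HH:MM) at which the oldest waiter could have begun waiting."""
    shutdown = datetime.strptime(shutdown_hhmm, "%H:%M")
    started = shutdown - timedelta(seconds=threshold_seconds)
    return started.strftime("%H:%M")


print(latest_wait_start("00:41"))  # 00:31
```

Under those assumptions, the wait began no later than about 00:31, a few minutes before the UPDATE burst noted around 00:34.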

❗ What’s Unusual:
●No clear sign of typical row-lock contention or of a massive spike in transaction volume
●Even single-row INSERT and DELETE queries were delayed by thousands of 
seconds, which is highly abnormal
●No obvious external factors (lock contention, CPU saturation, or connection 
floods) were identified
●Strong suspicion of internal engine behavior or a bug

❓ Questions and Request for Input
●Has anyone experienced a similar issue related to dict_sys.latch in Galera 
Cluster environments?
●Are there known bugs or release notes in MariaDB 10.6.x or Galera that mention 
severe delays or process termination related to this latch?
●Any known workarounds or best practices to prevent this from recurring?

Your experience and advice would be greatly appreciated.

Thank you in advance!
