Hi Kyle! Galera operates in such a way that every transaction is replicated to every node in the cluster and committed to the storage engine before success is returned to the client. This means that when the client receives the acknowledgement, there is at least one node (the one on which the transaction was originally executed) which has persisted the transaction. So in theory, in the scenario you described, at least one of the nodes should be able to recover all the acknowledged transactions when the cluster is restarted after a full crash. The fact that an alternate timeline appears after the cluster restart suggests that not all the committed transactions were recovered by the storage engine on restart.
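One way to sanity-check this after a full-cluster crash is to compare each node's recovered Galera position. A rough sketch, run from the mysql client on each node (wsrep_last_committed is a standard Galera status variable; the comparison procedure itself is just my suggestion):

    -- Run on every node once it has restarted and finished recovery.
    -- wsrep_last_committed is the sequence number of the last write-set
    -- this node's storage engine has committed.
    SHOW STATUS LIKE 'wsrep_last_committed';

    -- If even the highest value across all nodes is lower than the
    -- position of the last transaction acknowledged to clients before
    -- the crash, the acknowledged writes were lost in local crash
    -- recovery, not by the replication layer.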
The jepsen-galera.cnf has the following InnoDB settings:

    # Performance related settings
    innodb_autoinc_lock_mode = 2
    innodb_flush_log_at_trx_commit = 0

The documentation at https://mariadb.com/docs/server/server-usage/storage-engines/innodb/innodb-system-variables#innodb_flush_log_at_trx_commit states for value 0: "Nothing is done on commit; rather the log buffer is written and flushed to the InnoDB redo log once a second. This gives better performance, but a server crash can erase the last second of transactions."

I suspect this is the root cause of the data loss that produces the alternate timeline. I ran the test with `innodb_flush_log_at_trx_commit = 1` (full durability) a few times and didn't observe any test failures.

Another thing that caught my attention was that binlogs are enabled (log-bin in resources/my.cnf) but log_slave_updates (https://mariadb.com/docs/server/ha-and-performance/standard-replication/replication-and-binary-log-system-variables#log_slave_updates) is not. This, too, may cause some already-acknowledged transactions to be lost during crash recovery. Also, if binlogs are enabled, the safest setting is `sync_binlog=1` (https://mariadb.com/docs/server/ha-and-performance/standard-replication/replication-and-binary-log-system-variables#sync_binlog). A quick way to check and tighten these settings is sketched below.
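The variable names below are the standard MariaDB ones; treat the exact values as a starting point rather than a definitive recommendation:

    -- Check the effective values on a running node.
    SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
    SHOW VARIABLES LIKE 'sync_binlog';
    SHOW VARIABLES LIKE 'log_slave_updates';

    -- innodb_flush_log_at_trx_commit and sync_binlog are dynamic,
    -- so they can be tightened without a restart:
    SET GLOBAL innodb_flush_log_at_trx_commit = 1;
    SET GLOBAL sync_binlog = 1;

    -- log_slave_updates is read-only at runtime; it has to go in the
    -- config file. For this test all three should be set in the .cnf
    -- anyway, since SET GLOBAL changes don't survive node restarts.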
- Teemu

Kyle Kingsbury wrote:
> Dear MariaDB & Galera folks,
>
> I've been trying out MariaDB with Galera Cluster recently, and I keep
> seeing what looks like write loss when nodes crash and restart. I was
> wondering if anyone from the MariaDB or Galera teams might be interested
> in taking a look at my cluster configuration and some logs, and helping
> figure out what's going on? I've spent a lot of time reading the docs
> and trying to set things up correctly, but it's definitely possible I'm
> just Holding The Database Wrong (TM)!
>
> My workload performs randomly generated transactions which consist of a
> series of read or append operations. Each operation reads or updates a
> single row by primary key. Each row contains a TEXT field with a list of
> comma-separated integers. The only writes in this workload append a
> unique integer to one of these lists. In a Snapshot Isolated or
> Repeatable Read system like MariaDB, all versions of a single row's
> value should be prefixes of the longest such value.
>
> This is true in single-node MariaDB, but is not true with Galera
> replication. Instead, it appears that the effects of a few dozen
> committed writes can be lost, then replaced by what appears to be an
> "alternate timeline" of different writes. The attached image shows a
> series of reads of key 112. Time flows top to bottom, and the list of
> integers are shown after `txn`. The timeline ending in `...53, 56, 57,
> 58, 71` is destroyed around 50 seconds into the test, and replaced by
> `... 158, 159, ...`. This coincided with a crash and restart of all
> three nodes in the cluster.
>
> This happens with MariaDB 12.1.2 and Galera 26.4.13 on Debian 12, using
> the official MariaDB repositories for both MariaDB and Galera. The test
> suite to reproduce this is at https://github.com/jepsen-io/mysql; use
> commit 3500f8c80bd0f419d7f21a7b89eaf65f8651a7af, and try something like:
>
>     lein run test-all --nodes n1,n2,n3 -w append --concurrency 6n \
>       --nemesis kill --time-limit 300 --test-count 5 \
>       --isolation repeatable-read \
>       --expected-consistency-model snapshot-isolation
>
> For an example failing case, including config files and the
> error/general logs on each node, see:
>
> https://s3.amazonaws.com/jepsen.io/analyses/mariadb-galera-12.1.2/20260105T1...
>
> If anyone has ideas about what might be going on here, I'd love to hear
> from you. :-)
>
> Cheers,
>
> --Kyle
