[MariaDB discuss] Re: Lost writes with Galera cluster?

Gordan Bobic via discuss Tue, 06 Jan 2026 00:35:04 -0800

Did you get confirmation on the connection doing the writing that the
write succeeded? Could it be that the write failed and was reported to
fail? Galera guarantees that you won't lose a write that was
committed. If your test is generating writes and the node goes away
you'll get errors back and it is up to the application to retry those
writes.
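
To make that distinction concrete, here is a minimal Python sketch of
client-side bookkeeping in which a write counts as acknowledged only if
COMMIT itself returned without error. The helper name `try_write`, the
table `lists` and its columns are hypothetical, and an in-memory sqlite3
database stands in for a real MariaDB connection purely so the example
runs on its own; it assumes nothing beyond a DB-API style connection
with cursor(), commit() and rollback().

```python
import sqlite3


def try_write(conn, sql, params):
    """Return True only if the statement ran AND the commit returned cleanly.

    Hypothetical helper: any exception (statement error, lost connection,
    failed commit) is treated as "not acknowledged".
    """
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        conn.commit()
        return True
    except Exception:
        # The outcome was not confirmed to the client. Try to clean up,
        # but the connection itself may already be gone.
        try:
            conn.rollback()
        except Exception:
            pass
        return False


# Stand-in database (assumption: sqlite3 instead of a Galera node).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lists (id INTEGER PRIMARY KEY, val TEXT)")

acked = []
for key, val in [(1, "1"), (1, "dup"), (2, "2")]:
    ok = try_write(conn, "INSERT INTO lists (id, val) VALUES (?, ?)", (key, val))
    if ok:
        acked.append((key, val))  # only these may be claimed as committed

print(acked)
```

The second insert fails on the primary key and is therefore not recorded
as acknowledged. Only writes in `acked` are ones the database can be held
to. An error returned from COMMIT leaves the outcome unknown to the
client, so the sketch does not retry automatically; whether a retry is
safe depends on the application.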


On Tue, 6 Jan 2026 at 10:21, Kyle Kingsbury via discuss
<[email protected]> wrote:
>
> Dear MariaDB & Galera folks,
>
> I've been trying out MariaDB with Galera Cluster recently, and I keep
> seeing what looks like write loss when nodes crash and restart. I was
> wondering if anyone from the MariaDB or Galera teams might be interested
> in taking a look at my cluster configuration and some logs, and helping
> figure out what's going on? I've spent a lot of time reading the docs
> and trying to set things up correctly, but it's definitely possible I'm
> just Holding The Database Wrong (TM)!
>
> My workload performs randomly generated transactions which consist of a
> series of read or append operations. Each operation reads or updates a
> single row by primary key. Each row contains a TEXT field with a list of
> comma-separated integers. The only writes in this workload append a
> unique integer to one of these lists. In a Snapshot Isolated or
> Repeatable Read system like MariaDB, all versions of a single row's
> value should be prefixes of the longest such value.
>
> This is true in single-node MariaDB, but is not true with Galera
> replication. Instead, it appears that the effects of a few dozen
> committed writes can be lost, then replaced by what appears to be an
> "alternate timeline" of different writes. The attached image shows a
> series of reads of key 112. Time flows top to bottom, and the list of
> integers is shown after `txn`. The timeline ending in `...53, 56, 57,
> 58, 71` is destroyed around 50 seconds into the test, and replaced by
> `... 158, 159, ...`. This coincided with a crash and restart of all
> three nodes in the cluster.
>
> This happens with MariaDB 12.1.2 and Galera 26.4.13 on Debian 12, using
> the official MariaDB repositories for both MariaDB and Galera. The test
> suite to reproduce this is at https://github.com/jepsen-io/mysql; use
> commit 3500f8c80bd0f419d7f21a7b89eaf65f8651a7af, and try something like:
>
> lein run test-all --nodes n1,n2,n3 -w append --concurrency 6n --nemesis
> kill --time-limit 300 --test-count 5 --isolation repeatable-read
> --expected-consistency-model snapshot-isolation
>
> For an example failing case, including config files and the
> error/general logs on each node, see:
>
> https://s3.amazonaws.com/jepsen.io/analyses/mariadb-galera-12.1.2/20260105T175818-lost-writes.zip
>
> If anyone has ideas about what might be going on here, I'd love to hear
> from you. :-)
>
> Cheers,
>
> --Kyle
> _______________________________________________
> discuss mailing list -- [email protected]
> To unsubscribe send an email to [email protected]

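The prefix property described in the quoted message can be checked
mechanically. Below is a minimal Python sketch, not the Jepsen checker
itself: the function names are hypothetical, and the sample values are
made up for illustration rather than taken from the attached test run.
It parses each observed value of a row and reports any read that is not
a prefix of the longest observed value.

```python
def parse(value):
    """Turn a comma-separated TEXT field such as '1,2,3' into [1, 2, 3]."""
    return [int(x) for x in value.split(",") if x.strip()]


def is_prefix(shorter, longer):
    """True if `shorter` equals the first len(shorter) elements of `longer`."""
    return len(shorter) <= len(longer) and longer[:len(shorter)] == shorter


def non_prefix_reads(reads):
    """Return the observed lists that are not prefixes of the longest one.

    An empty result means every read is consistent with a single
    append-only history for this key.
    """
    lists = [parse(r) for r in reads]
    longest = max(lists, key=len)
    return [observed for observed in lists if not is_prefix(observed, longest)]


# Illustrative reads of one key (made-up data).
consistent = ["1", "1,2", "1,2,3"]
diverged = ["1,2", "1,2,3", "1,2,4,5"]

print(non_prefix_reads(consistent))
print(non_prefix_reads(diverged))
```

In the second example the read `1,2,3` cannot be extended to `1,2,4,5`
by appends alone, which is the shape of anomaly the quoted message
describes as an alternate timeline.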


-- 
Gordan Bobic
Database Specialist, Shattered Silicon Ltd.
https://shatteredsilicon.net
Follow us:
LinkedIn: https://www.linkedin.com/company/shatteredsilicon
X: https://x.com/ssiliconbg
