Dear all,
replying to my own question ;-)

This document explains the rbd mirroring / journaling process in more detail:
https://pad.ceph.com/p/I-rbd_mirroring

especially this part:

    on startup, replay journal from flush position
    Store journal metadata in journal header, to be more general
    * flush position
    * per-zone flush positions
    pointers to positions in the journal (object, offset)
    - one for each reader so we can tell how far we can trim
    - store trim pos in primary and secondary zones, so despite loss of primary dc we can tell who's most up to date

=> So apparently there is one pointer to a position in the journal for each secondary image (journal reader) and, importantly, also one for the primary image (normally the journal writer, but also a reader during open / crash recovery).

This apparently confirms that clients on the primary are not only writing to the journal (to support replication to the secondary) but also actively reading from it after a crash, to replay the latest IOs that were missing on the primary image.

Also useful info:
https://tracker.ceph.com/projects/ceph/wiki/RBD_-_Mirroring

    * on open, replay recent journal operations
    * periodically update a journal position pointer in the rbd image header (to limit replays on open)

and this:
https://docs.ceph.com/en/pacific/rbd/rbd-mirroring/#force-image-resync

    If a split-brain event is detected by the rbd-mirror daemon, it will not attempt to mirror the affected image until corrected.

cheers
Francois Scheurer

--
EveryWare AG
François Scheurer
Senior Systems Engineer
Zurlindenstrasse 52a
CH-8003 Zürich

tel: +41 44 466 60 00
fax: +41 44 466 60 10
mail: francois.scheu...@everyware.ch
web: http://www.everyware.ch

________________________________
From: Scheurer François <francois.scheu...@everyware.ch>
Sent: Tuesday, October 3, 2023 4:38:07 PM
To: dilla...@redhat.com; ceph-users@ceph.io
Subject: [ceph-users] is the rbd mirror journal replayed on primary after a crash?

Hello

Short question regarding journal-based rbd mirroring.
IO path with journaling w/o cache:
  a. Create an event to describe the update
  b. Asynchronously append event to journal object
  c. Asynchronously update image once event is safe
  d. Complete IO to client once update is safe

[cf. https://events.static.linuxfound.org/sites/events/files/slides/Disaster%20Recovery%20and%20Ceph%20Block%20Storage-%20Introducing%20Multi-Site%20Mirroring_0.pdf]

If a client crashes between b. and c., is there a mechanism to replay the IO from the journal on the primary image?

If not, then the primary and secondary images would get out-of-sync (because of the extra write(s) on secondary) and subsequent writes to the primary would corrupt the secondary. Is that correct?

Cheers
Francois Scheurer
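
To make the crash scenario from the question concrete, here is a minimal in-memory sketch of the a. to d. IO path, a crash between b. and c., and a replay on open from the primary's own journal position. It is a toy model under stated assumptions, not librbd code: the names Journal, write, replay and the client ids "primary" and "mirror" are hypothetical, positions are simple event counts rather than (object, offset) pairs, and nothing is actually trimmed.

```python
from dataclasses import dataclass, field


@dataclass
class Journal:
    """Toy append-only journal with one position pointer per reader (hypothetical model)."""
    events: list = field(default_factory=list)      # list of (offset, data) events
    commit_pos: dict = field(default_factory=dict)  # client id -> number of events applied

    def append(self, event):
        self.events.append(event)
        return len(self.events)  # position just past the appended event

    def commit(self, client, pos):
        self.commit_pos[client] = pos

    def pending(self, client):
        # Events this client has not yet applied to its image.
        return self.events[self.commit_pos.get(client, 0):]

    def trim_pos(self):
        # Everything before the slowest reader's position could be trimmed.
        return min(self.commit_pos.values(), default=0)


def apply(image, event):
    offset, data = event
    image[offset:offset + len(data)] = data


def write(journal, image, client, offset, data, crash_before_apply=False):
    event = (offset, bytes(data))        # a. create an event describing the update
    pos = journal.append(event)          # b. append the event to the journal
    if crash_before_apply:
        raise RuntimeError("simulated client crash between b. and c.")
    apply(image, event)                  # c. update the image once the event is safe
    journal.commit(client, pos)          # advance this client's journal position
    # d. the IO would be completed to the caller here


def replay(journal, image, client):
    # On open: apply every event past this client's recorded position.
    for event in journal.pending(client):
        apply(image, event)
    journal.commit(client, len(journal.events))


journal = Journal()
primary = bytearray(8)
secondary = bytearray(8)

write(journal, primary, "primary", 0, b"AAAA")
try:
    write(journal, primary, "primary", 4, b"BBBB", crash_before_apply=True)
except RuntimeError:
    pass  # the event is in the journal, but the primary image never got it

replay(journal, secondary, "mirror")             # the mirror applies both events
diverged = bytes(primary) != bytes(secondary)    # primary is now behind the secondary

replay(journal, primary, "primary")              # replay on open closes the gap
in_sync = bytes(primary) == bytes(secondary)
```

In this model the images diverge only for as long as the primary has not been reopened; once the primary replays from its own position, both images hold the same data and the trimmable position advances to the slowest reader's pointer.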