On Thu, 22 Jul 2021 15:36:03 +0200 "Ulrich Windl" <ulrich.wi...@rz.uni-regensburg.de> wrote:
> >>> Jehan-Guillaume de Rorthais <j...@dalibo.com> wrote on 22.07.2021 at
> 12:05 in message <20210722120537.0d65c2a1@firost>:
> > On Wed, 21 Jul 2021 22:02:21 -0400
> > "Frank D. Engel, Jr." <fde...@fjrhome.net> wrote:
> >
> >> In OpenVMS, the kernel is aware of the cluster. As is mentioned in that
> >> presentation, it actually stops processes from running and blocks access
> >> to clustered storage when quorum is lost, and resumes them appropriately
> >> once it is re-established.
> >>
> >> In other words... no reboot, no "death" of the cluster node or special
> >> arrangements with storage hardware... If connectivity is restored, the
> >> services are simply resumed.
> >
> > Well, when losing quorum, by default Pacemaker stops its local
> > resources.
>
> But when a node without quorum performs any actions, it may corrupt data
> (e.g. by writing to a non-shared filesystem like ext3 on a shared medium
> such as iSCSI or an FC SAN).

In the case you are describing, the storage itself should forbid the
situation where a non-shared filesystem could be mounted on multiple
servers at the same time.

If you can't enforce this on the storage side, the simplest way to do it is
the LVM system ID restriction (see lvmsystemid(7)). This restriction
strictly allows 0 or 1 node to access the shared VG: the name of the node
allowed to activate the VG is written on the storage side, and LVM will
fail on any other node trying to activate the shared VG. There's a
Pacemaker resource agent taking care of this. I did some PoC using it; it
is really easy to manage.

But I suspect the OP is talking about a distributed clustered FS anyway, so
this is a completely different beast I never dealt with...

Regards,

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
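For the archives, the system ID workflow described above can be sketched
roughly as follows. Device names, VG names, and the resource/group names
are made up for illustration; the commands need root and a real shared
block device, so treat this as a sketch rather than a recipe:

```shell
# On every node, let LVM derive its system ID from the hostname,
# in /etc/lvm/lvm.conf:
#     global { system_id_source = "uname" }

# From node1, create the VG on the shared LUN (hypothetical /dev/sdb).
# The VG then records node1's system ID on the storage side:
vgcreate vg_shared /dev/sdb

# Check which node currently "owns" the VG:
vgs -o vg_name,systemid vg_shared

# On any other node, activation is refused because the VG is "foreign":
vgchange -ay vg_shared    # fails on node2

# Then let Pacemaker manage ownership via the LVM-activate resource
# agent in system_id mode (pcs syntax shown; crmsh works too):
pcs resource create shared_vg ocf:heartbeat:LVM-activate \
    vgname=vg_shared vg_access_mode=system_id --group my_group
```

With this in place, moving the resource moves the system ID, so at most
one node can ever activate (and thus mount anything on) the shared VG.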