Dear DRBD(-reactor) users,

this is the first release candidate of version 0.5.0.
Besides minor fixes for Ubuntu Bionic and some upgrades for containers, the main feature is proper demote-failure handling in the promoter plugin. There was the "on-stop-failure" action, which worked at one point, but has not done anything since we switched to a fancier systemd.target logic. What we really care about when managing services (and a potential fail-over because of a service failure) is whether things are stopped in a way that allows the DRBD device to be demoted to Secondary. If not, we might need to halt or reboot the node so that another node can take over the DRBD resource and the services depending on it. This is done via the new setting "on-drbd-demote-failure"; "on-stop-failure" is deprecated and ignored. The new option can be set to any action "FailureAction" allows, as defined in systemd.unit(5). If the DRBD resource cannot be demoted, that action is executed.

Let's see what that looks like in an HA cluster providing a file system mount. I assume a working LINSTOR cluster (though one is not strictly required). A good way to help us test is to use the PPA. If you are using fresh VMs, make sure to restart multipathd after the first drbd-utils install.

So, let's assume a 3-node cluster that also has this RC of drbd-reactor installed. Let's create a 3-node DRBD resource:

$ linstor rg c --place-count 3 promoter
$ linstor rg drbd-options promoter --auto-promote no
$ linstor rg drbd-options promoter --quorum majority
$ linstor rg drbd-options promoter --on-no-quorum io-error
$ linstor vg c promoter
$ linstor rg spawn promoter test 20M

And a file system:

$ drbdadm primary test
$ mkfs.ext4 /dev/drbd1000
$ drbdadm secondary test

And a mount unit for the storage, on *all* nodes:

$ cat <<EOF > /etc/systemd/system/mnt-test.mount
[Unit]
Description=Mount /dev/drbd1000 to /mnt/test

[Mount]
What=/dev/drbd1000
Where=/mnt/test
Type=ext4
EOF

And a simple drbd-reactor::promoter config, on *all* nodes:

$ cat <<EOF > /etc/drbd-reactor.d/mnt-test.toml
[[promoter]]
id = "mnt-test"

[promoter.resources.test]
start = ["mnt-test.mount"]
on-drbd-demote-failure = "reboot"
EOF

Then, on *all* nodes:

$ systemctl start drbd-reactor

Then you can check which node is Primary and has the device mounted:

$ drbd-reactorctl status mnt-test

On the node that is Primary you can do a switch-over, just for testing:

$ drbd-reactorctl disable --now mnt-test
$ # another node should be Primary now and have the FS mounted
$ drbd-reactorctl enable mnt-test # re-enable the config again

Testing demote failure: connect to the node that is Primary:

$ touch /mnt/test/lock
$ sleep 3600 < /mnt/test/lock &
$ # ^^ this creates an opener, so the mount unit will be unable to stop
$ # and the DRBD device will be unable to demote
$ systemctl restart drbd-services@test.target # trigger a stop/restart of the target

This should trigger the reboot action, and another node should take over the mount.

Please help with testing.

Regards,
rck

GIT: https://github.com/LINBIT/drbd-reactor/commit/3df014d63611b97470728838afdaf2d313f0d786
TGZ: https://pkg.linbit.com//downloads/drbd/utils/drbd-reactor-0.5.0-rc.1.tar.gz
PPA: https://launchpad.net/~linbit/+archive/ubuntu/linbit-drbd9-stack/
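P.S.: a few extra sketches that might help while testing; everything below is my assumption on top of the example setup, not something the release mandates.

If systemd does not yet know about the freshly written mnt-test.mount, reload the manager configuration first; this is plain systemd housekeeping, nothing drbd-reactor specific:

on *all* nodes:

$ systemctl daemon-reload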
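"on-drbd-demote-failure" takes any "FailureAction" value your systemd version supports, so a hypothetical variant of the promoter config above could power the node off instead of rebooting it:

$ cat <<EOF > /etc/drbd-reactor.d/mnt-test.toml
[[promoter]]
id = "mnt-test"

[promoter.resources.test]
start = ["mnt-test.mount"]
# any FailureAction from systemd.unit(5) works here, e.g.
# "reboot", "reboot-force", "poweroff", "poweroff-force"
on-drbd-demote-failure = "poweroff-force"
EOF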
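And once the demote-failure test rebooted the old Primary, one way to check that another node took over, and to get rid of the left-over opener file (same paths as above):

$ drbd-reactorctl status mnt-test
$ # on the node that is now Primary:
$ rm /mnt/test/lock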