[DRBD-user] drbd-reactor v0.9.0
Dear DRBD users,

this is drbd-reactor version 0.9.0. There are no changes since the last RC; here is the original announcement for convenience:

The main new feature is that the promoter plugin can now freeze the services of the currently active node when it loses quorum and thaw them when the node gains quorum again. This can be an advantage when starting services takes a long time (e.g., huge databases). Freezing and thawing are instant and use the corresponding cgroup features via systemctl freeze/thaw.

Copying from the documentation [1]:

The default behavior when a DRBD Primary loses quorum is to immediately stop the generated target unit and hope that other nodes that still have quorum will successfully start the service. This works well if services can be failed over/started on another node in reasonable time. Unfortunately, there are services that take a very long time to start, for example huge databases.

When a DRBD Primary loses its quorum, we basically have two possibilities:

- The rest of the nodes, or at least some of them, still have quorum: Then these have to start the service, as they are the only ones with quorum, but we could still keep the old Primary in a frozen state. When the nodes with quorum come into contact with the old Primary again, it should stop the service and its storage should become in sync with the other nodes.
- The rest of the nodes are not able to form a partition with quorum: In such a scenario there are no alternatives anyway, we would need to keep the Primary frozen. But if the nodes eventually join the old Primary again and quorum is restored, we could just unfreeze/thaw the old Primary (which is also the new Primary).

There are several requirements for this to work properly:

- A system with unified cgroups. If the file /sys/fs/cgroup/cgroup.controllers exists you should be fine. That requires a relatively "new" kernel. Note that "even" RHEL8, for example, needs the addition of systemd.unified_cgroup_hierarchy on the kernel command line.
- A service that can tolerate being frozen
- DRBD option on-suspended-primary-outdated set to force-secondary
- DRBD option on-no-quorum set to suspend-io
- DRBD net option rr-conflict set to retry-connect

If these requirements are fulfilled, one can set the promoter option "on-quorum-loss" to "freeze".

It is a feature that might be handy in specific situations; the more classic behavior of stopping the services might be the better default for most users.

Also, and this is important in general, check the output of "systemctl status drbd-reactor.service". It runs all kinds of DRBD option checks on your DRBD resources and tells you which options are missing/wrong. Follow these suggestions!

Regards, rck

GIT: https://github.com/LINBIT/drbd-reactor/commit/b9639b431f6d6e0fcc53a2a17d85717acb29d43e
TGZ: https://pkg.linbit.com//downloads/drbd/utils/drbd-reactor-0.9.0.tar.gz
PPA: https://launchpad.net/~linbit/+archive/ubuntu/linbit-drbd9-stack

Changelog:
[ Roland Kammerer ]
* doc: make man pages o+r
* docs,promoter: hint to use provided packages
* promoter: warn if mount unit is topmost unit
* promoter: implement on-quorum-loss policy
* promoter: relax ocf parser
* ctl: add resource filter
* ctl: fix status without res filter
* promoter: call systemctl freeze/thaw for every unit

[1] https://github.com/LINBIT/drbd-reactor/blob/master/doc/promoter.md#freezing-resources

___
Star us on GITHUB: https://github.com/LINBIT
drbd-user mailing list
drbd-user@lists.linbit.com
https://lists.linbit.com/mailman/listinfo/drbd-user
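The unified-cgroups requirement mentioned in this announcement can be checked by hand, but a small script may be handy on a larger cluster. The following is only a minimal sketch of that check: the function name has_unified_cgroups and its cgroup_root parameter are made up for illustration and are not part of drbd-reactor.

```python
from pathlib import Path


def has_unified_cgroups(cgroup_root: str = "/sys/fs/cgroup") -> bool:
    """Return True if cgroup.controllers exists directly under cgroup_root.

    The announcement names this file as the indicator for unified cgroups.
    The cgroup_root parameter is an assumption added here so that the check
    can be pointed at a different directory.
    """
    return (Path(cgroup_root) / "cgroup.controllers").is_file()


if __name__ == "__main__":
    if has_unified_cgroups():
        print("unified cgroups found, freezing should be possible")
    else:
        print("cgroup.controllers not found, check the kernel command line")
```

This only covers the first requirement; the DRBD options are better verified through the checks that drbd-reactor itself reports in "systemctl status drbd-reactor.service".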
[DRBD-user] drbd-reactor v0.9.0-rc.3
Dear DRBD users,

this is RC3 of the upcoming drbd-reactor version 0.9.0.

There was only one commit, in the promoter plugin: it no longer runs "systemctl freeze" once with all services as arguments, but calls freeze for every service individually. That allows us to filter out services for which we don't want freeze actions, such as mount units, and lets us write better warn/error messages.

No big change, and only visible if one enables freezing, but still a change I want to give people the chance to test before the final version.

Regards, rck

GIT: https://github.com/LINBIT/drbd-reactor/commit/183ba9ff477536e7e3acdcec41d97a9fe4153cad
TGZ: https://pkg.linbit.com//downloads/drbd/utils/drbd-reactor-0.9.0-rc.3.tar.gz
PPA: https://launchpad.net/~linbit/+archive/ubuntu/linbit-drbd9-stack
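To illustrate the idea of one systemctl call per unit, here is a rough Python sketch. It is not the promoter's actual code (drbd-reactor is not written in Python). The helper names freeze_commands and run_all are hypothetical, and skipping units by their ".mount" suffix is an assumption about how such a filter could look.

```python
import subprocess
from typing import List


def freeze_commands(units: List[str], action: str = "freeze") -> List[List[str]]:
    """Build one 'systemctl <action> <unit>' command per unit.

    Mount units are skipped. Filtering by the '.mount' suffix is an
    assumption made for this sketch.
    """
    if action not in ("freeze", "thaw"):
        raise ValueError(f"unsupported action: {action}")
    commands = []
    for unit in units:
        if unit.endswith(".mount"):
            continue  # no freeze/thaw for mount units
        commands.append(["systemctl", action, unit])
    return commands


def run_all(commands: List[List[str]]) -> List[str]:
    """Run each command on its own and return the units that failed."""
    failed = []
    for argv in commands:
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            # One call per unit means the warning can name the exact unit.
            print(f"warning: {' '.join(argv)} failed: {result.stderr.strip()}")
            failed.append(argv[-1])
    return failed
```

With a single call for all units, a failure could not easily be attributed to one unit; splitting the calls is what makes the per-unit message in run_all possible.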
[DRBD-user] drbd-reactor v0.9.0-rc.2
Dear DRBD users,

this is RC2 for drbd-reactor version 0.9.0. You did not miss RC1; I found a stupid bug in RC1 when I rolled it out on one of our internal clusters...

The main new feature is that the promoter plugin can now freeze the services of the currently active node when it loses quorum and thaw them when the node gains quorum again. This can be an advantage when starting services takes a long time (e.g., huge databases). Freezing and thawing are instant and use the corresponding cgroup features via systemctl freeze/thaw.

Copying from the documentation [1]:

The default behavior when a DRBD Primary loses quorum is to immediately stop the generated target unit and hope that other nodes that still have quorum will successfully start the service. This works well if services can be failed over/started on another node in reasonable time. Unfortunately, there are services that take a very long time to start, for example huge databases.

When a DRBD Primary loses its quorum, we basically have two possibilities:

- The rest of the nodes, or at least some of them, still have quorum: Then these have to start the service, as they are the only ones with quorum, but we could still keep the old Primary in a frozen state. When the nodes with quorum come into contact with the old Primary again, it should stop the service and its storage should become in sync with the other nodes.
- The rest of the nodes are not able to form a partition with quorum: In such a scenario there are no alternatives anyway, we would need to keep the Primary frozen. But if the nodes eventually join the old Primary again and quorum is restored, we could just unfreeze/thaw the old Primary (which is also the new Primary).

There are several requirements for this to work properly:

- A system with unified cgroups. If the file /sys/fs/cgroup/cgroup.controllers exists you should be fine. That requires a relatively "new" kernel. Note that "even" RHEL8, for example, needs the addition of systemd.unified_cgroup_hierarchy on the kernel command line.
- A service that can tolerate being frozen
- DRBD option on-suspended-primary-outdated set to force-secondary
- DRBD option on-no-quorum set to suspend-io
- DRBD net option rr-conflict set to retry-connect

If these requirements are fulfilled, one can set the promoter option "on-quorum-loss" to "freeze".

Consider this an experimental feature; most users probably should not enable it by default. It is a feature that might be handy in specific situations; the more classic behavior of stopping the services might be the better default for most users.

Also, and this is important in general, check the output of "systemctl status drbd-reactor.service". It runs all kinds of DRBD option checks on your DRBD resources and tells you which options are missing/wrong. Follow these suggestions!

Regards, rck

GIT: https://github.com/LINBIT/drbd-reactor/commit/a5cf59bf84359f0095154e8db78489c627f968ad
TGZ: https://pkg.linbit.com//downloads/drbd/utils/drbd-reactor-0.9.0-rc.2.tar.gz
PPA: https://launchpad.net/~linbit/+archive/ubuntu/linbit-drbd9-stack

Changelog:
[ Roland Kammerer ]
* doc: make man pages o+r
* docs,promoter: hint to use provided packages
* promoter: warn if mount unit is topmost unit
* promoter: implement on-quorum-loss policy
* promoter: relax ocf parser
* ctl: add resource filter
* ctl: fix status without res filter

[1] https://github.com/LINBIT/drbd-reactor/blob/master/doc/promoter.md#freezing-resources
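The two scenarios described in this announcement boil down to a small decision table, which may be easier to see in code. The sketch below is only a model of the described behavior, not the promoter's implementation. The names Action, on_quorum_lost and on_quorum_regained are invented here, and the service_started_elsewhere flag is a hypothetical input, since the announcement does not describe how that situation is detected.

```python
from enum import Enum


class Action(Enum):
    STOP = "stop"
    FREEZE = "freeze"
    THAW = "thaw"
    NONE = "none"


def on_quorum_lost(policy: str) -> Action:
    """Primary lost quorum: freeze if the policy says so, otherwise stop.

    Only the value "freeze" is taken from the announcement. Any other
    value is treated as the classic stop behavior in this sketch.
    """
    return Action.FREEZE if policy == "freeze" else Action.STOP


def on_quorum_regained(frozen: bool, service_started_elsewhere: bool) -> Action:
    """Old Primary is in contact with enough nodes again.

    service_started_elsewhere is a hypothetical flag for this model.
    """
    if not frozen:
        return Action.NONE  # services were stopped, nothing to thaw
    if service_started_elsewhere:
        # First scenario: a partition with quorum took over the service,
        # so the old Primary stops it and resyncs its storage.
        return Action.STOP
    # Second scenario: nobody else had quorum, the old Primary continues.
    return Action.THAW
```

For example, with the "freeze" policy a node that loses quorum freezes its services, and if no other partition ever had quorum it simply thaws them once quorum is restored.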