I've had a look at this series without looking at the second series that extends this (yet). As Dano noted, there are multiple TODOs for the final ha-manager patch in this series that tries to persist the migration from groups to rules. During my testing I quickly ran into some problems with this patch. I noted some of the errors I encountered and how they could be fixed in the respective patch.
With that said, I tested the following on a 3-node cluster: - Migration of resource in when node with higher priority becomes available (tested also that no migration occurs if nofailback is active) - Stopping node the resource is on from outside correctly triggers recovery to another node - Resource is stopped if none of the nodes in the group restricted group is available - Same case as before, but node keeps running on node outside of the group in case of unrestricted group - Resource is migrated back once a node from its group comes online again if nofailback is not checked - Resource stays on current node even if its preferred node comes online again if nofailback is checked The in-memory migration seemed to work well, I did not really notice any problems there. The persistent migration failed due to the errors describes in ha-manager 15/15. I applied the changes I proposed in my comments to that patch and tested that the migration produced a correct rules.cfg. I tested the various possible group configurations and checked that the resulting rules.cfg contained the correct nodes and resources. I noticed the following regarding persistent migration: - The resources.cfg is empty after the migration. I have not yet been able to find out why, but I'll try to send a fix for that. - The nofailback flag is (due to above) not correctly migrated The migrations based on the persisted rules seemed to work (apart from failback). Of course, without the second series, the current state is that the groups are migrated and then are simply gone and there is nothing to replace them, so it's not usable with the persistent migration. I'll have a look at the second series as well and at the tests for this one. On 7/4/25 20:16, Daniel Kral wrote: > RFC v1: > https://lore.proxmox.com/pve-devel/20250325151254.193177-1-d.k...@proxmox.com/ > RFC v2: > https://lore.proxmox.com/pve-devel/20250620143148.218469-1-d.k...@proxmox.com/ > > I've separated the core HA Rules module and the transformation from HA > groups to HA Node Affinity rules (formerly known as HA Location rules) > in this patch series, to reduce the overhead for reviewers and strive > for a better version history, as changing two things at a time is rather > confusing. > > The main things that have changed since the last version (v2): > > - split up the patch series (ofc) > > - rebased on newest available master > > - renamed "HA Location Rule" to "HA Node Affinity Rule" > > - renamed any reference of a 'HA service' to 'HA resource' (e.g. rules > property 'services' is now 'resources') > > - converted tri-state property 'state' to a binary 'disable' flag on HA > rules and expose the 'contradictory' state with an 'errors' hash > > - remove the "use-location-rules" feature flag and implement a more > straightforward ha groups migration (more on that below) > > - remove any reference of ha groups from the web interface > > > > As before, HA groups are migrated to HA node affinity rules in each HA > Manager round where something has changed about the HA groups / HA > resources config file, but these are now unconditionally done as soon as > a HA Manager runs with that version. It will also try to persistently > migrate these, but that will only be successful as soon as all other > nodes are upgraded (i.e. every node can run at least the HA Manager > version that can successfully parse and apply the HA rules). > > > There are still some things left to do, which I didn't get the time to > come around to do for this revision: > > - Testing, testing, testing > > - I've ran out of time on the persistent HA groups migration part, which > has at least the two TODOs, which are mentioned in the patch itself, > and I haven't tested them on any real PVE upgrade yet; It's more of a > draft on how the migration should potentially work > > - Also, the last patch for the persistent HA groups migration part will > fail the tests but the two that have been added, because of the way > the other tests are designed; that should be abstracted away in the HA > environment, e.g., a routine "have_groups_been_migrated" for PVE2/Sim. > > - There might be a bit too many in-memory group migrations on the HA > Rules API side now, but better safe then sorry, maybe they can be > removed later; however, these shouldn't overwrite the rules that come > from the config, I haven't checked on that yet > > - Should the HA Groups API (and the HA Resources 'group' property in the > HA Resources API) be removed now? Or should these stay and uses of > them make auto-migrations to the HA Rules? > > As in the previous revisions, I've run a > > git rebase master --exec 'make clean && make deb' > > on the series, so the tests should work for every patch. _______________________________________________ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel