Persistently migrate the HA groups config to the HA resources and HA
rules configs on disk, and retry until the migration succeeds. The HA
Manager already migrates the HA groups in-memory, but to use them
persistently as HA node affinity rules, they must be written to the HA
rules config.

The new 'failback' flag and the rules config can only be read by newer
HA Manager versions. Therefore, the HA resources config is only migrated
and the HA groups config is only deleted once all nodes are upgraded to
a pve-manager version which depends on an ha-manager package version
that can read and apply the HA rules.

If the HA group migration fails, it is retried every 6 rounds.
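
The retry cadence can be sketched in isolation as follows. This is only
an illustration, not the patch itself: it reuses the two constants from
the Manager.pm hunk (initial round counter 3, cooldown 6), while the
helper name attempt_due and the 20-round horizon are made up for the
example, and every attempt is assumed to fail.

```perl
use strict;
use warnings;

my $cooldown = 6; # same value as $group_migration_cooldown in the patch
my $round = 3;    # initial value of group_migration_round in the patch

# Hypothetical helper: returns 1 if a migration attempt is made in this
# manager round, mirroring the countdown in try_persistent_group_migration.
sub attempt_due {
    $round--;
    return 0 if $round > 0;
    $round = $cooldown; # re-arm the countdown after each attempt
    return 1;
}

# Collect the manager rounds (1-based) in which an attempt would happen,
# assuming that every attempt fails.
my @attempts = grep { attempt_due() } 1 .. 20;
print "attempts in rounds: @attempts\n";
```

Under these assumptions the first attempt happens in the third manager
round and every sixth round after that, which is consistent with the
attempts logged at 60, 180, 300, ... in test-group-migrate1/log.expect
if the simulated manager runs one round every 20 time units.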

Signed-off-by: Daniel Kral <d.k...@proxmox.com>
---
This patch must be updated with the correct pve-manager version, which
the HA Manager must check for before fully migrating (i.e. deleting the
groups config, etc.).

I guessed pve-manager 9.0.0 for now, but let's see what it'll be.

The version resolution includes the major, minor and patch version parts
but not any other additions (e.g. ~12).
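
To illustrate the resolution described above, here is a minimal
standalone sketch of such a comparison. The sub names get_version_parts
and has_min_version are hypothetical stand-ins for the closures in the
Manager.pm hunk, and the guard against unparsable version strings is an
addition of this sketch, not part of the patch.

```perl
use strict;
use warnings;

# Extract the numeric major, minor and patch parts; anything after the
# patch number (e.g. "~11") is ignored.
sub get_version_parts {
    my ($version) = @_;
    return $version =~ m/^(\d+)\.(\d+)\.(\d+)/;
}

# Hypothetical helper: true if $version is at least $min_version when
# only major, minor and patch are compared.
sub has_min_version {
    my ($version, $min_version) = @_;

    my @have = get_version_parts($version);
    my @want = get_version_parts($min_version);
    return 0 if @have != 3 || @want != 3; # unparsable versions never qualify

    for my $i (0 .. 2) {
        return 1 if $have[$i] > $want[$i];
        return 0 if $have[$i] < $want[$i];
    }
    return 1; # all three parts are equal
}

for my $version ('8.4.1', '9.0.0~11', '9.0.1', '9.1.2') {
    my $ok = has_min_version($version, '9.0.0') ? 'ok' : 'too old';
    print "$version: $ok\n";
}
```

Note that with this resolution '9.0.0~11' satisfies a minimum of
'9.0.0', even though a '~' suffix sorts before the plain version in
Debian version ordering.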

 src/PVE/HA/Config.pm                         |   5 +
 src/PVE/HA/Env.pm                            |  24 +++
 src/PVE/HA/Env/PVE2.pm                       |  29 ++++
 src/PVE/HA/Manager.pm                        | 156 +++++++++++++++++++
 src/PVE/HA/Sim/Env.pm                        |  30 ++++
 src/PVE/HA/Sim/Hardware.pm                   |  24 +++
 src/test/test-group-migrate1/README          |   4 +
 src/test/test-group-migrate1/cmdlist         |   3 +
 src/test/test-group-migrate1/groups          |   7 +
 src/test/test-group-migrate1/hardware_status |   5 +
 src/test/test-group-migrate1/log.expect      | 101 ++++++++++++
 src/test/test-group-migrate1/manager_status  |   1 +
 src/test/test-group-migrate1/service_config  |   5 +
 src/test/test-group-migrate2/README          |   3 +
 src/test/test-group-migrate2/cmdlist         |   3 +
 src/test/test-group-migrate2/groups          |   7 +
 src/test/test-group-migrate2/hardware_status |   5 +
 src/test/test-group-migrate2/log.expect      |  50 ++++++
 src/test/test-group-migrate2/manager_status  |   1 +
 src/test/test-group-migrate2/service_config  |   5 +
 20 files changed, 468 insertions(+)
 create mode 100644 src/test/test-group-migrate1/README
 create mode 100644 src/test/test-group-migrate1/cmdlist
 create mode 100644 src/test/test-group-migrate1/groups
 create mode 100644 src/test/test-group-migrate1/hardware_status
 create mode 100644 src/test/test-group-migrate1/log.expect
 create mode 100644 src/test/test-group-migrate1/manager_status
 create mode 100644 src/test/test-group-migrate1/service_config
 create mode 100644 src/test/test-group-migrate2/README
 create mode 100644 src/test/test-group-migrate2/cmdlist
 create mode 100644 src/test/test-group-migrate2/groups
 create mode 100644 src/test/test-group-migrate2/hardware_status
 create mode 100644 src/test/test-group-migrate2/log.expect
 create mode 100644 src/test/test-group-migrate2/manager_status
 create mode 100644 src/test/test-group-migrate2/service_config

diff --git a/src/PVE/HA/Config.pm b/src/PVE/HA/Config.pm
index 424a6e10..92d04443 100644
--- a/src/PVE/HA/Config.pm
+++ b/src/PVE/HA/Config.pm
@@ -234,6 +234,11 @@ sub read_group_config {
     return cfs_read_file($ha_groups_config);
 }
 
+sub delete_group_config {
+
+    unlink "/etc/pve/$ha_groups_config" or die "failed to remove group config: $!\n";
+}
+
 sub write_group_config {
     my ($cfg) = @_;
 
diff --git a/src/PVE/HA/Env.pm b/src/PVE/HA/Env.pm
index 70e39ad4..e00272a0 100644
--- a/src/PVE/HA/Env.pm
+++ b/src/PVE/HA/Env.pm
@@ -100,6 +100,12 @@ sub update_service_config {
     return $self->{plug}->update_service_config($sid, $param, $delete);
 }
 
+sub write_service_config {
+    my ($self, $conf) = @_;
+
+    $self->{plug}->write_service_config($conf);
+}
+
 sub parse_sid {
     my ($self, $sid) = @_;
 
@@ -137,12 +143,24 @@ sub read_rules_config {
     return $self->{plug}->read_rules_config();
 }
 
+sub write_rules_config {
+    my ($self, $rules) = @_;
+
+    $self->{plug}->write_rules_config($rules);
+}
+
 sub read_group_config {
     my ($self) = @_;
 
     return $self->{plug}->read_group_config();
 }
 
+sub delete_group_config {
+    my ($self) = @_;
+
+    $self->{plug}->delete_group_config();
+}
+
 # this should return a hash containing info
 # what nodes are members and online.
 sub get_node_info {
@@ -288,4 +306,10 @@ sub get_static_node_stats {
     return $self->{plug}->get_static_node_stats();
 }
 
+sub get_node_version {
+    my ($self, $node) = @_;
+
+    return $self->{plug}->get_node_version($node);
+}
+
 1;
diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
index 854c8942..78ce5616 100644
--- a/src/PVE/HA/Env/PVE2.pm
+++ b/src/PVE/HA/Env/PVE2.pm
@@ -141,6 +141,12 @@ sub update_service_config {
     return PVE::HA::Config::update_resources_config($sid, $param, $delete);
 }
 
+sub write_service_config {
+    my ($self, $conf) = @_;
+
+    return PVE::HA::Config::write_resources_config($conf);
+}
+
 sub parse_sid {
     my ($self, $sid) = @_;
 
@@ -201,12 +207,24 @@ sub read_rules_config {
     return PVE::HA::Config::read_and_check_rules_config();
 }
 
+sub write_rules_config {
+    my ($self, $rules) = @_;
+
+    PVE::HA::Config::write_rules_config($rules);
+}
+
 sub read_group_config {
     my ($self) = @_;
 
     return PVE::HA::Config::read_group_config();
 }
 
+sub delete_group_config {
+    my ($self) = @_;
+
+    PVE::HA::Config::delete_group_config();
+}
+
 # this should return a hash containing info
 # what nodes are members and online.
 sub get_node_info {
@@ -489,4 +507,15 @@ sub get_static_node_stats {
     return $stats;
 }
 
+sub get_node_version {
+    my ($self, $node) = @_;
+
+    my $version_info = PVE::Cluster::get_node_kv('version-info', $node);
+    return undef if !$version_info->{$node};
+
+    my $node_version_info = eval { decode_json($version_info->{$node}) };
+
+    return $node_version_info->{version};
+}
+
 1;
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 43572531..374700cf 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -39,6 +39,8 @@ use PVE::HA::Usage::Static;
 # patches for changing above, as that set is mostly sensible and should be easy to remember once
 # spending a bit time in the HA code base.
 
+my $group_migration_cooldown = 6;
+
 sub new {
     my ($this, $haenv) = @_;
 
@@ -50,6 +52,7 @@ sub new {
         last_rules_digest => '',
         last_groups_digest => '',
         last_services_digest => '',
+        group_migration_round => 3, # wait a little bit
     }, $class;
 
     my $old_ms = $haenv->read_manager_status();
@@ -464,6 +467,157 @@ sub update_crm_commands {
 
 }
 
+my $have_groups_been_migrated = sub {
+    my ($haenv) = @_;
+
+    my $groups = $haenv->read_group_config();
+
+    return 1 if !$groups;
+    return keys $groups->{ids}->%* < 1;
+};
+
+my $get_version_parts = sub {
+    my ($node_version) = @_;
+
+    return $node_version =~ m/^(\d+)\.(\d+)\.(\d+)/;
+};
+
+my $has_node_min_version = sub {
+    my ($node_version, $min_version) = @_;
+
+    my ($major, $minor, $patch) = $get_version_parts->($node_version);
+    my ($min_major, $min_minor, $min_patch) = $get_version_parts->($min_version);
+
+    return 0 if $major < $min_major;
+    return 0 if $major == $min_major && $minor < $min_minor;
+    return 0 if $major == $min_major && $minor == $min_minor && $patch < $min_patch;
+
+    return 1;
+};
+
+my $is_lrm_active_or_idle = sub {
+    my ($ss, $node, $lrm_state) = @_;
+
+    my $active_count = 0;
+    for my $sid (sort keys %$ss) {
+        my $sd = $ss->{$sid};
+        next if $sd->{node} ne $node;
+        my $req_state = $sd->{state};
+        next if !defined($req_state);
+        next if $req_state eq 'stopped';
+        next if $req_state eq 'freeze';
+        $active_count++;
+    }
+
+    return 1 if $lrm_state eq 'active';
+    return 1 if $lrm_state eq 'wait_for_agent_lock' && !$active_count;
+
+    return 0;
+};
+
+my $assert_cluster_can_migrate_ha_groups = sub {
+    my ($haenv, $ns, $ss) = @_;
+
+    # NOTE pve-manager has a version dependency on the ha-manager which supports HA rules
+    # FIXME Set the actual minimum version which depends on the correct ha-manager version
+    my $HA_RULES_MINVERSION = "9.0.0";
+
+    die "cluster is not quorate\n" if !$haenv->quorate();
+
+    my $nodelist = $ns->list_nodes();
+    die "node list is empty\n" if !@$nodelist;
+
+    for my $node (@$nodelist) {
+        die "node with empty name\n" if !$node;
+
+        my $node_status = $ns->get_node_state($node);
+        $haenv->log(
+            'notice', "ha groups migration: node '$node' is in state '$node_status'",
+        );
+        die "node '$node' is not online\n" if $node_status ne 'online';
+
+        my ($lrm_state, $lrm_mode) = $haenv->read_lrm_status($node)->@{qw(state mode)};
+        die "could not retrieve state for lrm '$node'\n" if !$lrm_state || !$lrm_mode;
+        $haenv->log(
+            'notice',
+            "ha groups migration: lrm '$node' is in state '$lrm_state' and mode '$lrm_mode'",
+        );
+        die "lrm '$node' is not in state 'active' or 'idle'\n"
+            if !$is_lrm_active_or_idle->($ss, $node, $lrm_state);
+        die "lrm '$node' is not in mode 'active'\n" if $lrm_mode ne 'active';
+
+        my $node_version = $haenv->get_node_version($node);
+        die "could not retrieve version from node '$node'\n" if !$node_version;
+        $haenv->log(
+            'notice', "ha groups migration: node '$node' has version '$node_version'",
+        );
+        my $has_min_version = $has_node_min_version->($node_version, $HA_RULES_MINVERSION);
+        die "node '$node' needs at least pve-manager version '$HA_RULES_MINVERSION'\n"
+            if !$has_min_version;
+    }
+};
+
+my $migrate_group_persistently = sub {
+    my ($haenv, $ns, $ss) = @_;
+
+    eval {
+        $assert_cluster_can_migrate_ha_groups->($haenv, $ns, $ss);
+
+        my $resources = $haenv->read_service_config();
+        my $groups = $haenv->read_group_config();
+        my $rules = $haenv->read_rules_config();
+
+        # write changes to rules config whenever possible so users can modify migrated rules
+        PVE::HA::Groups::migrate_groups_to_rules($rules, $groups, $resources);
+        $haenv->write_rules_config($rules);
+        $haenv->log('notice', "ha groups migration: migration to rules config successful");
+
+        PVE::HA::Groups::migrate_groups_to_resources($groups, $resources);
+        for my $sid (keys %$resources) {
+            my $param = { failback => $resources->{$sid}->{failback} };
+
+            $haenv->update_service_config($sid, $param, 'group');
+        }
+        $haenv->log('notice', "ha groups migration: migration to resources config successful");
+
+        $haenv->delete_group_config();
+        $haenv->log('notice', "ha groups migration: group config deletion successful");
+    };
+    if (my $err = $@) {
+        $haenv->log('err', "abort ha groups migration: $err");
+        return 0;
+    }
+
+    return 1;
+};
+
+# TODO PVE 10: Remove group migration when HA groups have been fully migrated to rules
+sub try_persistent_group_migration {
+    my ($self) = @_;
+
+    my ($haenv, $ns, $ss) = ($self->{haenv}, $self->{ns}, $self->{ss});
+
+    return if $have_groups_been_migrated->($haenv);
+
+    $self->{group_migration_round}--;
+    return if $self->{group_migration_round} > 0;
+    $self->{group_migration_round} = $group_migration_cooldown;
+
+    $haenv->log('notice', "start ha group migration...");
+
+    if ($migrate_group_persistently->($haenv, $ns, $ss)) {
+        $haenv->log('notice', "ha groups migration successful");
+    } else {
+        $haenv->log('err', "ha groups migration failed");
+        $haenv->log(
+            'notice',
+            "retry ha groups migration in $group_migration_cooldown rounds (~ "
+                . $group_migration_cooldown * 10
+                . " seconds)",
+        );
+    }
+}
+
 sub manage {
     my ($self) = @_;
 
@@ -481,6 +635,8 @@ sub manage {
 
     $self->update_crs_scheduler_mode();
 
+    $self->try_persistent_group_migration();
+
     my ($sc, $services_digest) = $haenv->read_service_config();
 
     $self->{groups} = $haenv->read_group_config(); # update
diff --git a/src/PVE/HA/Sim/Env.pm b/src/PVE/HA/Sim/Env.pm
index 528ea3f8..fab270c1 100644
--- a/src/PVE/HA/Sim/Env.pm
+++ b/src/PVE/HA/Sim/Env.pm
@@ -215,6 +215,14 @@ sub update_service_config {
     return $self->{hardware}->update_service_config($sid, $param, $delete);
 }
 
+sub write_service_config {
+    my ($self, $conf) = @_;
+
+    $assert_cfs_can_rw->($self);
+
+    $self->{hardware}->write_service_config($conf);
+}
+
 sub parse_sid {
     my ($self, $sid) = @_;
 
@@ -259,6 +267,14 @@ sub read_rules_config {
     return $self->{hardware}->read_rules_config();
 }
 
+sub write_rules_config {
+    my ($self, $rules) = @_;
+
+    $assert_cfs_can_rw->($self);
+
+    $self->{hardware}->write_rules_config($rules);
+}
+
 sub read_group_config {
     my ($self) = @_;
 
@@ -267,6 +283,14 @@ sub read_group_config {
     return $self->{hardware}->read_group_config();
 }
 
+sub delete_group_config {
+    my ($self) = @_;
+
+    $assert_cfs_can_rw->($self);
+
+    $self->{hardware}->delete_group_config();
+}
+
 # this is normally only allowed by the master to recover a _fenced_ service
 sub steal_service {
     my ($self, $sid, $current_node, $new_node) = @_;
@@ -468,4 +492,10 @@ sub get_static_node_stats {
     return $self->{hardware}->get_static_node_stats();
 }
 
+sub get_node_version {
+    my ($self, $node) = @_;
+
+    return $self->{hardware}->get_node_version($node);
+}
+
 1;
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 3a1ebf25..4207ce31 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -343,6 +343,15 @@ sub read_rules_config {
     return $rules;
 }
 
+sub write_rules_config {
+    my ($self, $rules) = @_;
+
+    my $filename = "$self->{statusdir}/rules_config";
+
+    my $data = PVE::HA::Rules->write_config($filename, $rules);
+    PVE::Tools::file_set_contents($filename, $data);
+}
+
 sub read_group_config {
     my ($self) = @_;
 
@@ -353,6 +362,13 @@ sub read_group_config {
     return PVE::HA::Groups->parse_config($filename, $raw);
 }
 
+sub delete_group_config {
+    my ($self) = @_;
+
+    my $filename = "$self->{statusdir}/groups";
+    unlink $filename or die "failed to remove group config: $!\n";
+}
+
 sub read_service_status {
     my ($self, $node) = @_;
 
@@ -932,4 +948,12 @@ sub get_static_node_stats {
     return $stats;
 }
 
+sub get_node_version {
+    my ($self, $node) = @_;
+
+    my $cstatus = $self->read_hardware_status_nolock();
+
+    return $cstatus->{$node}->{version} // "9.0.0~2";
+}
+
 1;
diff --git a/src/test/test-group-migrate1/README b/src/test/test-group-migrate1/README
new file mode 100644
index 00000000..7fb2109b
--- /dev/null
+++ b/src/test/test-group-migrate1/README
@@ -0,0 +1,4 @@
+Test whether a partially upgraded cluster, i.e. at least one node has not
+reached the minimum version to understand HA rules, does not fully migrate the
+HA group config. That is, the HA groups config will not be deleted and the
+failback flag is not written to the service config.
diff --git a/src/test/test-group-migrate1/cmdlist b/src/test/test-group-migrate1/cmdlist
new file mode 100644
index 00000000..3bfad442
--- /dev/null
+++ b/src/test/test-group-migrate1/cmdlist
@@ -0,0 +1,3 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on"]
+]
diff --git a/src/test/test-group-migrate1/groups b/src/test/test-group-migrate1/groups
new file mode 100644
index 00000000..bad746ca
--- /dev/null
+++ b/src/test/test-group-migrate1/groups
@@ -0,0 +1,7 @@
+group: group1
+       nodes node1
+       restricted 1
+
+group: group2
+       nodes node2:2,node3
+       nofailback 1
diff --git a/src/test/test-group-migrate1/hardware_status b/src/test/test-group-migrate1/hardware_status
new file mode 100644
index 00000000..f8c6c787
--- /dev/null
+++ b/src/test/test-group-migrate1/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off", "version": "9.1.2" },
+  "node2": { "power": "off", "network": "off", "version": "9.0.0~11" },
+  "node3": { "power": "off", "network": "off", "version": "8.4.1" }
+}
diff --git a/src/test/test-group-migrate1/log.expect b/src/test/test-group-migrate1/log.expect
new file mode 100644
index 00000000..fd3dc030
--- /dev/null
+++ b/src/test/test-group-migrate1/log.expect
@@ -0,0 +1,101 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node1'
+info     20    node1/crm: adding new service 'vm:102' on node 'node2'
+info     20    node1/crm: adding new service 'vm:103' on node 'node3'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:102': state changed from 'request_start' to 'started'  (node = node2)
+info     20    node1/crm: service 'vm:103': state changed from 'request_start' to 'started'  (node = node3)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:101
+info     21    node1/lrm: service status vm:101 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     23    node2/lrm: starting service vm:102
+info     23    node2/lrm: service status vm:102 started
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: starting service vm:103
+info     25    node3/lrm: service status vm:103 started
+noti     60    node1/crm: start ha group migration...
+noti     60    node1/crm: ha groups migration: node 'node1' is in state 'online'
+noti     60    node1/crm: ha groups migration: lrm 'node1' is in state 'active' and mode 'active'
+noti     60    node1/crm: ha groups migration: node 'node1' has version '9.1.2'
+noti     60    node1/crm: ha groups migration: node 'node2' is in state 'online'
+noti     60    node1/crm: ha groups migration: lrm 'node2' is in state 'active' and mode 'active'
+noti     60    node1/crm: ha groups migration: node 'node2' has version '9.0.0~11'
+noti     60    node1/crm: ha groups migration: node 'node3' is in state 'online'
+noti     60    node1/crm: ha groups migration: lrm 'node3' is in state 'active' and mode 'active'
+noti     60    node1/crm: ha groups migration: node 'node3' has version '8.4.1'
+err      60    node1/crm: abort ha groups migration: node 'node3' needs at least pve-manager version '9.0.0'
+err      60    node1/crm: ha groups migration failed
+noti     60    node1/crm: retry ha groups migration in 6 rounds (~ 60 seconds)
+noti    180    node1/crm: start ha group migration...
+noti    180    node1/crm: ha groups migration: node 'node1' is in state 'online'
+noti    180    node1/crm: ha groups migration: lrm 'node1' is in state 'active' and mode 'active'
+noti    180    node1/crm: ha groups migration: node 'node1' has version '9.1.2'
+noti    180    node1/crm: ha groups migration: node 'node2' is in state 'online'
+noti    180    node1/crm: ha groups migration: lrm 'node2' is in state 'active' and mode 'active'
+noti    180    node1/crm: ha groups migration: node 'node2' has version '9.0.0~11'
+noti    180    node1/crm: ha groups migration: node 'node3' is in state 'online'
+noti    180    node1/crm: ha groups migration: lrm 'node3' is in state 'active' and mode 'active'
+noti    180    node1/crm: ha groups migration: node 'node3' has version '8.4.1'
+err     180    node1/crm: abort ha groups migration: node 'node3' needs at least pve-manager version '9.0.0'
+err     180    node1/crm: ha groups migration failed
+noti    180    node1/crm: retry ha groups migration in 6 rounds (~ 60 seconds)
+noti    300    node1/crm: start ha group migration...
+noti    300    node1/crm: ha groups migration: node 'node1' is in state 'online'
+noti    300    node1/crm: ha groups migration: lrm 'node1' is in state 'active' and mode 'active'
+noti    300    node1/crm: ha groups migration: node 'node1' has version '9.1.2'
+noti    300    node1/crm: ha groups migration: node 'node2' is in state 'online'
+noti    300    node1/crm: ha groups migration: lrm 'node2' is in state 'active' and mode 'active'
+noti    300    node1/crm: ha groups migration: node 'node2' has version '9.0.0~11'
+noti    300    node1/crm: ha groups migration: node 'node3' is in state 'online'
+noti    300    node1/crm: ha groups migration: lrm 'node3' is in state 'active' and mode 'active'
+noti    300    node1/crm: ha groups migration: node 'node3' has version '8.4.1'
+err     300    node1/crm: abort ha groups migration: node 'node3' needs at least pve-manager version '9.0.0'
+err     300    node1/crm: ha groups migration failed
+noti    300    node1/crm: retry ha groups migration in 6 rounds (~ 60 seconds)
+noti    420    node1/crm: start ha group migration...
+noti    420    node1/crm: ha groups migration: node 'node1' is in state 'online'
+noti    420    node1/crm: ha groups migration: lrm 'node1' is in state 'active' and mode 'active'
+noti    420    node1/crm: ha groups migration: node 'node1' has version '9.1.2'
+noti    420    node1/crm: ha groups migration: node 'node2' is in state 'online'
+noti    420    node1/crm: ha groups migration: lrm 'node2' is in state 'active' and mode 'active'
+noti    420    node1/crm: ha groups migration: node 'node2' has version '9.0.0~11'
+noti    420    node1/crm: ha groups migration: node 'node3' is in state 'online'
+noti    420    node1/crm: ha groups migration: lrm 'node3' is in state 'active' and mode 'active'
+noti    420    node1/crm: ha groups migration: node 'node3' has version '8.4.1'
+err     420    node1/crm: abort ha groups migration: node 'node3' needs at least pve-manager version '9.0.0'
+err     420    node1/crm: ha groups migration failed
+noti    420    node1/crm: retry ha groups migration in 6 rounds (~ 60 seconds)
+noti    540    node1/crm: start ha group migration...
+noti    540    node1/crm: ha groups migration: node 'node1' is in state 'online'
+noti    540    node1/crm: ha groups migration: lrm 'node1' is in state 'active' and mode 'active'
+noti    540    node1/crm: ha groups migration: node 'node1' has version '9.1.2'
+noti    540    node1/crm: ha groups migration: node 'node2' is in state 'online'
+noti    540    node1/crm: ha groups migration: lrm 'node2' is in state 'active' and mode 'active'
+noti    540    node1/crm: ha groups migration: node 'node2' has version '9.0.0~11'
+noti    540    node1/crm: ha groups migration: node 'node3' is in state 'online'
+noti    540    node1/crm: ha groups migration: lrm 'node3' is in state 'active' and mode 'active'
+noti    540    node1/crm: ha groups migration: node 'node3' has version '8.4.1'
+err     540    node1/crm: abort ha groups migration: node 'node3' needs at least pve-manager version '9.0.0'
+err     540    node1/crm: ha groups migration failed
+noti    540    node1/crm: retry ha groups migration in 6 rounds (~ 60 seconds)
+info    620     hardware: exit simulation - done
diff --git a/src/test/test-group-migrate1/manager_status b/src/test/test-group-migrate1/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-group-migrate1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-group-migrate1/service_config b/src/test/test-group-migrate1/service_config
new file mode 100644
index 00000000..a27551e5
--- /dev/null
+++ b/src/test/test-group-migrate1/service_config
@@ -0,0 +1,5 @@
+{
+    "vm:101": { "node": "node1", "state": "started", "group": "group1" },
+    "vm:102": { "node": "node2", "state": "started", "group": "group2" },
+    "vm:103": { "node": "node3", "state": "started", "group": "group2" }
+}
diff --git a/src/test/test-group-migrate2/README b/src/test/test-group-migrate2/README
new file mode 100644
index 00000000..0430bf25
--- /dev/null
+++ b/src/test/test-group-migrate2/README
@@ -0,0 +1,3 @@
+Test whether a fully upgraded cluster, i.e. each node has reached the minimum
+version to understand HA rules, correctly migrates the HA group config to the
+HA rules config and deletes the HA groups config.
diff --git a/src/test/test-group-migrate2/cmdlist b/src/test/test-group-migrate2/cmdlist
new file mode 100644
index 00000000..3bfad442
--- /dev/null
+++ b/src/test/test-group-migrate2/cmdlist
@@ -0,0 +1,3 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on"]
+]
diff --git a/src/test/test-group-migrate2/groups b/src/test/test-group-migrate2/groups
new file mode 100644
index 00000000..bad746ca
--- /dev/null
+++ b/src/test/test-group-migrate2/groups
@@ -0,0 +1,7 @@
+group: group1
+       nodes node1
+       restricted 1
+
+group: group2
+       nodes node2:2,node3
+       nofailback 1
diff --git a/src/test/test-group-migrate2/hardware_status b/src/test/test-group-migrate2/hardware_status
new file mode 100644
index 00000000..ec45176b
--- /dev/null
+++ b/src/test/test-group-migrate2/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off", "version": "9.0.0~11" },
+  "node2": { "power": "off", "network": "off", "version": "9.0.1" },
+  "node3": { "power": "off", "network": "off", "version": "9.4.1" }
+}
diff --git a/src/test/test-group-migrate2/log.expect b/src/test/test-group-migrate2/log.expect
new file mode 100644
index 00000000..723ac1f7
--- /dev/null
+++ b/src/test/test-group-migrate2/log.expect
@@ -0,0 +1,50 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node1'
+info     20    node1/crm: adding new service 'vm:102' on node 'node2'
+info     20    node1/crm: adding new service 'vm:103' on node 'node3'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:102': state changed from 'request_start' to 'started'  (node = node2)
+info     20    node1/crm: service 'vm:103': state changed from 'request_start' to 'started'  (node = node3)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:101
+info     21    node1/lrm: service status vm:101 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     23    node2/lrm: starting service vm:102
+info     23    node2/lrm: service status vm:102 started
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: starting service vm:103
+info     25    node3/lrm: service status vm:103 started
+noti     60    node1/crm: start ha group migration...
+noti     60    node1/crm: ha groups migration: node 'node1' is in state 'online'
+noti     60    node1/crm: ha groups migration: lrm 'node1' is in state 'active' and mode 'active'
+noti     60    node1/crm: ha groups migration: node 'node1' has version '9.0.0~11'
+noti     60    node1/crm: ha groups migration: node 'node2' is in state 'online'
+noti     60    node1/crm: ha groups migration: lrm 'node2' is in state 'active' and mode 'active'
+noti     60    node1/crm: ha groups migration: node 'node2' has version '9.0.1'
+noti     60    node1/crm: ha groups migration: node 'node3' is in state 'online'
+noti     60    node1/crm: ha groups migration: lrm 'node3' is in state 'active' and mode 'active'
+noti     60    node1/crm: ha groups migration: node 'node3' has version '9.4.1'
+noti     60    node1/crm: ha groups migration: migration to rules config successful
+noti     60    node1/crm: ha groups migration: migration to resources config successful
+noti     60    node1/crm: ha groups migration: group config deletion successful
+noti     60    node1/crm: ha groups migration successful
+info    620     hardware: exit simulation - done
diff --git a/src/test/test-group-migrate2/manager_status b/src/test/test-group-migrate2/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-group-migrate2/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-group-migrate2/service_config b/src/test/test-group-migrate2/service_config
new file mode 100644
index 00000000..a27551e5
--- /dev/null
+++ b/src/test/test-group-migrate2/service_config
@@ -0,0 +1,5 @@
+{
+    "vm:101": { "node": "node1", "state": "started", "group": "group1" },
+    "vm:102": { "node": "node2", "state": "started", "group": "group2" },
+    "vm:103": { "node": "node3", "state": "started", "group": "group2" }
+}
-- 
2.47.2



