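When a service that still has a job in the LRM worker queue is removed from HA management, this can result in an error. This is normally not problematic if the worker was already started (i.e., it has a PID). Otherwise, if "$max_workers" workers are already active and we remove a service with a queued CRM command, we may trigger a loop of errors. So, do not start a worker at all if it was not started yet and its service has no resource configuration anymore; if it is already running, log that it is orphaned.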

Signed-off-by: Thomas Lamprecht <[email protected]>
---
 src/PVE/HA/LRM.pm | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/PVE/HA/LRM.pm b/src/PVE/HA/LRM.pm
index 1894f3c..217a8ad 100644
--- a/src/PVE/HA/LRM.pm
+++ b/src/PVE/HA/LRM.pm
@@ -378,7 +378,16 @@ sub run_workers {
 	    my $w = $self->{workers}->{$sid};
 	    my $cd = $sc->{$sid};
 	    if (!$cd) {
-		$haenv->log('err', "missing resource configuration for '$sid'");
+		# if not already started don't start the worker at all,
+		# as the service was removed from HA management, else warn
+		if (!$w->{pid}) {
+		    delete $self->{workers}->{$sid};
+		    $haenv->log('err', "missing resource configuration for " .
+				"'$sid' - do not start worker [$w->{state}]");
+		} else {
+		    $haenv->log('err', "orphaned active worker [$w->{state}] for" .
+				" service '$sid' with no resource configuration");
+		}
 		next;
 	    }
 	    if (!$w->{pid}) {
-- 
2.1.4
_______________________________________________
pve-devel mailing list
[email protected]
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
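The new `if (!$cd)` branch can be illustrated with a small standalone Perl sketch. It is a simplified model, not the actual LRM code: `%workers`, `%sc`, `prune_workers` and `log_err` are hypothetical stand-ins for `$self->{workers}`, the service config `$sc` and `$haenv->log`, and the `$max_workers` limit and the actual worker start-up are left out. A queued worker without a PID whose service has lost its configuration is dropped, so it cannot be retried (and logged) over and over; a worker that already has a PID is kept and only reported as orphaned.

```perl
use strict;
use warnings;

# Hypothetical LRM state: workers keyed by service ID, plus the current
# HA resource configuration (stand-in for $sc in run_workers).
my %workers = (
    'vm:100' => { state => 'started' },               # queued, no PID yet
    'vm:101' => { state => 'started', pid => 4242 },  # already running
    'vm:102' => { state => 'stopped' },               # queued, config present
);
my %sc = ( 'vm:102' => { state => 'stopped' } );

my @log;
# Simplified logger standing in for $haenv->log('err', ...)
sub log_err { push @log, "err: $_[0]" }

sub prune_workers {
    my ($workers, $sc) = @_;
    # keys are collected before the loop, so deleting entries is safe
    foreach my $sid (sort keys %$workers) {
        my $w = $workers->{$sid};
        next if $sc->{$sid};    # service still configured, keep the worker
        if (!$w->{pid}) {
            # never started: drop it instead of retrying it on every run
            delete $workers->{$sid};
            log_err("missing resource configuration for '$sid'"
                . " - do not start worker [$w->{state}]");
        } else {
            # already running: we can only warn about it
            log_err("orphaned active worker [$w->{state}] for service"
                . " '$sid' with no resource configuration");
        }
    }
}

prune_workers(\%workers, \%sc);
print "$_\n" for @log;
print join(',', sort keys %workers), "\n";    # vm:101,vm:102
```

Only `vm:100` is removed: it never got a PID, so there is nothing to clean up, while `vm:101` is a live process that the LRM still has to track until it finishes.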
