When we have a running job for a service which gets removed from
HA, this can result in an error. This is normally not problematic if
the worker was already started (i.e. has a PID). Otherwise we may
trigger a loop of errors when "$max_workers" workers are already
active and we remove a service with a queued CRM command.

Signed-off-by: Thomas Lamprecht <[email protected]>
---
 src/PVE/HA/LRM.pm | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/PVE/HA/LRM.pm b/src/PVE/HA/LRM.pm
index 1894f3c..217a8ad 100644
--- a/src/PVE/HA/LRM.pm
+++ b/src/PVE/HA/LRM.pm
@@ -378,7 +378,16 @@ sub run_workers {
            my $w = $self->{workers}->{$sid};
            my $cd = $sc->{$sid};
            if (!$cd) {
-               $haenv->log('err', "missing resource configuration for '$sid'");
+               # if not already started don't start the worker at all,
+               # as the service was removed from HA management, else warn
+               if (!$w->{pid}) {
+                   delete $self->{workers}->{$sid};
+                   $haenv->log('err', "missing resource configuration for " .
+                               "'$sid' - do not start worker [$w->{state}]");
+               } else {
+                   $haenv->log('err', "orphaned active worker [$w->{state}] for" .
+                               " service '$sid' with no resource configuration");
+               }
                next;
            }
            if (!$w->{pid}) {
-- 
2.1.4


_______________________________________________
pve-devel mailing list
[email protected]
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
