Arik Hadas has posted comments on this change.

Change subject: core: add job that runs HA VMs which failed to run
......................................................................


Patch Set 2:

(1 comment)

Allon, that was my intention in the beginning - I used the isWait property of 
locks to wait for the VM lock to be released. The problem with this approach is 
that a mass invocation of run commands might block all the threads in the 
thread pool.
Think of the following scenario: we're in a large-scale system where one host 
is running a few hundred HA VMs, and we're doing a live migration of a disk 
that is used by all of those VMs. In the middle of the live storage migration 
the host crashes, so we try to automatically start all those VMs; each thread 
that runs a VM would then be blocked until the live storage migration ends - 
and we might end up with no threads left in the pool.
So I accepted the suggestion to use a periodic job rather than blocking 
threads. Note that I have a few optimizations in mind to do next (for example, 
triggering the job when the live snapshot ends).
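To make the trade-off concrete, here is a minimal sketch (class and method names are hypothetical, not the actual AutoStartVmsRunner code): instead of parking a pool thread on the VM lock, each iteration of the periodic job attempts a non-blocking tryLock and simply re-queues the VM if the lock is still held, so no thread ever waits.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: a periodic job retrying auto-start of HA VMs without
// ever blocking a pool thread on a VM lock.
public class AutoStartSketch {
    private final Queue<String> vmsToRun = new ConcurrentLinkedQueue<>();
    private final ConcurrentMap<String, ReentrantLock> vmLocks = new ConcurrentHashMap<>();

    public void addVmToRun(String vmId) {
        vmsToRun.add(vmId);
    }

    // One iteration: drain the queue, attempt each VM, and re-queue those
    // whose lock is still held (e.g. by a live storage migration).
    void runIteration() {
        int snapshotSize = vmsToRun.size();
        for (int i = 0; i < snapshotSize; i++) {
            String vmId = vmsToRun.poll();
            if (vmId == null) {
                break;
            }
            ReentrantLock lock = vmLocks.computeIfAbsent(vmId, k -> new ReentrantLock());
            if (lock.tryLock()) {          // non-blocking: never parks the thread
                try {
                    runVm(vmId);
                } finally {
                    lock.unlock();
                }
            } else {
                vmsToRun.add(vmId);        // lock still held - retry next iteration
            }
        }
    }

    // Placeholder for the real run-VM command invocation.
    protected void runVm(String vmId) {
    }

    public ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(this::runIteration, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return timer;
    }
}
```

With this shape, a host crash that queues hundreds of HA VMs costs one queue entry per VM rather than one blocked thread per VM.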

....................................................
File backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/AutoStartVmsRunner.java
Line 50:         if (!result.getSucceeded()) {
Line 51:             final AuditLogableBase event = new AuditLogableBase();
Line 52:             event.setVmId(vmId);
Line 53:             AuditLogDirector.log(event, AuditLogType.HA_VM_RESTART_FAILED);
Line 54:             // should insert to autoStartVmsToRun again?
Note that the following scenario is handled well without inserting the VM into 
the queue at this point: we try to restart an HA VM and it fails because the VM 
is locked, so the VM is added to the queue and we try to run it in the next 
iteration. If the lock is still not released, we cannot acquire it; 
RunVmCommand adds the VM to the queue again via the addVmToRun method, we try 
to run it on the next iteration, and so on until the lock is released.

I was considering whether to add the VM to the queue at this point, since the 
run command might fail for a different reason that is not related to locks. I 
guess that in general we want to add the VM to the queue every time the run 
command fails for an HA VM, but if there are cases in which the run command is 
certain to keep failing (say the VM was edited and now has settings it cannot 
start with), we probably don't want to add the VM to the queue again in those 
cases. So it's a kind of TODO for me to check, but it does not affect the 
solution for the reported bug.
Line 55:         }
Line 56:     }
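The re-queue decision discussed above could be sketched as a small policy that distinguishes transient failures (worth another iteration) from permanent ones (retrying cannot succeed without user action). The enum values and class below are hypothetical illustrations, not actual ovirt-engine failure codes:

```java
// Hypothetical failure reasons for a run-VM command.
enum RunFailure { VM_LOCKED, HOST_UNAVAILABLE, INVALID_CONFIG }

final class RetryPolicy {
    // Re-add the VM to the auto-start queue only when the failure is likely
    // to clear on its own (lock released, host back up). A VM whose edited
    // settings make it unbootable would otherwise be retried forever.
    static boolean shouldRequeue(RunFailure reason) {
        switch (reason) {
            case VM_LOCKED:
            case HOST_UNAVAILABLE:
                return true;   // transient: retry on the next iteration
            case INVALID_CONFIG:
            default:
                return false;  // permanent: needs user intervention, drop it
        }
    }
}
```

Under this split, the lock-contention case from the comment above keeps cycling through the queue as intended, while a misconfigured VM is logged once and dropped.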


-- 
To view, visit http://gerrit.ovirt.org/18815
To unsubscribe, visit http://gerrit.ovirt.org/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3d563d05efc6dae40f0de8e7f4b7e47aa84bd787
Gerrit-PatchSet: 2
Gerrit-Project: ovirt-engine
Gerrit-Branch: master
Gerrit-Owner: Arik Hadas <[email protected]>
Gerrit-Reviewer: Allon Mureinik <[email protected]>
Gerrit-Reviewer: Arik Hadas <[email protected]>
Gerrit-Reviewer: Barak Azulay <[email protected]>
Gerrit-Reviewer: Michal Skrivanek <[email protected]>
Gerrit-Reviewer: Omer Frenkel <[email protected]>
Gerrit-Reviewer: Yair Zaslavsky <[email protected]>
Gerrit-Reviewer: oVirt Jenkins CI Server
Gerrit-HasComments: Yes
_______________________________________________
Engine-patches mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/engine-patches