mlsorensen opened a new pull request #5552:
URL: https://github.com/apache/cloudstack/pull/5552


   ### Description
   
   This PR adds a way for KVM agent CommandWrappers to register code to run in 
case of an Agent disconnect event.  When KVM agents lose communication with the 
management cluster, outstanding work sent to the agent is considered failed, 
even though the work may continue and run to completion. This is not a big issue 
for things like live migration, where a VM will eventually move to another host 
and CloudStack will then sync up with the new state, but it becomes more 
important for long-running tasks like volume copy, where we may want to attempt 
to stop a copy or do some cleanup.
   
   This change doesn't introduce much in the way of behavioral changes; it just 
adds the framework for CommandWrappers to utilize.  It does add a simple 
"example" that provides cancellation of VM live migration upon agent disconnect.
   
   The functionality is simple: we provide LibvirtComputingResource with a list 
to store DisconnectHooks, and it implements the `disconnected()` method to 
call these hooks in the event of a loss of connectivity. These DisconnectHooks 
are basically just Java Thread types with a little extra metadata, and they are 
executed in parallel with a best-effort timeout.
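
   As a rough illustration of the mechanism described above (not the code in 
this PR), the sketch below shows a hook that is a Thread carrying a name and a 
timeout, plus a registry whose `disconnected()` method starts every hook and 
then waits for each one up to its timeout. The class `HookRegistry`, the method 
names `addDisconnectHook`/`removeDisconnectHook`, and the `timeoutMs` field are 
assumptions made for the example; only `DisconnectHook` and `disconnected()` 
are names taken from the description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical hook type: a Thread plus a name and a per-hook timeout.
// Subclasses override run() with their cleanup logic.
class DisconnectHook extends Thread {
    private final long timeoutMs;

    DisconnectHook(String name, long timeoutMs) {
        super(name);
        this.timeoutMs = timeoutMs;
    }

    long getTimeoutMs() {
        return timeoutMs;
    }
}

// Hypothetical stand-in for the part of LibvirtComputingResource that owns the hooks.
class HookRegistry {
    private final List<DisconnectHook> hooks = new CopyOnWriteArrayList<>();

    void addDisconnectHook(DisconnectHook hook) {
        hooks.add(hook);
    }

    void removeDisconnectHook(DisconnectHook hook) {
        hooks.remove(hook);
    }

    // Start every registered hook so they run in parallel, then wait for each
    // one up to its own timeout. Hooks that overrun are simply left running.
    void disconnected() {
        List<DisconnectHook> snapshot = new ArrayList<>(hooks);
        for (DisconnectHook hook : snapshot) {
            // A Thread can only be started once, so skip hooks that already ran.
            if (hook.getState() == Thread.State.NEW) {
                hook.start();
            }
        }
        for (DisconnectHook hook : snapshot) {
            try {
                hook.join(hook.getTimeoutMs());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```

   Because the joins happen one after another, the worst-case wait in this 
sketch is the sum of the individual timeouts, which is one reason to describe 
the timeout as best-effort.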
   
   The CommandWrappers are responsible for providing the implementation for 
their own DisconnectHooks (one or many), registering them prior to doing their 
work, and unregistering them when work is complete. The implementer of a 
DisconnectHook is responsible for dealing with any race conditions - in the 
live migration example it is understood that it's possible for the hook to be 
called immediately after live migration has succeeded but before the 
CommandWrapper can unregister the hook. Therefore, cancellation of live 
migration is treated as best-effort and any errors are logged and ignored.
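
   The register/work/unregister pattern and the race it leaves open can be 
sketched as follows. This is a minimal illustration under stated assumptions, 
not the wrapper from this PR: `FakeMigrationJob`, `CancelMigrationHook`, 
`MigrateWrapperSketch`, and the static `HOOKS` set are all hypothetical names, 
and the "job" is a stand-in rather than a libvirt call.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a long-running job such as a live migration.
class FakeMigrationJob {
    private final AtomicBoolean finished = new AtomicBoolean(false);
    private final AtomicBoolean aborted = new AtomicBoolean(false);

    void runToCompletion() {
        finished.set(true);
    }

    // Aborting a job that has already finished fails, which models the race
    // where the hook fires just after the work succeeded.
    void abort() {
        if (finished.get()) {
            throw new IllegalStateException("job already finished");
        }
        aborted.set(true);
    }

    boolean isAborted() {
        return aborted.get();
    }
}

// Hypothetical hook: cancellation is best-effort, so errors are logged and ignored.
class CancelMigrationHook extends Thread {
    private final FakeMigrationJob job;

    CancelMigrationHook(FakeMigrationJob job) {
        super("cancel-migration");
        this.job = job;
    }

    @Override
    public void run() {
        try {
            job.abort();
        } catch (RuntimeException e) {
            System.err.println("Ignoring failed cancel: " + e.getMessage());
        }
    }
}

class MigrateWrapperSketch {
    // Hypothetical registry; in the PR the list lives on LibvirtComputingResource.
    static final Set<Thread> HOOKS = ConcurrentHashMap.newKeySet();

    static void execute(FakeMigrationJob job) {
        Thread hook = new CancelMigrationHook(job);
        HOOKS.add(hook);           // register before starting the work
        try {
            job.runToCompletion(); // the long-running work
        } finally {
            HOOKS.remove(hook);    // always unregister, even on failure
        }
    }
}
```

   The `try`/`finally` keeps a failed job from leaving a stale hook behind, 
while the catch inside the hook covers the window between the job finishing 
and the hook being removed.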
   
   ### Alternatives Considered
   There are probably better designs for this - perhaps the agent could queue 
results and the management server could hold onto commands and wait for 
reconnect to get the results. However, that would be a larger effort, and we 
would still need to handle agent service stop. This DisconnectHook 
implementation also works for stopping of the KVM agent service and gives 
CommandWrappers a way to respond to that.
   
   The DisconnectHooks themselves could be Runnables instead of Threads. I 
chose Threads simply because it's more straightforward to run them in parallel; 
however, I'm open to suggestions and any requested changes.
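
   For comparison, the Runnable alternative might look roughly like the sketch 
below, which uses the standard `ExecutorService.invokeAll` overload with a 
timeout to run hooks in parallel. `RunnableHookRunner` and `runAll` are 
hypothetical names for this illustration and are not part of the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class RunnableHookRunner {
    // Runs all hooks in parallel with one shared timeout. Returns how many
    // hooks finished (normally or with an exception) before the timeout;
    // hooks that did not finish in time are cancelled by invokeAll.
    static int runAll(List<Runnable> hooks, long timeoutMs) {
        if (hooks.isEmpty()) {
            return 0; // newFixedThreadPool rejects a pool size of zero
        }
        ExecutorService pool = Executors.newFixedThreadPool(hooks.size());
        try {
            List<Callable<Object>> tasks = new ArrayList<>();
            for (Runnable hook : hooks) {
                tasks.add(Executors.callable(hook));
            }
            int completed = 0;
            for (Future<Object> f : pool.invokeAll(tasks, timeoutMs, TimeUnit.MILLISECONDS)) {
                if (!f.isCancelled()) {
                    completed++;
                }
            }
            return completed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return 0;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

   This variant gives a single overall deadline rather than per-hook timeouts, 
at the cost of managing a thread pool during disconnect handling.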
   
   ### Types of changes
   
   - [ ] Breaking change (fix or feature that would cause existing 
functionality to change)
   - [X] New feature (non-breaking change which adds functionality)
   - [ ] Bug fix (non-breaking change which fixes an issue)
   - [ ] Enhancement (improves an existing feature and functionality)
   - [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
   
   ### Feature/Enhancement Scale or Bug Severity
   
   #### Feature/Enhancement Scale
   
   - [ ] Major
   - [X] Minor
   
   
   ### How Has This Been Tested?
   This is a feature I've been using in production for years, and I would like 
to fold it back into upstream.
   



