mlsorensen opened a new pull request #5552: URL: https://github.com/apache/cloudstack/pull/5552
### Description

This PR adds a way for KVM agent CommandWrappers to register code to run in case of an Agent disconnect event.

When KVM agents lose communication with the management cluster, outstanding work sent to the agent is considered failed; however, the work may continue and run to completion. This is not a big issue for things like live migration, where a VM will eventually move to another host and CloudStack will then sync up with the new state, but it becomes more important for long-running tasks like volume copy, where we may want to attempt to stop a copy or do some cleanup.

This change doesn't introduce much in the way of behavioral changes; it just adds the framework for CommandWrappers to utilize. It does add a simple example that provides cancellation of VM live migration upon agent disconnect.

The functionality is simple: we provide LibvirtComputingResource with a list to store DisconnectHooks, and it implements the `disconnected()` method to call these hooks in the event of a loss of connectivity. These DisconnectHooks are basically just Java Thread types with a little extra metadata, which are executed in parallel with a best-effort timeout.

The CommandWrappers are responsible for providing the implementation for their own DisconnectHooks (one or many), registering them prior to doing their work, and un-registering them when the work is complete.

The implementer of a DisconnectHook is responsible for dealing with any race conditions. In the live migration example, it is understood that the hook may be called immediately after live migration has succeeded but before the CommandWrapper can de-register the hook. Therefore, cancellation of live migration is treated as best-effort, and any errors are logged and ignored.

### Alternatives Considered

There are probably better designs for this. Perhaps the agent could queue results, and the management server could hold onto commands and wait for reconnect to get the results.
However, that would be a larger effort, and we would still need to handle agent service stop. This DisconnectHook implementation also works when the KVM agent service is stopped, and gives CommandWrappers a way to respond to that.

The DisconnectHooks themselves could be Runnables instead of Threads. I chose Threads simply because it's more straightforward to run them in parallel, but I'm open to suggestions and to any changes requested.

### Types of changes

- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [X] New feature (non-breaking change which adds functionality)
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] Enhancement (improves an existing feature and functionality)
- [ ] Cleanup (Code refactoring and cleanup, that may add test cases)

### Feature/Enhancement Scale or Bug Severity

#### Feature/Enhancement Scale

- [ ] Major
- [X] Minor

### How Has This Been Tested?

This is a feature I've been using in production for years, and I would like to fold it back into upstream.
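
For readers who want a concrete picture of the hook mechanism outlined in the Description, here is a minimal, self-contained sketch. It is not the code in this PR: `HookRegistry`, `addDisconnectHook`, `removeDisconnectHook`, and `getTimeoutMs` are hypothetical names chosen for illustration, and the "cancel migration" hook only flips a flag rather than talking to libvirt. It assumes a hook is a `Thread` subclass carrying a timeout, and that the resource starts all hooks in parallel and then waits for each with a best-effort deadline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class DisconnectHookSketch {

    /** A hook is a Thread plus a little metadata: a name and a timeout. */
    abstract static class DisconnectHook extends Thread {
        private final long timeoutMs;

        DisconnectHook(String name, long timeoutMs) {
            super(name);
            this.timeoutMs = timeoutMs;
        }

        long getTimeoutMs() {
            return timeoutMs;
        }
    }

    /** Hypothetical stand-in for the hook list held by LibvirtComputingResource. */
    static class HookRegistry {
        private final List<DisconnectHook> hooks = new ArrayList<>();

        synchronized void addDisconnectHook(DisconnectHook hook) {
            hooks.add(hook);
        }

        synchronized void removeDisconnectHook(DisconnectHook hook) {
            hooks.remove(hook);
        }

        synchronized int size() {
            return hooks.size();
        }

        /** Called on loss of connectivity: start every hook, then wait best-effort. */
        void disconnected() {
            List<DisconnectHook> toRun;
            synchronized (this) {
                // Take the hooks out of the list so each Thread is started at most once.
                toRun = new ArrayList<>(hooks);
                hooks.clear();
            }
            long start = System.currentTimeMillis();
            for (DisconnectHook hook : toRun) {
                hook.start(); // all hooks run in parallel
            }
            for (DisconnectHook hook : toRun) {
                long remaining = start + hook.getTimeoutMs() - System.currentTimeMillis();
                try {
                    if (remaining > 0) {
                        hook.join(remaining); // join(0) would wait forever, hence the guard
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (hook.isAlive()) {
                    System.err.println("Hook " + hook.getName() + " timed out; moving on");
                }
            }
        }
    }

    public static void main(String[] args) {
        HookRegistry registry = new HookRegistry();
        AtomicBoolean cancelRequested = new AtomicBoolean(false);

        // A CommandWrapper would register its hook before starting long-running work...
        DisconnectHook hook = new DisconnectHook("cancel-migration", 1000) {
            @Override
            public void run() {
                // Best effort: a real hook would try to abort the job, then log and ignore errors.
                cancelRequested.set(true);
            }
        };
        registry.addDisconnectHook(hook);
        try {
            registry.disconnected(); // simulate losing the management server mid-work
        } finally {
            registry.removeDisconnectHook(hook); // ...and un-register once the work is done
        }
        System.out.println("cancel requested: " + cancelRequested.get());
    }
}
```

The `try`/`finally` in `main` mirrors the register-before, un-register-after pattern the Description asks of CommandWrappers. Removing a hook that has already fired is harmless here, which is one simple way to tolerate the race where a disconnect arrives just as the work completes.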
