Min Chen created CLOUDSTACK-7778:
------------------------------------

             Summary: Start VM checkWorkItem loop should also check VM DB state 
before going into idle waiting to exit faster.
                 Key: CLOUDSTACK-7778
                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7778
             Project: CloudStack
          Issue Type: Bug
      Security Level: Public (Anyone can view this level - this is the default.)
          Components: Management Server
    Affects Versions: 4.0.0
            Reporter: Min Chen
             Fix For: 4.5.0


VM deployment may involve starting a VR (virtual router). Meanwhile, our HA 
process may also try to start the same VR, for example because of a host 
disconnect. Before the 4.3 release, we did not serialize these two VR start 
operations; instead, we relied on a VM state transition failure to detect 
another concurrent operation. When a concurrent operation is detected, we do 
not fail the VM deployment job immediately. Instead, retry logic keeps checking 
the op_it_work table to see whether other outstanding work items are operating 
on the same VR. If there is a dangling op_it_work item, this retry takes more 
than one hour and then fails, even though the VR may already have been started 
a while back by the HA process. Although the recent VMsync framework change has 
made such concurrent VM operations less common, it is still better to check the 
current VM state inside the while loop that checks op_it_work items, so the 
loop can exit early instead of relying purely on the op_it_work table being 
updated properly.
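
The proposed early exit could look roughly like the sketch below. It is an 
illustration only, not the actual CloudStack code: the class, the enums, the 
method name waitForStart, and the two supplier parameters (standing in for a 
VM state lookup in the DB and an op_it_work query) are all hypothetical names 
chosen for this example.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names come from the CloudStack code base.
class CheckWorkItemSketch {
    // Simplified stand-in for the VM states stored in the DB.
    enum VmState { STARTING, RUNNING, STOPPED }

    // Possible results of the wait loop.
    enum Outcome { VM_ALREADY_RUNNING, WORK_ITEMS_DONE, TIMED_OUT, INTERRUPTED }

    static Outcome waitForStart(Supplier<VmState> vmStateFromDb,
                                BooleanSupplier hasOutstandingWorkItem,
                                int maxRetries,
                                long sleepMillis) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            // Early exit: consult the VM's DB state before idling, so a VR
            // already started by another process (e.g. HA) ends the loop
            // even if a dangling work item is never cleaned up.
            if (vmStateFromDb.get() == VmState.RUNNING) {
                return Outcome.VM_ALREADY_RUNNING;
            }
            // Existing behaviour: stop once no other work item is outstanding.
            if (!hasOutstandingWorkItem.getAsBoolean()) {
                return Outcome.WORK_ITEMS_DONE;
            }
            try {
                Thread.sleep(sleepMillis); // idle wait before the next check
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return Outcome.INTERRUPTED;
            }
        }
        return Outcome.TIMED_OUT;
    }
}
```

With a dangling work item (hasOutstandingWorkItem always true), the loop in 
this sketch still returns as soon as the DB reports the VM as running, rather 
than waiting out every retry.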



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
