Github user rafaelweingartner commented on the pull request:

    https://github.com/apache/cloudstack/pull/943#issuecomment-160966699
  
    @DaanHoogland, 
    I checked the logs you sent me.
    
    The VMs were marked as destroyed, but it seems that they have not actually 
been removed/expunged yet. I looked at the code, and the only way they are 
removed from the response of the list VMs methods is after the expunge thread 
runs and fills in the “removed” field in the database.
     
    I also looked at the code of the integration tests. My Perl is a little 
rusty, but I noticed that the test only waits a few cycles (2) of the expunge 
delay; therefore, there is no way to guarantee that the expunge thread has 
already run after the VM passed the expunge delay, and so no guarantee that 
the VM has been removed.
    
    If I recall properly, there are mainly three (3) variables in play: the 
time that the VM was destroyed, the expunge delay itself, and the expunge 
interval (the interval between executions of the expunge thread).
    
    So, if the expunge thread runs, but the VM has been destroyed too recently 
and has not passed the expunge delay, it will not be removed. That is what 
seems to have happened there. I know some people may come and say, “the test 
worked a lot of times”. And yes, it can work, but it depends on whether you 
are lucky or not. I personally do not like tests that may present this kind of 
behavior. Moreover, the moments at which the expunge thread runs depend on the 
time that the MS was started.
    
    I will illustrate it with an example that we have seen happening here.
    Given that our expunge interval is 24 hours, and our expunge delay is also 
24 hours. Suppose the MS was started and got up and running on some day at 
23:59, and that the first time the expunge thread runs is 00:00. If we are 
unlucky and we destroy the VM at 00:01, then the next day, when the thread 
runs at 00:00 (second run of the expunge thread), the VM will not be removed 
and will continue appearing, since the expunge delay that controls the VM's 
removal is 24 hours and the VM has been destroyed for only 23 hours and 59 
minutes (almost there, but not yet). Therefore, the VM will only be removed in 
the third execution of the expunge thread.
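
    To make the timing concrete, here is a minimal sketch that replays the example above. It assumes the expunge thread runs at a fixed interval and removes a VM only once it has been destroyed for at least the expunge delay; the function name and the calendar date are hypothetical, and this is not the actual MS code.

```python
from datetime import datetime, timedelta


def first_removing_run(first_run, interval, delay, destroyed_at, max_runs=10):
    """Return (run_number, run_time) of the first expunge run that removes
    the VM, or None if no run within max_runs does."""
    for run_number in range(1, max_runs + 1):
        run_time = first_run + (run_number - 1) * interval
        # Assumption: a run removes only VMs destroyed at least `delay` ago.
        if run_time - destroyed_at >= delay:
            return run_number, run_time
    return None


# Hypothetical date; only the times of day come from the example.
first_run = datetime(2015, 12, 1, 0, 0)      # first expunge run at 00:00
destroyed_at = datetime(2015, 12, 1, 0, 1)   # VM destroyed at 00:01

run_number, run_time = first_removing_run(
    first_run,
    interval=timedelta(hours=24),
    delay=timedelta(hours=24),
    destroyed_at=destroyed_at,
)
elapsed = run_time - destroyed_at
print(run_number, elapsed)
```

    Under these assumptions the second run sees the VM destroyed for only 23 hours and 59 minutes, so the first run that removes it is the third one, 47 hours and 59 minutes after the destroy.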
    
    Having said that, I have the following questions: what do we want with that 
test? Do we want to test the expunge thread, or just test that the destroyed 
VM is not listed? If we want the second, why don’t we force the expunging 
(using the expungeVirtualMachine command) instead of waiting for the expunge 
thread?
    
    If the idea is to leave the test as it is, then to avoid the problem I have 
just described we could just change a "bit" of the file test_vm_life_cycle: 
the multiplier in line 632, from “4” to “6”. That change would guarantee that 
the test waits until the third execution of the expunge thread, and avoid 
cases such as the one described.
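
    As a rough way to reason about how long a test has to wait, here is a small sketch of the worst-case bound. It assumes the expunge thread runs at a fixed interval and removes a VM once it has been destroyed for at least the expunge delay. The function names are hypothetical, and the code does not reproduce what test_vm_life_cycle does with its multiplier.

```python
import math


def worst_case_wait(expunge_delay, expunge_interval):
    """Upper bound (in seconds) on the time between destroying a VM and the
    expunge run that removes it.

    The VM becomes eligible `expunge_delay` seconds after the destroy; in the
    worst case an expunge run has just finished at that moment, so the next
    one is up to one full `expunge_interval` away.
    """
    return expunge_delay + expunge_interval


def intervals_to_wait(expunge_delay, expunge_interval):
    """Number of whole expunge intervals that covers the worst case."""
    return math.ceil(expunge_delay / expunge_interval) + 1


# Values from the example: delay and interval are both 24 hours.
day = 24 * 60 * 60
print(worst_case_wait(day, day), intervals_to_wait(day, day))
```

    With both settings at 24 hours the bound is 48 hours, i.e. two full intervals after the destroy, which matches the third execution of the expunge thread in the example.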

