On 4/2/15, 1:09 PM, Jeffrey Nguyen (jeffrngu) wrote:
Hi Anuruddha,

The instances that are in Error state on OpenStack Horizon were never removed, even after Stratos successfully spawned an instance. It sounds like you might need to enhance the jclouds API to return an object with nodeID info for this case. Or perhaps a better solution would be to modify the jclouds API to delete the failed instance if it wasn’t spawned successfully (or make that an option of the API that handles spawning a new instance).

Jeffrey,

If jclouds is not returning a nodeID, it should handle the cleanup itself. It's also reasonable for jclouds to return an object for the failed instance, but that's not much use to Stratos except for leaving the VM around so that we can see that it failed to come up. I don't know if we want an option to have jclouds try to respawn an instance, as this would most likely fail too. It's better to have Stratos manage the retries.
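
A minimal sketch of that division of labour, assuming the launch call can report the id of the node it failed to bring up. This is not jclouds or Stratos code: FakeIaas, LaunchError, spawn_with_cleanup and the max_attempts limit are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Optional


class LaunchError(Exception):
    """Raised when a VM fails to start; carries the node id if one was allocated."""

    def __init__(self, message: str, node_id: Optional[str] = None):
        super().__init__(message)
        self.node_id = node_id


@dataclass
class FakeIaas:
    """In-memory stand-in for the IaaS; every launch fails, as in the fixed-IP test."""
    nodes: Dict[str, str] = field(default_factory=dict)  # node id -> state
    _ids: count = field(default_factory=lambda: count(1))

    def create_node(self) -> str:
        node_id = f"vm-{next(self._ids)}"
        self.nodes[node_id] = "ERROR"  # the failed VM is left behind on the IaaS
        raise LaunchError("fixed IP is outside the subnet", node_id=node_id)

    def destroy_node(self, node_id: str) -> None:
        self.nodes.pop(node_id, None)


def spawn_with_cleanup(iaas: FakeIaas, max_attempts: int = 3) -> Optional[str]:
    """Caller-managed retries: delete each failed node before trying again."""
    for _ in range(max_attempts):
        try:
            return iaas.create_node()
        except LaunchError as err:
            if err.node_id is not None:
                iaas.destroy_node(err.node_id)
    return None  # give up after max_attempts instead of retrying forever


iaas = FakeIaas()
result = spawn_with_cleanup(iaas)
print(result, iaas.nodes)
```

With the cleanup step in place, no node in ERROR state survives a failed attempt, and the bounded loop keeps the retries from consuming all the resources.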

-Vanson.

Regards,
-Jeffrey

From: Anuruddha Liyanarachchi <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, April 2, 2015 at 4:43 AM
To: "[email protected]" <[email protected]>
Subject: Re: Stratos not properly terminating VMs to fail to startup

Hi Vanson / Jeffrey,

As seen in the logs, the instance ID is not returned to Stratos (instanceId=null) for the members which went into the error state. Therefore Stratos doesn't have control over the instances in the error state, and so the spawned instances with errors are not being deleted.
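
To illustrate the point, here is a minimal sketch of why a null instance ID leaves the VM orphaned. It is not Stratos code: Member, terminate_expired and the node ids are hypothetical, and the set stands in for the instances that exist on the IaaS.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Member:
    member_id: str
    instance_id: Optional[str]  # None mirrors instanceId=null in the log


def terminate_expired(members: List[Member], iaas_nodes: Set[str]) -> List[str]:
    """Terminate expired pending members; return the ones that could not be deleted."""
    unreachable = []
    for member in members:
        if member.instance_id is None:
            # Without an instance ID there is nothing to pass to the delete call.
            unreachable.append(member.member_id)
        else:
            iaas_nodes.discard(member.instance_id)
    return unreachable


iaas_nodes = {"vm-a", "vm-b"}  # vm-a is the errored VM whose ID was never returned
members = [Member("m1", None), Member("m2", "vm-b")]
leaked = terminate_expired(members, iaas_nodes)
print(leaked, iaas_nodes)
```

The member without an instance ID is reported as unreachable, and its VM stays on the IaaS.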



On Wed, Apr 1, 2015 at 4:08 AM, Jeffrey Nguyen (jeffrngu) <[email protected]> wrote:

    Hi Vanson,

    I opened a JIRA to track this issue last week:
    https://issues.apache.org/jira/browse/STRATOS-1293

    -Jeffrey

    On 3/31/15, 3:04 PM, "Vanson Lim (vlim)" <[email protected]> wrote:

    >Devs,
    >
    >I've simulated the case where OpenStack fails to bring up a VM (we've
    >seen this before in cases where required resources are not available
    >or there is some IaaS problem/timeout which caused the VM to fail to
    >launch). In this case we cause the failure by giving the cartridge
    >a fixed IP address that is not part of the network which the
    >VM attaches to.  The network is defined with a 10.0.0.0/24
    >subnet, but I've specified a fixed ip=10.0.8.1 to cause the VM startup
    >to fail.
    >
    >The VM start fails and the VM remains in an error state for the length of the
    >"pendingMemberExpiryTimeout" period set in the autoscaler.xml file.
    >
    >Stratos fails to delete the VM in error state and attempts to start a new
    >VM, which also fails to launch.
    >
    >This presumably repeats itself, creating an additional VM in error state
    >during each iteration until we've exhausted all the resources in the
    >system.
    >
    >wso2carbon.log and cartridge definition attached.
    >
    >-Vanson
    >




--
*Thanks and Regards,*
Anuruddha Lanka Liyanarachchi
Software Engineer - WSO2
Mobile : +94 (0) 712762611
Tel      : +94 112 145 345
[email protected]
