[jira] [Created] (STRATOS-1282) Stratos4.1.0 - error cleaning up VMs (that have floatingip) terminated through Openstack horizon

Martin Eppel (JIRA) Mon, 23 Mar 2015 12:58:05 -0700

Martin Eppel created STRATOS-1282:
-------------------------------------

             Summary: Stratos4.1.0 - error cleaning up VMs (that have 
floatingip) terminated through Openstack horizon
                 Key: STRATOS-1282
                 URL: https://issues.apache.org/jira/browse/STRATOS-1282
             Project: Stratos
          Issue Type: Bug
          Components: Cloud Controller
    Affects Versions: 4.1.0 Beta
            Reporter: Martin Eppel
            Priority: Blocker



On 3/23/15, 6:11 AM, Udara Liyanage wrote:
Hi, 

I could reproduce this in OpenStack. The region and image id of the 
iaasProvider are null at the time the IP is released. When I set the region in 
cloud-controller.xml (which is not a solution, just for testing), it works 
without the issue.
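
For illustration only, here is a minimal sketch of how a region could be resolved defensively before the floating IP is released. It assumes the region may be found either in the iaasProvider properties or in the partition properties (the MemberContext in the traceback further down shows the partition carrying region=RegionOne). The class RegionLookup and its methods are hypothetical and are not part of Stratos or jclouds.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper for discussion; not part of Stratos or jclouds. */
final class RegionLookup {

    private RegionLookup() {}

    /**
     * Returns the first non-blank "region" value, checking the IaaS provider
     * properties first and the partition properties second (assumed order).
     */
    static Optional<String> resolveRegion(Map<String, String> iaasProviderProps,
                                          Map<String, String> partitionProps) {
        String region = firstNonBlank(iaasProviderProps.get("region"),
                                      partitionProps.get("region"));
        return Optional.ofNullable(region);
    }

    /** Returns the first candidate that is neither null nor blank, trimmed. */
    private static String firstNonBlank(String... candidates) {
        for (String candidate : candidates) {
            if (candidate != null && !candidate.trim().isEmpty()) {
                return candidate.trim();
            }
        }
        return null;
    }
}
```

With a lookup like this, a caller could log a clear error when no region is found instead of passing null on to getFloatingIPExtensionForZone.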

[2015-03-23 15:25:23,067]  INFO 
{org.apache.stratos.cloud.controller.iaases.JcloudsIaas} -  Member terminated: 
[member-id] 
single-cartridge-app.my-php.php.domaine4fb4a32-64b1-4804-877f-2e93748f6a06
[2015-03-23 15:25:23,076]  INFO 
{org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
 -  Publishing member terminated event: [service-name] php [cluster-id] 
single-cartridge-app.my-php.php.domain [cluster-instance-id] 
single-cartridge-app-1 [member-id] 
single-cartridge-app.my-php.php.domaine4fb4a32-64b1-4804-877f-2e93748f6a06 
[network-partition-id] network-partition-1 [partition-id] partition-1 
[group-id] null
[2015-03-23 15:25:23,084]  INFO {org.apache.



Udara,

Thanks for looking at this.

I've confirmed that adding the following property to the cloud-controller 
iaasProvider also masks the problem. I agree that this is clearly not a solution.


@@ -13,4 +13,5 @@
         <property name="openstack.networking.provider" value="nova" />
        <property name="X" value="x" />
        <property name="Y" value="y" />
+       <property name="region" value="RegionOne" />
 </iaasProvider>

We'll file a bug to track this.

There's also the matter that after Stratos detects that the VM is inactive (as 
shown in the log snippet below at 18:57:51), the VM continues to be reported as 
"ACTIVE" in the topology events until it's terminated at 18:59:05. Is there 
logic in place that will return this VM to service if the VM is detected before 
the CEP publishes the member fault event?



TID: [0] [STRATOS] [2015-03-23 18:57:51,932]  WARN 
{org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor}
 -  Sending application instance inactive for [Application] cisco-sample-vm 
[ApplicationInstance] cisco-sample-vm-1
TID: [0] [STRATOS] [2015-03-23 18:57:51,941]  INFO 
{org.apache.stratos.autoscaler.applications.topic.ApplicationsEventPublisher} - 
 Publishing application inactivated event: [application] cisco-sample-vm 
[instance] cisco-sample-vm-1
TID: [0] [STRATOS] [2015-03-23 18:58:51,883]  INFO 
{org.apache.stratos.cep.extension.FaultHandlingWindowProcessor} -  Faulty 
member detected [member-id] 
cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain4e32a138-4c48-46e0-a7aa-2949cd841965
 with [last time-stamp] 1427136970708 [time-out] 60000 milliseconds
TID: [0] [STRATOS] [2015-03-23 18:58:51,884]  INFO 
{org.apache.stratos.cep.extension.FaultHandlingWindowProcessor} -  Publishing 
member fault event for [member-id] 
cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain4e32a138-4c48-46e0-a7aa-2949cd841965
.....

TID: [0] [STRATOS] [2015-03-23 18:59:05,887]  INFO 
{org.apache.stratos.common.client.CloudControllerServiceClient} -  Terminating 
instance via cloud controller: [member] 
cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain4e32a138-4c48-46e0-a7aa-2949cd841965
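
The fault log above reports a [last time-stamp] and a [time-out] of 60000 milliseconds. As a rough sketch, assuming a member is treated as faulty once the time since its last timestamp exceeds the time-out, the check could look like the following. The class and method names are hypothetical, and the actual comparison used by FaultHandlingWindowProcessor is not shown in this thread.

```java
/** Hypothetical sketch of a time-out check; not the Stratos implementation. */
final class FaultTimeoutCheck {

    private FaultTimeoutCheck() {}

    /**
     * Returns true when more than timeoutMillis have elapsed between the
     * member's last reported timestamp and the current time.
     */
    static boolean isTimedOut(long lastTimestampMillis, long nowMillis, long timeoutMillis) {
        return nowMillis - lastTimestampMillis > timeoutMillis;
    }
}
```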


-Vanson


On Mon, Mar 23, 2015 at 11:07 AM, Udara Liyanage <[email protected]> wrote:
Hi,  

I will have a look.

On Mon, Mar 23, 2015 at 3:38 AM, Vanson Lim <[email protected]> wrote:
Devs,

We are continuing to work on testing the latest stratos 4.1.0 codebase.

This problem is seen only for VMs that have a floating IP. I've tested the 
non-floating-IP case and don't see issues.

The error returned from the jclouds API call is preventing Stratos from 
cleaning up its state.

Stratos seems to forever throw tracebacks as it repeatedly tries to terminate 
the faulty instance.
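
To make the concern concrete, here is a minimal sketch of a termination sequence in which a failed floating IP release does not block the remaining state cleanup. It is an illustration under assumptions, not the Stratos code: the class BestEffortTermination, the Step interface, and the two-step split are all hypothetical.

```java
/** Hypothetical sketch; names are illustrative and are not Stratos APIs. */
final class BestEffortTermination {

    /** One step of the termination sequence. */
    interface Step {
        void run();
    }

    private BestEffortTermination() {}

    /**
     * Runs the floating IP release as a best-effort step, then always runs
     * the state cleanup. Returns true only if the IP release succeeded.
     */
    static boolean terminate(Step releaseFloatingIp, Step cleanUpMemberState) {
        boolean released = false;
        try {
            releaseFloatingIp.run();
            released = true;
        } catch (RuntimeException e) {
            // Report the failure, but do not let it abort the remaining cleanup.
            System.err.println("Floating IP release failed: " + e);
        } finally {
            cleanUpMemberState.run();
        }
        return released;
    }
}
```

The trade-off is that a floating IP that could not be released may stay allocated, so the caller would still need to report or retry the release when the method returns false.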

Meanwhile, the "down" VM is still being reported as active in the topology 
events, which seems wrong. If Stratos detects that the VM is faulty, shouldn't 
it report it immediately in the topology events? Stratos currently has the 
following states defined, and none of them seems appropriate:
Created
Initialized
Starting
Active
In_Maintenance
ReadyToShutdown
Suspended
Terminated

Do we need a new TIMED-OUT state that Stratos reports for the VM while it 
works to terminate it?
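
As a sketch of what this proposal might look like, the enum below lists the states above plus a hypothetical TIMED_OUT value. The enum name SketchMemberState and the onFaultDetected method are assumptions for discussion only; TIMED_OUT does not exist in Stratos.

```java
/** Hypothetical sketch; TIMED_OUT is a proposed state, not an existing one. */
enum SketchMemberState {
    CREATED, INITIALIZED, STARTING, ACTIVE, IN_MAINTENANCE,
    READY_TO_SHUTDOWN, SUSPENDED, TERMINATED,
    TIMED_OUT; // assumed new state, for discussion only

    /**
     * State to report once a fault has been detected for the member.
     * A member that is already terminated stays terminated.
     */
    SketchMemberState onFaultDetected() {
        return this == TERMINATED ? TERMINATED : TIMED_OUT;
    }
}
```

Under this sketch, the topology would report TIMED_OUT from the moment the fault is detected until the member reaches TERMINATED, rather than continuing to report ACTIVE.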

How to reproduce this issue:

1) Start a sample cartridge instance that has a floating IP.
2) Wait for the sample cartridge to become active.
3) Terminate the sample VM via the OpenStack Horizon interface, and wait for 
Stratos to detect the VM error.


Testing using a version of stratos built off the following commit id:
commit 01dd9e491ad3acf7cc4e0f2895aaba336b82539d
Author: R-Rajkumar <[email protected]>
Date:   Fri Mar 20 19:51:06 2015 +0530

    fixing an NPE in AS

I've attached the full wso2carbon.log. The observed traceback is included below:

-Vanson


TID: [0] [STRATOS] [2015-03-22 20:53:21,554]  INFO 
{org.apache.stratos.cep.extension.FaultHandlingWindowProcessor} -  Publishing 
member fault event for [member-id] 
cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd
TID: [0] [STRATOS] [2015-03-22 20:54:06,386]  INFO 
{org.apache.stratos.common.client.CloudControllerServiceClient} -  Terminating 
instance via cloud controller: [member] 
cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd
TID: [0] [STRATOS] [2015-03-22 20:54:06,399]  INFO 
{org.apache.stratos.cloud.controller.iaases.JcloudsIaas} -  Starting to 
terminate member: [cartridge-type] cisco-sample-vm [member-id] 
cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd
TID: [0] [STRATOS] [2015-03-22 20:54:06,450] ERROR 
{org.apache.stratos.cloud.controller.services.impl.InstanceTerminator} -  
Instance termination failed! MemberContext [applicationId=cisco-sample-vm, 
cartridgeType=cisco-sample-vm, 
clusterId=cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain, 
memberId=cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd,
 instanceId=RegionOne/83751110-4e5b-4aef-b6a3-c291c9eaad3d, partition=Partition 
[id=whole-region, description=null, isPublic=false, provider=Core, 
properties=Properties [properties=[Property [name=region, value=RegionOne]]]], 
defaultPrivateIP=172.16.2.17, defaultPublicIP=10.0.0.102, 
allocatedIPs=[10.0.0.102], publicIPs=[10.0.0.102], privateIPs=[172.16.2.17], 
initTime=1427057106433, lbClusterId=null, networkPartitionId=RegionOne, 
kubernetesPodId=null, kubernetesPodLabel=null, loadBalancingIPType=Private, 
instanceMetadata=org.apache.stratos.cloud.controller.domain.InstanceMetadata@5b176e44,
 properties=Properties [properties=[Property [name=PRIMARY, value=false], 
Property [name=MIN_COUNT, value=1]]]]
java.lang.NullPointerException: arg[0] in 
{invocation=org.jclouds.openstack.nova.v2_0.NovaApi.public abstract 
com.google.common.base.Optional 
org.jclouds.openstack.nova.v2_0.NovaApi.getFloatingIPExtensionForZone(java.lang.String)[null],result={annotationParser={caller=NovaApi.getFloatingIPExtensionForZone[null]}}}
        at 
com.google.common.base.Preconditions.checkNotNull(Preconditions.java:253)
        at 
org.jclouds.openstack.v2_0.functions.PresentWhenExtensionAnnotationNamespaceEqualsAnyNamespaceInExtensionsSet.apply(PresentWhenExtensionAnnotationNamespaceEqualsAnyNamespaceInExtensionsSet.java:67)
        at 
org.jclouds.openstack.v2_0.functions.PresentWhenExtensionAnnotationNamespaceEqualsAnyNamespaceInExtensionsSet.apply(PresentWhenExtensionAnnotationNamespaceEqualsAnyNamespaceInExtensionsSet.java:43)
        at 
org.jclouds.rest.internal.DelegatesToInvocationFunction.propagateContextToDelegate(DelegatesToInvocationFunction.java:205)
        at 
org.jclouds.rest.internal.DelegatesToInvocationFunction.handle(DelegatesToInvocationFunction.java:154)
        at 
org.jclouds.rest.internal.DelegatesToInvocationFunction.invoke(DelegatesToInvocationFunction.java:123)
        at com.sun.proxy.$Proxy119.getFloatingIPExtensionForZone(Unknown Source)
        at 
org.apache.stratos.cloud.controller.iaases.openstack.networking.NovaNetworkingApi.releaseAddress(NovaNetworkingApi.java:239)
        at 
org.apache.stratos.cloud.controller.iaases.openstack.OpenstackIaas.releaseAddress(OpenstackIaas.java:239)
        at 
org.apache.stratos.cloud.controller.iaases.JcloudsIaas.destroyNode(JcloudsIaas.java:334)
        at 
org.apache.stratos.cloud.controller.iaases.JcloudsIaas.terminateInstance(JcloudsIaas.java:314)
        at 
org.apache.stratos.cloud.controller.services.impl.InstanceTerminator.run(InstanceTerminator.java:56)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
TID: [0] [STRATOS] [2015-03-22 20:54:21,563]  INFO 
{org.apache.stratos.cep.extension.FaultHandlingWindowProcessor} -  Faulty 
member detected [member-id] 
cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd
 with [last time-stamp] 1427057336960 [time-out] 60000 milliseconds
TID: [0] [STRATOS] [2015-03-22 20:54:21,563]  INFO 
{org.apache.stratos.cep.extension.FaultHandlingWindowProcessor} -  Publishing 
member fault event for [member-id] 
cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd


-- 

Udara Liyanage 
Software Engineer
WSO2, Inc.: http://wso2.com 
lean. enterprise. middleware
web: http://udaraliyanage.wordpress.com
phone: +94 71 443 6897




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
