[ https://issues.apache.org/jira/browse/STRATOS-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Udara Liyanage resolved STRATOS-1282.
-------------------------------------
    Resolution: Fixed

Resolved with 98d9cade122e302665450b437593cee7be28c563

> Stratos4.1.0 - error cleaning up VMs (that have floatingip) terminated
> through Openstack horizon
> ------------------------------------------------------------------------------------------------
>
>                 Key: STRATOS-1282
>                 URL: https://issues.apache.org/jira/browse/STRATOS-1282
>             Project: Stratos
>          Issue Type: Bug
>          Components: Cloud Controller
>    Affects Versions: 4.1.0 Beta
>            Reporter: Martin Eppel
>            Priority: Blocker
>
> On 3/23/15, 6:11 AM, Udara Liyanage wrote:
> Hi,
> I could reproduce this in Openstack. The region and image id of the
> iaasProvider are null at the time of IP releasing. When I set the region in
> cloud-controller.xml (which is not a solution, just for testing) it works
> without the issue.
> [2015-03-23 15:25:23,067] INFO
> {org.apache.stratos.cloud.controller.iaases.JcloudsIaas} - Member
> terminated: [member-id]
> single-cartridge-app.my-php.php.domaine4fb4a32-64b1-4804-877f-2e93748f6a06
> [2015-03-23 15:25:23,076] INFO
> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
> - Publishing member terminated event: [service-name] php [cluster-id]
> single-cartridge-app.my-php.php.domain [cluster-instance-id]
> single-cartridge-app-1 [member-id]
> single-cartridge-app.my-php.php.domaine4fb4a32-64b1-4804-877f-2e93748f6a06
> [network-partition-id] network-partition-1 [partition-id] partition-1
> [group-id] null
> [2015-03-23 15:25:23,084] INFO {org.apache.
>
> Udara,
> Thanks for looking at this.
> I've confirmed that adding the following to the cloud-controller iaasProvider
> also seems to cover up the problem. I agree, clearly not a solution.
> @@ -13,4 +13,5 @@
>      <property name="openstack.networking.provider" value="nova" />
>      <property name="X" value="x" />
>      <property name="Y" value="y" />
> +    <property name="region" value="RegionOne" />
>  </iaasProvider>
> We'll file a bug to track this.
> There's also the matter that after stratos detects that the VM is inactive
> (as shown in the log snippet below at 18:57:51), the VM continues to be reported
> as "ACTIVE" in the topology
> events until it's terminated at 18:59:05. Is there logic in place that
> will return this VM to service if the VM is detected before the CEP publishes
> the member fault event?
> TID: [0] [STRATOS] [2015-03-23 18:57:51,932] WARN
> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor}
> - Sending application instance inactive for [Application] cisco-sample-vm
> [ApplicationInstance] cisco-sample-vm-1
> TID: [0] [STRATOS] [2015-03-23 18:57:51,941] INFO
> {org.apache.stratos.autoscaler.applications.topic.ApplicationsEventPublisher}
> - Publishing application inactivated event: [application] cisco-sample-vm
> [instance] cisco-sample-vm-1
> TID: [0] [STRATOS] [2015-03-23 18:58:51,883] INFO
> {org.apache.stratos.cep.extension.FaultHandlingWindowProcessor} - Faulty
> member detected [member-id]
> cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain4e32a138-4c48-46e0-a7aa-2949cd841965
> with [last time-stamp] 1427136970708 [time-out] 60000 milliseconds
> TID: [0] [STRATOS] [2015-03-23 18:58:51,884] INFO
> {org.apache.stratos.cep.extension.FaultHandlingWindowProcessor} - Publishing
> member fault event for [member-id]
> cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain4e32a138-4c48-46e0-a7aa-2949cd841965
> .....
> TID: [0] [STRATOS] [2015-03-23 18:59:05,887] INFO
> {org.apache.stratos.common.client.CloudControllerServiceClient} -
> Terminating instance via cloud controller: [member]
> cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain4e32a138-4c48-46e0-a7aa-2949cd841965
> -Vanson
>
> On Mon, Mar 23, 2015 at 11:07 AM, Udara Liyanage <ud...@wso2.com> wrote:
> Hi,
> I will have a look.
>
> On Mon, Mar 23, 2015 at 3:38 AM, Vanson Lim <v...@cisco.com> wrote:
> Devs,
> We are continuing to work on testing the latest stratos 4.1.0 codebase.
> This problem is seen only for VMs that have a floating ip. I've tested with
> the non floating ip case and don't see issues.
> The error return code from the jcloud api call is preventing stratos from
> cleaning up its state.
> Stratos seems to forever throw tracebacks as it repeatedly tries to terminate
> the faulty instance.
> Meanwhile, the "down" VM is still being reported as active in the topology
> events, which seems wrong. If stratos detects that the VM is faulty,
> shouldn't it report it immediately in the topology events? Stratos currently
> has the following states defined and none of them seem to be appropriate.
> Created
> Initialized
> Starting
> Active
> In_Maintenance
> ReadyToShutdown
> Suspended
> Terminated
> Do we need a new TIMED-OUT state that stratos reports for the VM as stratos
> works to terminate it?
> How to reproduce this issue:
> 1) Start a sample cartridge instance that has a floating ip.
> 2) Wait for the sample cartridge to become active.
> 3) Terminate the sample vm via the openstack horizon interface, and wait for
> stratos to detect the VM error.
> Testing using a version of stratos built off the following commit id:
> commit 01dd9e491ad3acf7cc4e0f2895aaba336b82539d
> Author: R-Rajkumar <rraju1...@gmail.com>
> Date: Fri Mar 20 19:51:06 2015 +0530
> fixing an NPE in AS
> I've attached the full wso2carbon.log. Included below is the observed
> traceback:
> -Vanson
> TID: [0] [STRATOS] [2015-03-22 20:53:21,554] INFO
> {org.apache.stratos.cep.extension.FaultHandlingWindowProcessor} - Publishing
> member fault event for [member-id]
> cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd
> TID: [0] [STRATOS] [2015-03-22 20:54:06,386] INFO
> {org.apache.stratos.common.client.CloudControllerServiceClient} -
> Terminating instance via cloud controller: [member]
> cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd
> TID: [0] [STRATOS] [2015-03-22 20:54:06,399] INFO
> {org.apache.stratos.cloud.controller.iaases.JcloudsIaas} - Starting to
> terminate member: [cartridge-type] cisco-sample-vm [member-id]
> cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd
> TID: [0] [STRATOS] [2015-03-22 20:54:06,450] ERROR
> {org.apache.stratos.cloud.controller.services.impl.InstanceTerminator} -
> Instance termination failed! MemberContext [applicationId=cisco-sample-vm,
> cartridgeType=cisco-sample-vm,
> clusterId=cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain,
> memberId=cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd,
> instanceId=RegionOne/83751110-4e5b-4aef-b6a3-c291c9eaad3d,
> partition=Partition [id=whole-region, description=null, isPublic=false,
> provider=Core, properties=Properties [properties=[Property [name=region,
> value=RegionOne]]]], defaultPrivateIP=172.16.2.17,
> defaultPublicIP=10.0.0.102, allocatedIPs=[10.0.0.102],
> publicIPs=[10.0.0.102], privateIPs=[172.16.2.17], initTime=1427057106433,
> lbClusterId=null, networkPartitionId=RegionOne, kubernetesPodId=null,
> kubernetesPodLabel=null, loadBalancingIPType=Private,
> instanceMetadata=org.apache.stratos.cloud.controller.domain.InstanceMetadata@5b176e44,
> properties=Properties [properties=[Property [name=PRIMARY, value=false],
> Property [name=MIN_COUNT, value=1]]]]
> java.lang.NullPointerException: arg[0] in
> {invocation=org.jclouds.openstack.nova.v2_0.NovaApi.public abstract
> com.google.common.base.Optional
> org.jclouds.openstack.nova.v2_0.NovaApi.getFloatingIPExtensionForZone(java.lang.String)[null],result={annotationParser={caller=NovaApi.getFloatingIPExtensionForZone[null]}}}
>     at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:253)
>     at org.jclouds.openstack.v2_0.functions.PresentWhenExtensionAnnotationNamespaceEqualsAnyNamespaceInExtensionsSet.apply(PresentWhenExtensionAnnotationNamespaceEqualsAnyNamespaceInExtensionsSet.java:67)
>     at org.jclouds.openstack.v2_0.functions.PresentWhenExtensionAnnotationNamespaceEqualsAnyNamespaceInExtensionsSet.apply(PresentWhenExtensionAnnotationNamespaceEqualsAnyNamespaceInExtensionsSet.java:43)
>     at org.jclouds.rest.internal.DelegatesToInvocationFunction.propagateContextToDelegate(DelegatesToInvocationFunction.java:205)
>     at org.jclouds.rest.internal.DelegatesToInvocationFunction.handle(DelegatesToInvocationFunction.java:154)
>     at org.jclouds.rest.internal.DelegatesToInvocationFunction.invoke(DelegatesToInvocationFunction.java:123)
>     at com.sun.proxy.$Proxy119.getFloatingIPExtensionForZone(Unknown Source)
>     at org.apache.stratos.cloud.controller.iaases.openstack.networking.NovaNetworkingApi.releaseAddress(NovaNetworkingApi.java:239)
>     at org.apache.stratos.cloud.controller.iaases.openstack.OpenstackIaas.releaseAddress(OpenstackIaas.java:239)
>     at org.apache.stratos.cloud.controller.iaases.JcloudsIaas.destroyNode(JcloudsIaas.java:334)
>     at org.apache.stratos.cloud.controller.iaases.JcloudsIaas.terminateInstance(JcloudsIaas.java:314)
>     at org.apache.stratos.cloud.controller.services.impl.InstanceTerminator.run(InstanceTerminator.java:56)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> TID: [0] [STRATOS] [2015-03-22 20:54:21,563] INFO
> {org.apache.stratos.cep.extension.FaultHandlingWindowProcessor} - Faulty
> member detected [member-id]
> cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd
> with [last time-stamp] 1427057336960 [time-out] 60000 milliseconds
> TID: [0] [STRATOS] [2015-03-22 20:54:21,563] INFO
> {org.apache.stratos.cep.extension.FaultHandlingWindowProcessor} - Publishing
> member fault event for [member-id]
> cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd
> --
> Udara Liyanage
> Software Engineer
> WSO2, Inc.: http://wso2.com
> lean. enterprise. middleware
> web: http://udaraliyanage.wordpress.com
> phone: +94 71 443 6897

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
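
Note on the traceback above: the NullPointerException is raised because a null region reaches getFloatingIPExtensionForZone, while the MemberContext dump shows that the partition properties still carry region=RegionOne. The following is only a minimal sketch of a defensive guard under that observation. It is not the change made in the resolving commit, and the class RegionGuardSketch, the helper resolveRegion, and the interface FloatingIpReleaser are hypothetical names introduced for illustration; none of them are Stratos or jclouds APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class RegionGuardSketch {

    /** Hypothetical stand-in for the call that releases a floating IP in a region. */
    interface FloatingIpReleaser {
        void release(String region, String ip);
    }

    /**
     * Hypothetical helper: prefer the region set on the IaaS provider and
     * fall back to the "region" partition property when the provider has none.
     */
    static Optional<String> resolveRegion(String providerRegion,
                                          Map<String, String> partitionProperties) {
        if (providerRegion != null && !providerRegion.trim().isEmpty()) {
            return Optional.of(providerRegion.trim());
        }
        String fromPartition =
                partitionProperties == null ? null : partitionProperties.get("region");
        if (fromPartition != null && !fromPartition.trim().isEmpty()) {
            return Optional.of(fromPartition.trim());
        }
        return Optional.empty();
    }

    /**
     * Releases the address only when a region could be resolved, so that a
     * missing region is reported instead of surfacing as an NPE.
     * Returns true when the release call was actually made.
     */
    static boolean releaseAddressSafely(String providerRegion,
                                        Map<String, String> partitionProperties,
                                        String ip,
                                        FloatingIpReleaser releaser) {
        Optional<String> region = resolveRegion(providerRegion, partitionProperties);
        if (!region.isPresent()) {
            System.err.println("Skipping release of " + ip + ": no region configured");
            return false;
        }
        releaser.release(region.get(), ip);
        return true;
    }

    public static void main(String[] args) {
        // Values taken from the MemberContext dump in the traceback.
        Map<String, String> partitionProperties = new HashMap<String, String>();
        partitionProperties.put("region", "RegionOne");

        boolean released = releaseAddressSafely(null, partitionProperties, "10.0.0.102",
                (region, ip) -> System.out.println("Releasing " + ip + " in " + region));
        System.out.println("released=" + released);
    }
}
```

Whether falling back to the partition property is appropriate depends on how Stratos is expected to populate the iaasProvider at termination time, which this thread does not settle.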