[ 
https://issues.apache.org/jira/browse/STRATOS-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384279#comment-14384279
 ] 

Vanson Lim commented on STRATOS-1282:
-------------------------------------


Udara,

Thanks for the fix.  I've verified on my setup that I no longer see the 
traceback, and this test case seems to be behaving properly now.

I took a look at the diffs associated with this commit and have some minor 
comments.

-Vanson





> Stratos4.1.0 - error cleaning up VMs (that have floatingip) terminated 
> through Openstack horizon
> ------------------------------------------------------------------------------------------------
>
>                 Key: STRATOS-1282
>                 URL: https://issues.apache.org/jira/browse/STRATOS-1282
>             Project: Stratos
>          Issue Type: Bug
>          Components: Cloud Controller
>    Affects Versions: 4.1.0 Beta
>            Reporter: Martin Eppel
>            Priority: Blocker
>
> On 3/23/15, 6:11 AM, Udara Liyanage wrote:
> Hi, 
> I could reproduce this in Openstack. The region and image id of the 
> iaasProvider is null at the time of IP releasing. When I set the region in 
> cloud-controller.xml (which is not a solution,  just for testing) it works 
> without the issue.
> [2015-03-23 15:25:23,067]  INFO 
> {org.apache.stratos.cloud.controller.iaases.JcloudsIaas} -  Member 
> terminated: [member-id] 
> single-cartridge-app.my-php.php.domaine4fb4a32-64b1-4804-877f-2e93748f6a06
> [2015-03-23 15:25:23,076]  INFO 
> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
>  -  Publishing member terminated event: [service-name] php [cluster-id] 
> single-cartridge-app.my-php.php.domain [cluster-instance-id] 
> single-cartridge-app-1 [member-id] 
> single-cartridge-app.my-php.php.domaine4fb4a32-64b1-4804-877f-2e93748f6a06 
> [network-partition-id] network-partition-1 [partition-id] partition-1 
> [group-id] null
> [2015-03-23 15:25:23,084]  INFO {org.apache.
> Udara,
> Thanks for looking at this.
> I've confirmed that adding the following to the cloud-controller iaasProvider 
> also seems to cover up the problem.  I agree, this is clearly not a solution.
> @@ -13,4 +13,5 @@
>          <property name="openstack.networking.provider" value="nova" />
>         <property name="X" value="x" />
>         <property name="Y" value="y" />
> +       <property name="region" value="RegionOne" />
>  </iaasProvider>
> We'll file a bug to track this.
> There's also the matter that after stratos detects that the VM is inactive 
> (as shown in the log snippet below at 18:57:51), the VM continues to be 
> reported as "ACTIVE" in the topology events until it's terminated at 
> 18:59:05.  Is there logic in place that will return this VM to service if 
> the VM is detected before the CEP publishes the member fault event?
> TID: [0] [STRATOS] [2015-03-23 18:57:51,932]  WARN 
> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor}
>  -  Sending application instance inactive for [Application] cisco-sample-vm 
> [ApplicationInstance] cisco-sample-vm-1
> TID: [0] [STRATOS] [2015-03-23 18:57:51,941]  INFO 
> {org.apache.stratos.autoscaler.applications.topic.ApplicationsEventPublisher} 
> -  Publishing application inactivated event: [application] cisco-sample-vm 
> [instance] cisco-sample-vm-1
> TID: [0] [STRATOS] [2015-03-23 18:58:51,883]  INFO 
> {org.apache.stratos.cep.extension.FaultHandlingWindowProcessor} -  Faulty 
> member detected [member-id] 
> cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain4e32a138-4c48-46e0-a7aa-2949cd841965
>  with [last time-stamp] 1427136970708 [time-out] 60000 milliseconds
> TID: [0] [STRATOS] [2015-03-23 18:58:51,884]  INFO 
> {org.apache.stratos.cep.extension.FaultHandlingWindowProcessor} -  Publishing 
> member fault event for [member-id] 
> cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain4e32a138-4c48-46e0-a7aa-2949cd841965
> .....
> TID: [0] [STRATOS] [2015-03-23 18:59:05,887]  INFO 
> {org.apache.stratos.common.client.CloudControllerServiceClient} -  
> Terminating instance via cloud controller: [member] 
> cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain4e32a138-4c48-46e0-a7aa-2949cd841965
> -Vanson
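The [last time-stamp] and [time-out] values in the FaultHandlingWindowProcessor log lines above suggest a simple elapsed-time check. The following is a minimal Java sketch of such a check, not the actual Stratos code: the class name, the method name, and the strict ">" comparison are all assumptions made for illustration.

```java
// Hypothetical sketch; not taken from the Stratos source tree.
final class FaultTimeoutSketch {

    /**
     * Returns true when more than timeOutMillis have elapsed between the
     * last health event of a member and the current time.
     * The strict ">" comparison is an assumption.
     */
    static boolean isFaulty(long lastTimestampMillis, long nowMillis, long timeOutMillis) {
        return nowMillis - lastTimestampMillis > timeOutMillis;
    }

    public static void main(String[] args) {
        long last = 1427136970708L; // [last time-stamp] value from the log above
        long timeOut = 60000L;      // [time-out] value from the log above

        System.out.println(isFaulty(last, last + 59_999L, timeOut)); // false
        System.out.println(isFaulty(last, last + 60_001L, timeOut)); // true
    }
}
```

Under this reading, a member would be flagged as faulty only once a full time-out window has passed without a new time-stamp, which would account for some delay between the VM going down and the member fault event.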
> On Mon, Mar 23, 2015 at 11:07 AM, Udara Liyanage <ud...@wso2.com> wrote:
> Hi,  
> I will have a look.
> On Mon, Mar 23, 2015 at 3:38 AM, Vanson Lim <v...@cisco.com> wrote:
> Devs,
> We are continuing to work on testing the latest stratos 4.1.0 codebase.
> This problem is seen only for VMs that have a floating IP.  I've tested 
> the non-floating-IP case and don't see issues.
> The error returned by the jclouds API call is preventing stratos from 
> cleaning up its state.
> Stratos seems to forever throw tracebacks as it repeatedly tries to terminate 
> the faulty instance.
> Meanwhile, the "down" VM is still being reported as active in the topology 
> events, which seems wrong.  If stratos detects that the VM is faulty, 
> shouldn't it report it immediately in the topology events?  Stratos currently 
> has the following states defined, and none of them seems appropriate.
> Created
> Initialized
> Starting
> Active
> In_Maintenance
> ReadyToShutdown
> Suspended
> Terminated
> Do we need a new TIMED-OUT state that stratos reports for the VM while it 
> works to terminate it?
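The state question above can be made concrete with a small sketch. The enum below lists the states named in the report plus a hypothetical TimedOut value. The names MemberStateSketch, TimedOut, and onFaultDetected are illustrative only and are not part of the Stratos codebase; the rule that only an Active member moves to TimedOut is likewise an assumption.

```java
// Hypothetical sketch; the real Stratos member state type may differ.
enum MemberStateSketch {
    Created,
    Initialized,
    Starting,
    Active,
    In_Maintenance,
    ReadyToShutdown,
    Suspended,
    TimedOut,   // hypothetical state proposed in the report above
    Terminated;

    /**
     * Assumed transition: an Active member whose health events have timed
     * out is reported as TimedOut until termination completes. Any other
     * state is left unchanged in this sketch.
     */
    static MemberStateSketch onFaultDetected(MemberStateSketch current) {
        return current == Active ? TimedOut : current;
    }

    public static void main(String[] args) {
        System.out.println(onFaultDetected(Active));     // TimedOut
        System.out.println(onFaultDetected(Terminated)); // Terminated
    }
}
```

With a state like this, topology events published between fault detection and termination would no longer need to describe the member as Active.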
> How to reproduce this issue:
> 1) Start a sample cartridge instance that has a floating ip.
> 2) wait for sample cartridge to become active
> 3) terminate sample vm via openstack horizon interface, and wait for stratos 
> to detect the VM error.
> Testing using a version of stratos built off the following commit id:
> commit 01dd9e491ad3acf7cc4e0f2895aaba336b82539d
> Author: R-Rajkumar <rraju1...@gmail.com>
> Date:   Fri Mar 20 19:51:06 2015 +0530
>     fixing an NPE in AS
> I've attached the full wso2carbon.log.  Included below is the observed 
> traceback:
> -Vanson
> TID: [0] [STRATOS] [2015-03-22 20:53:21,554]  INFO 
> {org.apache.stratos.cep.extension.FaultHandlingWindowProcessor} -  Publishing 
> member fault event for [member-id] 
> cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd
> TID: [0] [STRATOS] [2015-03-22 20:54:06,386]  INFO 
> {org.apache.stratos.common.client.CloudControllerServiceClient} -  
> Terminating instance via cloud controller: [member] 
> cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd
> TID: [0] [STRATOS] [2015-03-22 20:54:06,399]  INFO 
> {org.apache.stratos.cloud.controller.iaases.JcloudsIaas} -  Starting to 
> terminate member: [cartridge-type] cisco-sample-vm [member-id] 
> cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd
> TID: [0] [STRATOS] [2015-03-22 20:54:06,450] ERROR 
> {org.apache.stratos.cloud.controller.services.impl.InstanceTerminator} -  
> Instance termination failed! MemberContext [applicationId=cisco-sample-vm, 
> cartridgeType=cisco-sample-vm, 
> clusterId=cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain, 
> memberId=cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd,
>  instanceId=RegionOne/83751110-4e5b-4aef-b6a3-c291c9eaad3d, 
> partition=Partition [id=whole-region, description=null, isPublic=false, 
> provider=Core, properties=Properties [properties=[Property [name=region, 
> value=RegionOne]]]], defaultPrivateIP=172.16.2.17, 
> defaultPublicIP=10.0.0.102, allocatedIPs=[10.0.0.102], 
> publicIPs=[10.0.0.102], privateIPs=[172.16.2.17], initTime=1427057106433, 
> lbClusterId=null, networkPartitionId=RegionOne, kubernetesPodId=null, 
> kubernetesPodLabel=null, loadBalancingIPType=Private, 
> instanceMetadata=org.apache.stratos.cloud.controller.domain.InstanceMetadata@5b176e44,
>  properties=Properties [properties=[Property [name=PRIMARY, value=false], 
> Property [name=MIN_COUNT, value=1]]]]
> java.lang.NullPointerException: arg[0] in 
> {invocation=org.jclouds.openstack.nova.v2_0.NovaApi.public abstract 
> com.google.common.base.Optional 
> org.jclouds.openstack.nova.v2_0.NovaApi.getFloatingIPExtensionForZone(java.lang.String)[null],result={annotationParser={caller=NovaApi.getFloatingIPExtensionForZone[null]}}}
>         at 
> com.google.common.base.Preconditions.checkNotNull(Preconditions.java:253)
>         at 
> org.jclouds.openstack.v2_0.functions.PresentWhenExtensionAnnotationNamespaceEqualsAnyNamespaceInExtensionsSet.apply(PresentWhenExtensionAnnotationNamespaceEqualsAnyNamespaceInExtensionsSet.java:67)
>         at 
> org.jclouds.openstack.v2_0.functions.PresentWhenExtensionAnnotationNamespaceEqualsAnyNamespaceInExtensionsSet.apply(PresentWhenExtensionAnnotationNamespaceEqualsAnyNamespaceInExtensionsSet.java:43)
>         at 
> org.jclouds.rest.internal.DelegatesToInvocationFunction.propagateContextToDelegate(DelegatesToInvocationFunction.java:205)
>         at 
> org.jclouds.rest.internal.DelegatesToInvocationFunction.handle(DelegatesToInvocationFunction.java:154)
>         at 
> org.jclouds.rest.internal.DelegatesToInvocationFunction.invoke(DelegatesToInvocationFunction.java:123)
>         at com.sun.proxy.$Proxy119.getFloatingIPExtensionForZone(Unknown 
> Source)
>         at 
> org.apache.stratos.cloud.controller.iaases.openstack.networking.NovaNetworkingApi.releaseAddress(NovaNetworkingApi.java:239)
>         at 
> org.apache.stratos.cloud.controller.iaases.openstack.OpenstackIaas.releaseAddress(OpenstackIaas.java:239)
>         at 
> org.apache.stratos.cloud.controller.iaases.JcloudsIaas.destroyNode(JcloudsIaas.java:334)
>         at 
> org.apache.stratos.cloud.controller.iaases.JcloudsIaas.terminateInstance(JcloudsIaas.java:314)
>         at 
> org.apache.stratos.cloud.controller.services.impl.InstanceTerminator.run(InstanceTerminator.java:56)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> TID: [0] [STRATOS] [2015-03-22 20:54:21,563]  INFO 
> {org.apache.stratos.cep.extension.FaultHandlingWindowProcessor} -  Faulty 
> member detected [member-id] 
> cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd
>  with [last time-stamp] 1427057336960 [time-out] 60000 milliseconds
> TID: [0] [STRATOS] [2015-03-22 20:54:21,563]  INFO 
> {org.apache.stratos.cep.extension.FaultHandlingWindowProcessor} -  Publishing 
> member fault event for [member-id] 
> cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd
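The trace above shows getFloatingIPExtensionForZone being invoked with a null argument, while the MemberContext in the same log entry carries an instanceId of the form "RegionOne/83751110-...". One possible guard, sketched below in Java, is to fall back to the region prefix of the instance id when no region is configured. This is a sketch under stated assumptions and is not the fix that was committed for this issue: RegionResolverSketch and resolveRegion are hypothetical names, and the "region/id" layout of the instance id is inferred only from the log line above.

```java
// Hypothetical sketch; not the committed fix for STRATOS-1282.
final class RegionResolverSketch {

    /**
     * Returns the configured region when present; otherwise tries to derive
     * it from an instance id assumed to look like "region/uuid".
     * Fails with a descriptive message instead of passing null downstream.
     */
    static String resolveRegion(String configuredRegion, String instanceId) {
        if (configuredRegion != null && !configuredRegion.isEmpty()) {
            return configuredRegion;
        }
        if (instanceId != null) {
            int slash = instanceId.indexOf('/');
            if (slash > 0) {
                return instanceId.substring(0, slash);
            }
        }
        throw new IllegalStateException(
                "Cannot determine region for instance: " + instanceId);
    }

    public static void main(String[] args) {
        // Region missing from the configuration, as described in the report.
        String region = resolveRegion(
                null, "RegionOne/83751110-4e5b-4aef-b6a3-c291c9eaad3d");
        System.out.println(region); // RegionOne
    }
}
```

Failing early with a descriptive message, rather than passing null into the jclouds call, would at least make the missing region visible in the log instead of surfacing as a NullPointerException deep in the proxy.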
> -- 
> Udara Liyanage 
> Software Engineer
> WSO2, Inc.: http://wso2.com 
> lean. enterprise. middleware
> web: http://udaraliyanage.wordpress.com
> phone: +94 71 443 6897



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
