For this latest test I pulled the latest source from the Stratos repo, so I have this commit (see below), but the un-deployment still fails (to some extent). As mentioned below, it seems that all the members get terminated eventually, including the ones which got started after the “application un-deployment” process started.

What is still left in Stratos (even after all members got terminated) is the application (see the stratos> list-applications command result below in the email thread). This would still be an issue when re-deploying the application!

I will do a few reruns to verify that the removal of the VMs (members) is consistent.

Thanks
Martin

git show 2fe84b91843b20e91e8cafd06011f42d218f231c
commit 2fe84b91843b20e91e8cafd06011f42d218f231c
Author: anuruddhal <anuruddha...@gmail.com>
Date: Wed Jun 3 14:41:12 2015 +0530

From: Imesh Gunaratne [mailto:im...@apache.org]
Sent: Friday, June 05, 2015 12:46 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

I also encountered a similar issue with the application un-deployment with PCA, but I guess you are using JCA. I can see that Anuruddha has done a fix for the issue I'm referring to with the commit below:
https://github.com/apache/stratos/commit/2fe84b91843b20e91e8cafd06011f42d218f231c

Regarding the member context not found error, this could occur if the termination request was made for an already terminated member. There is a possibility that the Autoscaler makes a second terminate request if the first request takes some time to execute, and by the time the second request hits the Cloud Controller the member has already been terminated by the first request.

Can you please confirm whether the members were properly terminated and it's just these exceptions that you are seeing?

Thanks

On Sat, Jun 6, 2015 at 12:36 AM, Martin Eppel (meppel) <mep...@cisco.com> wrote:

Hi Udara,

I picked up your commit and reran the test case. Attached is the log file (artifacts are the same as before).
I didn’t see the issue with “Member is in the wrong list” … but I see the following exception after the undeploy application message:

TID: [0] [STRATOS] [2015-06-05 18:09:46,836] ERROR {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator} - Failed to retrieve topology event message
org.apache.stratos.common.exception.InvalidLockRequestedException: System error, cannot acquire a write lock while having a read lock on the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2
    at org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)
    at org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)

Also, after the “Application undeployment process started” message is logged, new members are still being instantiated:

TID: [0] [STRATOS] [2015-06-05 18:07:46,545] INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} - Publishing member created event:

Eventually, these VMs get terminated:

TID: [0] [STRATOS] [2015-06-05 18:42:42,413] ERROR {org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl} - Could not terminate instance: [member-id] g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f
org.apache.stratos.cloud.controller.exception.InvalidMemberException: Could not terminate instance, member context not found: [member-id] g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f
    at org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl.terminateInstance(CloudControllerServiceImpl.java:595)
    at sun.reflect.GeneratedMethodAccessor408.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)

but the application remains:

stratos> list-applications
Applications found:
+----------------+------------+----------+
| Application ID | Alias      | Status   |
+----------------+------------+----------+
| g-sc-G12-1     | g-sc-G12-1 | Deployed |
+----------------+------------+----------+

['g-sc-G12-1: applicationInstances 1, groupInstances 2, clusterInstances 3, members 0 ()\n']

From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 10:04 AM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Ok:

log4j.logger.org.apache.stratos.manager=DEBUG
log4j.logger.org.apache.stratos.autoscaler=DEBUG
log4j.logger.org.apache.stratos.messaging=INFO
log4j.logger.org.apache.stratos.cloud.controller=DEBUG
log4j.logger.org.wso2.andes.client=ERROR
# Autoscaler rule logs
log4j.logger.org.apache.stratos.autoscaler.rule.RuleLog=DEBUG

From: Udara Liyanage [mailto:ud...@wso2.com]
Sent: Friday, June 05, 2015 10:00 AM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

It would be better if you could enable debug logs for all of AS, CC and the cartridge agent.

On Fri, Jun 5, 2015 at 10:23 PM, Udara Liyanage <ud...@wso2.com> wrote:

Hi,

Please enable AS debug logs.

On Fri, Jun 5, 2015 at 9:38 PM, Martin Eppel (meppel) <mep...@cisco.com> wrote:

Hi Udara,

Yes, this issue seems to be fairly well reproducible. Which debug logs do you want me to enable, the cartridge agent logs?

Thanks

Martin

From: Udara Liyanage [mailto:ud...@wso2.com]
Sent: Thursday, June 04, 2015 11:11 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi,

This might be possible if AS did not receive the member activated event published by CC. Is it possible to enable debug logs if this is reproducible? Or else I can add INFO logs and commit.
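
The hypothesis above (the Autoscaler never processing the member activated event, so the member stays in the pending list until its timeout expires) can be illustrated with a minimal sketch. The class PendingMemberTracker and its methods are hypothetical names chosen for illustration and are not Stratos APIs; only the 900000 ms expiry time is taken from the logs in this thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical tracker, not a Stratos class: models pending, active and obsolete member lists.
final class PendingMemberTracker {
    private final long expiryMillis;
    private final Map<String, Long> pending = new HashMap<>(); // member id -> creation time
    private final Set<String> active = new HashSet<>();
    private final Set<String> obsolete = new HashSet<>();

    PendingMemberTracker(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    // A newly created member starts in the pending list.
    synchronized void onMemberCreated(String memberId, long nowMillis) {
        pending.put(memberId, nowMillis);
    }

    // The activated event moves a member from pending to active.
    // If this event never arrives, the member simply stays pending.
    synchronized void onMemberActivated(String memberId) {
        if (pending.remove(memberId) != null) {
            active.add(memberId);
        }
    }

    // Moves every pending member whose timeout has expired to the obsolete list
    // and returns the ids that were moved.
    synchronized List<String> expirePending(long nowMillis) {
        List<String> expired = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= expiryMillis) {
                expired.add(entry.getKey());
                obsolete.add(entry.getKey());
                it.remove();
            }
        }
        return expired;
    }

    synchronized boolean isActive(String memberId) {
        return active.contains(memberId);
    }

    synchronized boolean isObsolete(String memberId) {
        return obsolete.contains(memberId);
    }

    public static void main(String[] args) {
        PendingMemberTracker tracker = new PendingMemberTracker(900000L);
        tracker.onMemberCreated("member-with-lost-event", 0L);
        tracker.onMemberCreated("member-activated", 0L);
        tracker.onMemberActivated("member-activated");
        // Only the member whose activated event was never processed expires.
        System.out.println("expired: " + tracker.expirePending(900000L));
    }
}
```

In this model a member that is running but whose activated event was lost is indistinguishable from one that never came up, which would be consistent with a started member being moved to the obsolete list after 15 minutes.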
On Fri, Jun 5, 2015 at 9:11 AM, Udara Liyanage <ud...@wso2.com> wrote:

Hi,

For the first issue you have mentioned, the particular member is activated, but it is still identified as an obsolete member and is being marked to be terminated since the pending time expired. Does that mean the member is still in the obsolete list even though it has been activated?

//member started
TID: [0] [STRATOS] [2015-06-04 19:53:04,706] INFO {org.apache.stratos.autoscaler.context.cluster.ClusterContext} - Member stat context has been added: [application] g-sc-G12-1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [clusterInstanceContext] g-sc-G12-1-1 [partitionContext] whole-region [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//member activated
TID: [0] [STRATOS] [2015-06-04 19:56:00,907] INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} - Publishing member activated event: [service-name] c1 [cluster-id] g-sc-G12-1.c1-0x0.c1.domain [cluster-instance-id] g-sc-G12-1-1 [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [network-partition-id] RegionOne [partition-id] whole-region
TID: [0] [STRATOS] [2015-06-04 19:56:00,916] INFO {org.apache.stratos.messaging.message.processor.topology.MemberActivatedMessageProcessor} - Member activated: [service] c1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//after 15 minutes ---member is still in pending state, pending timeout expired
TID: [0] [STRATOS] [2015-06-04 20:08:04,713] INFO {org.apache.stratos.autoscaler.context.partition.ClusterLevelPartitionContext$PendingMemberWatcher} - Pending state of member expired, member will be moved to obsolete list. [pending member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [expiry time] 900000 [cluster] g-sc-G12-1.c1-0x0.c1.domain [cluster instance] null

On Fri, Jun 5, 2015 at 5:14 AM, Martin Eppel (meppel) <mep...@cisco.com> wrote:

Hi,

I am running into a scenario where application un-deployment fails (using Stratos with latest commit b1b6bca3f99b6127da24c9af0a6b20faff2907be). For the application structure see [1.]; for the (debug enabled) wso2carbon.log, application.json, cartridge-group.json, deployment-policy and auto-scaling policies see the attached zip file.

It is noteworthy that, while the application is running, the following log statements/exceptions are observed:

…
Member is in the wrong list and it is removed from active members list: g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
…
TID: [0] [STRATOS] [2015-06-04 20:11:03,425] ERROR {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} - Cannot terminate instance
…
// after receiving the application undeploy event:
[2015-06-04 20:12:39,465] INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} - Application undeployment process started: [application-id] g-sc-G12-1
// a new instance is being started up
…
[2015-06-04 20:13:13,445] INFO {org.apache.stratos.cloud.controller.services.impl.InstanceCreator} - Instance started successfully: [cartridge-type] c2 [cluster-id] g-sc-G12-1.c2-1x0.c2.domain [instance-id] RegionOne/5d4699f7-b00b-42eb-b565-b48fc8f20407

// Also noteworthy seems the following warning which is seen repeatedly in the logs:
ReadWriteLock} - System warning! Trying to release a lock which has not been taken by the same thread: [lock-name]

[1.] Application structure
[inline image: image001.png]

--
Udara Liyanage
Software Engineer
WSO2, Inc.: http://wso2.com
lean. enterprise. middleware
web: http://udaraliyanage.wordpress.com
phone: +94 71 443 6897

--
Imesh Gunaratne
Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos
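
The InvalidLockRequestedException reported in this thread is raised when a thread asks for a write lock while it still holds a read lock on the same lock. As a minimal sketch of why such a request has to be rejected, the following uses the JDK's java.util.concurrent.locks.ReentrantReadWriteLock, which does not support upgrading a read lock to a write lock. This is not the Stratos ReadWriteLock implementation; the class GuardedLock and its methods are hypothetical names used only for illustration.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical wrapper, not the Stratos ReadWriteLock: fails fast on a read-to-write upgrade.
final class GuardedLock {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final String name;

    GuardedLock(String name) {
        this.name = name;
    }

    void acquireReadLock() {
        lock.readLock().lock();
    }

    void releaseReadLock() {
        lock.readLock().unlock();
    }

    void acquireWriteLock() {
        // getReadHoldCount() reports read holds of the current thread only.
        // Without this check, writeLock().lock() would block forever here,
        // because the write lock waits for all read holds to be released.
        if (lock.getReadHoldCount() > 0) {
            throw new IllegalStateException(
                    "cannot acquire a write lock while having a read lock on the same thread: "
                            + "[lock-name] " + name);
        }
        lock.writeLock().lock();
    }

    void releaseWriteLock() {
        lock.writeLock().unlock();
    }

    boolean isWriteLockedByCurrentThread() {
        return lock.isWriteLockedByCurrentThread();
    }

    public static void main(String[] args) {
        GuardedLock holder = new GuardedLock("application-holder");
        holder.acquireReadLock();
        try {
            holder.acquireWriteLock(); // rejected: this thread still holds the read lock
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        } finally {
            holder.releaseReadLock();
        }
        // Once the read lock is released, the write lock can be taken normally.
        holder.acquireWriteLock();
        holder.releaseWriteLock();
    }
}
```

Under these assumptions, the caller has to release its read lock before requesting the write lock, which suggests looking for a code path in the undeployment handling that requests the application-holder write lock from inside a section that already holds the read lock.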