RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Martin Eppel (meppel) Fri, 05 Jun 2015 13:00:03 -0700

For this latest test I built from the latest source in the Stratos repo, so I 
have this commit (see below), but the un-deployment still fails, at least 
partially.
As mentioned below, all the members do seem to get terminated eventually, 
including the ones that were started after the “application un-deployment” 
process began.
What is still left in Stratos, even after all members are terminated, is the 
application itself (see the stratos> list-applications result further down in 
this thread). This would still be a problem when re-deploying the application!
I will do a few reruns to verify that the removal of the VMs (members) is 
consistent.
Thanks


Martin

git show 2fe84b91843b20e91e8cafd06011f42d218f231c
commit 2fe84b91843b20e91e8cafd06011f42d218f231c
Author: anuruddhal <[email protected]>
Date:   Wed Jun 3 14:41:12 2015 +0530

From: Imesh Gunaratne [mailto:[email protected]]
Sent: Friday, June 05, 2015 12:46 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails 
to undeploy (nested grouping, group scaling)

Hi Martin,

I also encountered a similar issue with the application un-deployment with PCA 
but I guess you are using JCA.

I can see that Anuruddha has committed a fix for the issue I'm referring to:
https://github.com/apache/stratos/commit/2fe84b91843b20e91e8cafd06011f42d218f231c

Regarding the "member context not found" error, this can occur if a 
termination request is made for a member that has already been terminated. It 
is possible that the Autoscaler makes a second terminate request if the first 
request takes some time to execute; by the time the second request reaches the 
Cloud Controller, the member has already been terminated by the first request.
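
One way to make the duplicate request harmless is to treat termination as 
idempotent. A minimal sketch, assuming a hypothetical in-memory registry (the 
names MemberRegistrySketch, Result, register and terminate are illustrative 
only and are not Stratos's CloudControllerServiceImpl API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a member-context registry; not Stratos code.
final class MemberRegistrySketch {
    enum Result { TERMINATED, ALREADY_TERMINATED }

    // member id -> instance id (assumed shape of a "member context")
    private final Map<String, String> memberContexts = new ConcurrentHashMap<>();

    void register(String memberId, String instanceId) {
        memberContexts.put(memberId, instanceId);
    }

    // remove() is atomic, so only one of two racing requests sees the context.
    // The loser gets ALREADY_TERMINATED instead of an exception.
    Result terminate(String memberId) {
        String instanceId = memberContexts.remove(memberId);
        if (instanceId == null) {
            return Result.ALREADY_TERMINATED;
        }
        // A real implementation would ask the IaaS to destroy instanceId here.
        return Result.TERMINATED;
    }

    public static void main(String[] args) {
        MemberRegistrySketch registry = new MemberRegistrySketch();
        registry.register("member-1", "RegionOne/instance-1");
        System.out.println(registry.terminate("member-1")); // TERMINATED
        System.out.println(registry.terminate("member-1")); // ALREADY_TERMINATED
    }
}
```

With this shape the second request is reported as a no-op, so it could be 
logged at a lower level than ERROR.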

Can you please confirm whether the members were properly terminated and it's 
just these exceptions that you are seeing?

Thanks


On Sat, Jun 6, 2015 at 12:36 AM, Martin Eppel (meppel) 
<[email protected]> wrote:
Hi Udara,

I picked up your commit and reran the test case.

Attached is the log file (artifacts are the same as before).

I no longer see the “Member is in the wrong list” issue,

but I do see the following exception after the application undeploy message:
TID: [0] [STRATOS] [2015-06-05 18:09:46,836] ERROR {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator} -  Failed to retrieve topology event message
org.apache.stratos.common.exception.InvalidLockRequestedException: System error, cannot acquire a write lock while having a read lock on the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2
        at org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)
        at org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)
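
For context on the lock error above: a read lock generally cannot be upgraded 
to a write lock on the same thread. The sketch below uses the JDK's 
java.util.concurrent.locks.ReentrantReadWriteLock as a stand-in (it is not 
Stratos's own ReadWriteLock class, and LockUpgradeSketch and its methods are 
hypothetical names) to show that the write lock is only obtainable once the 
read lock has been released:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustration with the JDK lock; Stratos uses its own ReadWriteLock wrapper.
final class LockUpgradeSketch {

    // Tries to take the write lock while this thread still holds the read lock.
    static boolean writeWhileReading(ReentrantReadWriteLock lock) {
        lock.readLock().lock();
        try {
            boolean acquired = lock.writeLock().tryLock(); // upgrade attempt
            if (acquired) {
                lock.writeLock().unlock();
            }
            return acquired;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Releases the read lock first, then takes the write lock.
    static boolean writeAfterReleasingRead(ReentrantReadWriteLock lock) {
        lock.readLock().lock();
        lock.readLock().unlock();
        boolean acquired = lock.writeLock().tryLock();
        if (acquired) {
            lock.writeLock().unlock();
        }
        return acquired;
    }

    public static void main(String[] args) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        System.out.println(writeWhileReading(lock));       // false
        System.out.println(writeAfterReleasingRead(lock)); // true
    }
}
```

If the same pattern applies here, the code path that reaches 
ApplicationHolder.acquireWriteLock would need to release its read lock first.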


Also, after the “Application undeployment process started” message is logged, 
new members are still being instantiated:

TID: [0] [STRATOS] [2015-06-05 18:07:46,545]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing member created event:


Eventually, these VMs get terminated:

TID: [0] [STRATOS] [2015-06-05 18:42:42,413] ERROR {org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl} -  Could not terminate instance: [member-id] g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f
org.apache.stratos.cloud.controller.exception.InvalidMemberException: Could not terminate instance, member context not found: [member-id] g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f
        at org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl.terminateInstance(CloudControllerServiceImpl.java:595)
        at sun.reflect.GeneratedMethodAccessor408.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)


but the application remains:

stratos> list-applications
Applications found:
+----------------+------------+----------+
| Application ID | Alias      | Status   |
+----------------+------------+----------+
| g-sc-G12-1     | g-sc-G12-1 | Deployed |
+----------------+------------+----------+

['g-sc-G12-1: applicationInstances 1, groupInstances 2, clusterInstances 3, members 0 ()\n']



From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 10:04 AM
To: [email protected]
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails 
to undeploy (nested grouping, group scaling)

Ok:

log4j.logger.org.apache.stratos.manager=DEBUG
log4j.logger.org.apache.stratos.autoscaler=DEBUG
log4j.logger.org.apache.stratos.messaging=INFO
log4j.logger.org.apache.stratos.cloud.controller=DEBUG
log4j.logger.org.wso2.andes.client=ERROR
# Autoscaler rule logs
log4j.logger.org.apache.stratos.autoscaler.rule.RuleLog=DEBUG

From: Udara Liyanage [mailto:[email protected]]
Sent: Friday, June 05, 2015 10:00 AM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails 
to undeploy (nested grouping, group scaling)

Hi Martin,

It would be better if you could enable debug logs for all of AS, CC and the 
cartridge agent.

On Fri, Jun 5, 2015 at 10:23 PM, Udara Liyanage 
<[email protected]> wrote:
Hi,

Please enable AS debug logs.

On Fri, Jun 5, 2015 at 9:38 PM, Martin Eppel (meppel) 
<[email protected]> wrote:
Hi Udara,

Yes, this issue seems to be fairly reproducible. Which debug logs do you want 
me to enable, the cartridge agent logs?

Thanks

Martin

From: Udara Liyanage [mailto:[email protected]]
Sent: Thursday, June 04, 2015 11:11 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails 
to undeploy (nested grouping, group scaling)

Hi,

This could happen if AS did not receive the member activated event published 
by CC. If this is reproducible, would it be possible to enable debug logs?
Otherwise I can add INFO logs and commit.


On Fri, Jun 5, 2015 at 9:11 AM, Udara Liyanage 
<[email protected]> wrote:
Hi,


For the first issue you mentioned: the member in question is activated, but it 
is still identified as an obsolete member and marked for termination because 
its pending timeout expired. Does that mean the member is still in the 
obsolete list even though it has been activated?

//member started
TID: [0] [STRATOS] [2015-06-04 19:53:04,706]  INFO {org.apache.stratos.autoscaler.context.cluster.ClusterContext} -  Member stat context has been added: [application] g-sc-G12-1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [clusterInstanceContext] g-sc-G12-1-1 [partitionContext] whole-region [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//member activated
TID: [0] [STRATOS] [2015-06-04 19:56:00,907]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing member activated event: [service-name] c1 [cluster-id] g-sc-G12-1.c1-0x0.c1.domain [cluster-instance-id] g-sc-G12-1-1 [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [network-partition-id] RegionOne [partition-id] whole-region
TID: [0] [STRATOS] [2015-06-04 19:56:00,916]  INFO {org.apache.stratos.messaging.message.processor.topology.MemberActivatedMessageProcessor} -  Member activated: [service] c1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

// after 15 minutes: member is still in pending state, pending timeout expired
TID: [0] [STRATOS] [2015-06-04 20:08:04,713]  INFO {org.apache.stratos.autoscaler.context.partition.ClusterLevelPartitionContext$PendingMemberWatcher} -  Pending state of member expired, member will be moved to obsolete list. [pending member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [expiry time] 900000 [cluster] g-sc-G12-1.c1-0x0.c1.domain [cluster instance] null
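
The sequence in these logs can be modelled roughly as follows. This is only a 
sketch of the suspected behaviour, not the actual PendingMemberWatcher code: 
PendingMemberSketch and all of its methods are hypothetical, and the only 
value taken from the logs is the 900000 ms expiry time. It shows how a member 
whose activation event never reaches the tracker ends up in the obsolete list 
once the pending timeout elapses:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical model of pending/active/obsolete member lists; not Stratos code.
final class PendingMemberSketch {
    private final long expiryMillis;
    private final Map<String, Long> pendingSince = new HashMap<>();
    private final Set<String> active = new HashSet<>();
    private final Set<String> obsolete = new HashSet<>();

    PendingMemberSketch(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    void memberStarted(String memberId, long nowMillis) {
        pendingSince.put(memberId, nowMillis);
    }

    // Only an activation event moves a member from pending to active.
    void memberActivated(String memberId) {
        if (pendingSince.remove(memberId) != null) {
            active.add(memberId);
        }
    }

    // Moves every member that has been pending for too long to the obsolete list.
    void expirePending(long nowMillis) {
        Iterator<Map.Entry<String, Long>> it = pendingSince.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= expiryMillis) {
                obsolete.add(entry.getKey());
                it.remove();
            }
        }
    }

    boolean isActive(String memberId) { return active.contains(memberId); }

    boolean isObsolete(String memberId) { return obsolete.contains(memberId); }

    public static void main(String[] args) {
        PendingMemberSketch tracker = new PendingMemberSketch(900_000L);
        tracker.memberStarted("member-1", 0L);
        // The activation event is assumed to be lost, so memberActivated is never called.
        tracker.expirePending(900_000L);
        System.out.println(tracker.isObsolete("member-1")); // true
    }
}
```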

On Fri, Jun 5, 2015 at 5:14 AM, Martin Eppel (meppel) 
<[email protected]> wrote:
Hi,

I am running into a scenario where application un-deployment fails (using 
Stratos with the latest commit, b1b6bca3f99b6127da24c9af0a6b20faff2907be).

For the application structure see [1.]. The (debug-enabled) wso2carbon.log, 
application.json, cartridge-group.json, deployment-policy and auto-scaling 
policies are in the attached zip file.

Note that while the application is running, the following log statements and 
exceptions are observed:

…
Member is in the wrong list and it is removed from active members list: 
g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
…
TID: [0] [STRATOS] [2015-06-04 20:11:03,425] ERROR {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate instance
…
// after receiving the application undeploy event:
[2015-06-04 20:12:39,465]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application undeployment process started: [application-id] g-sc-G12-1
// a new instance is being started up
…
[2015-06-04 20:13:13,445]  INFO {org.apache.stratos.cloud.controller.services.impl.InstanceCreator} -  Instance started successfully: [cartridge-type] c2 [cluster-id] g-sc-G12-1.c2-1x0.c2.domain [instance-id] RegionOne/5d4699f7-b00b-42eb-b565-b48fc8f20407

// Also noteworthy is the following warning, which appears repeatedly in the 
logs:
ReadWriteLock} -  System warning! Trying to release a lock which has not been 
taken by the same thread: [lock-name]


[1.] Application structure

[inline image: application structure diagram, not included]







--

Udara Liyanage
Software Engineer
WSO2, Inc.: http://wso2.com
lean. enterprise. middleware
web: http://udaraliyanage.wordpress.com
phone: +94 71 443 6897






--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos
