The latter; we never have both Stratos instances running.

From: Imesh Gunaratne [mailto:im...@apache.org]
Sent: 15 May 2015 16:17
To: dev
Cc: Ryan Du Plessis (rdupless); Luca Martini (lmartini)
Subject: Re: Clustered deployments of Stratos

Hi Shaheed,

Do you have both active and passive Stratos nodes running at the same time or 
do you start the passive node once the active node goes down?

Thanks

On Fri, May 15, 2015 at 6:31 PM, Shaheedur Haque (shahhaqu) 
<shahh...@cisco.com> wrote:
Hi Imesh,

I finally got round to a proper series of tests, and here are the conclusions:


•        In Stratos 4.0, after a Pacemaker driven failover, the newly Active 
Stratos has lost all Cartridge Definitions.

•        In current [1] Stratos 4.1, after a Pacemaker driven failover, the 
newly Active Stratos:

o   Has lost all Deployment Policies.

o   Has lost contact with the Cartridge Agents, and all VMs are stuck with 
whatever state they had before the failover.

•        Note: I have not verified if Cartridge Groups are lost or not.

I include the test results below at [2] and [3]. I am concerned as to whether 
4.1 is ready for GA on this basis, so though more testing is no doubt possible 
(e.g. Cartridge Groups) I wanted to get this info to the list ASAP.

Thanks, Shaheed

[1] A recent build somewhere between beta 1 and beta 2, but I don’t think any 
relevant fixes have been made in master.

[2] Persistence test output from Stratos 4.1. Note:


1.      In the build I have, the CLI is broken for a couple of commands; these 
are supplemented by direct “curl” commands further down.

2.      I’ve used one of our commands to show the instances and their state for 
a given application since there is no compact JSON or convenient Stratos CLI 
for that.

PERSISTENCE TEST, BEFORE FAILOVER
================================

stratos> list-tenants
Tenants:
+-----------------------+-----------+------------------+--------+------------------------------+
| Domain                | Tenant ID | Email            | State  | Created Date                 |
+-----------------------+-----------+------------------+--------+------------------------------+
| cloud1.qmog.cisco.com | 1         | clo...@cisco.com | Active | Fri May 15 04:46:58 MDT 2015 |
+-----------------------+-----------+------------------+--------+------------------------------+

stratos> list-network-partitions
Network partitions found:
+----------------------+----------------------+
| Network Partition ID | Number of Partitions |
+----------------------+----------------------+
| RegionOne            | 1                    |
+----------------------+----------------------+

stratos> list-deployment-policies
Deployment policies found:
+-------------------+---------------+
| ID                | Accessibility |
+-------------------+---------------+
| static-2-ha       | 1             |
+-------------------+---------------+
| autoscale-2-10-ha | 1             |
+-------------------+---------------+
| autoscale-1-5     | 1             |
+-------------------+---------------+
| static-1          | 1             |
+-------------------+---------------+

stratos> list-application-policies
Error in listing application policies
No application policies found

stratos> list-autoscaling-policies
Error in listing autoscaling policies
No autoscaling policies found

stratos> list-cartridges
Cartridges found:
+------------------+-------------+------------------+----------------------------+---------+--------------+
| Type             | Category    | Name             | Description                | Version | Multi-Tenant |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+

stratos> list-applications
Applications found:
+-----------------+-----------------+----------+
| Application ID  | Alias           | Status   |
+-----------------+-----------------+----------+
| cartridge-proxy | cartridge-proxy | Deployed |
+-----------------+-----------------+----------+
| cisco-sample-vm | cisco-sample-vm | Deployed |
+-----------------+-----------------+----------+

$ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/autoscalingPolicies
[{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]

$ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/applicationPolicies
[{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]


PERSISTENCE TEST, AFTER FAILOVER
===============================

stratos> list-tenants
Tenants:
+-----------------------+-----------+------------------+--------+------------------------------+
| Domain                | Tenant ID | Email            | State  | Created Date                 |
+-----------------------+-----------+------------------+--------+------------------------------+
| cloud1.qmog.cisco.com | 1         | clo...@cisco.com | Active | Fri May 15 05:26:52 MDT 2015 |
+-----------------------+-----------+------------------+--------+------------------------------+

stratos> list-network-partitions
Network partitions found:
+----------------------+----------------------+
| Network Partition ID | Number of Partitions |
+----------------------+----------------------+
| RegionOne            | 1                    |
+----------------------+----------------------+

stratos> list-deployment-policies
No deployment policies found

stratos> list-application-policies
Error in listing application policies
No application policies found

stratos> list-autoscaling-policies
Error in listing autoscaling policies
No autoscaling policies found

stratos> list-cartridges
Cartridges found:
+------------------+-------------+------------------+----------------------------+---------+--------------+
| Type             | Category    | Name             | Description                | Version | Multi-Tenant |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+

stratos> list-applications
Applications found:
+-----------------+-----------------+----------+
| Application ID  | Alias           | Status   |
+-----------------+-----------------+----------+
| cartridge-proxy | cartridge-proxy | Deployed |
+-----------------+-----------------+----------+
| cisco-sample-vm | cisco-sample-vm | Deployed |
+-----------------+-----------------+----------+

$ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/autoscalingPolicies
[{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]

$ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/applicationPolicies
[{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]
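
The before/after comparison above can also be done mechanically. What follows is only a minimal sketch, assuming the listings have already been reduced to lists of IDs per entity type; the function name lost_entities and the dictionary keys are made up for illustration, and the IDs are copied from the listings above.

```python
def lost_entities(before, after):
    """Return {entity type: sorted IDs present before failover but missing after}."""
    lost = {}
    for kind, ids in before.items():
        missing = sorted(set(ids) - set(after.get(kind, [])))
        if missing:
            lost[kind] = missing
    return lost


# IDs taken from the test output above; the keys are illustrative labels only.
before = {
    "deployment-policies": ["static-2-ha", "autoscale-2-10-ha",
                            "autoscale-1-5", "static-1"],
    "autoscaling-policies": ["economyPolicy"],
    "application-policies": ["default-iaas"],
    "applications": ["cartridge-proxy", "cisco-sample-vm"],
}
after = {
    "deployment-policies": [],
    "autoscaling-policies": ["economyPolicy"],
    "application-policies": ["default-iaas"],
    "applications": ["cartridge-proxy", "cisco-sample-vm"],
}

print(lost_entities(before, after))
```

With the data above, only the deployment policies show up as lost, which matches the conclusion drawn from the listings.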

[3] Cartridge test output from Stratos 4.1. Note:


1.      We do not use a VIP for Stratos, either for 4.0 or 4.1.

2.      We expect the Cartridge Agent to use a DNS lookup when it ends up 
reconnecting, and this worked just fine in Stratos 4.0.
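
To make the expectation in note 2 concrete: the following is a rough sketch of a reconnect loop that repeats the DNS lookup before every attempt, so that a changed record is picked up after a failover. It is not the Cartridge Agent's actual code, which I have not checked; the function name, parameters and defaults are all assumptions for illustration.

```python
import socket
import time


def connect_with_fresh_lookup(host, port, attempts=5, delay=2.0,
                              resolve=socket.getaddrinfo,
                              connect=socket.create_connection,
                              sleep=time.sleep):
    """Try to connect, repeating the DNS lookup before every attempt."""
    last_error = None
    for _ in range(attempts):
        try:
            # Resolve on each attempt instead of caching the first answer.
            infos = resolve(host, port, type=socket.SOCK_STREAM)
            address = infos[0][4][:2]  # (ip, port) of the first result
            return connect(address, timeout=5)
        except OSError as error:  # includes socket.gaierror
            last_error = error
            sleep(delay)
    raise ConnectionError(f"could not reach {host}:{port}") from last_error
```

The resolve, connect and sleep parameters exist only so the loop can be exercised without a real network.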

CARTRIDGE TEST, BEFORE FAILOVER
==============================

$ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
cisco-sample-vm: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
     cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active

CARTRIDGE TEST, AFTER FAILOVER
=============================

$ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
cisco-sample-vm: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
     cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active

CARTRIDGE TEST, AFTER FAILOVER WAIT 5 MINUTES, THEN KILL INSTANCE, THEN WAIT 2 MINUTES
===================================================================================

$ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
cisco-sample-vm: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
     cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active



From: Imesh Gunaratne [mailto:im...@apache.org]
Sent: 14 May 2015 20:34
To: dev
Subject: Re: Clustered deployments of Stratos

It would be better to use the REST API to query and see whether the relevant 
entities are persisted. Since data is stored in binary format in the registry 
it would be difficult to query the database and verify this.
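
As a rough illustration of checking persistence through the REST API, here is a minimal sketch using only the Python standard library. The function name fetch_ids and the opener parameter are assumptions for illustration; the base URL, the admin:admin credentials and the two paths in the usage comment are the ones used with curl elsewhere in this thread. Paths for other entity types are not shown in the thread and are left to the caller.

```python
import base64
import json
import ssl
import urllib.request


def fetch_ids(base_url, path, user, password, opener=urllib.request.urlopen):
    """GET base_url + path and return the "id" of every entity in the JSON list."""
    request = urllib.request.Request(base_url + path)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    request.add_header("Content-type", "application/json")
    # Same effect as curl -k: do not verify the server certificate.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with opener(request, context=context) as response:
        return [entity["id"] for entity in json.load(response)]


# Usage against a running Stratos, run once before and once after failover:
#   fetch_ids("https://localhost:9443", "/api/autoscalingPolicies", "admin", "admin")
#   fetch_ids("https://localhost:9443", "/api/applicationPolicies", "admin", "admin")
```

Comparing the lists returned before and after a failover shows which entities survived, without having to decode the binary registry data.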

On Thu, May 14, 2015 at 10:47 PM, Shaheedur Haque (shahhaqu) 
<shahh...@cisco.com> wrote:
I looked at REG_RESOURCEs (as well as a few others) but I’m afraid I am going 
to need more specifics.

For example, what query would you recommend to look at say deployment policies 
and cartridge definitions?

From: Imesh Gunaratne [mailto:im...@apache.org]
Sent: 09 May 2015 09:08
To: dev
Subject: Re: Clustered deployments of Stratos

Yes, you could refer to the tables that have the prefix "REG_".

On Sat, May 9, 2015 at 4:11 AM, Shaheedur Haque (shahhaqu) 
<shahh...@cisco.com> wrote:
Can you suggest what tables to look at?

From: Imesh Gunaratne [mailto:im...@apache.org]
Sent: 07 May 2015 18:00
To: dev
Subject: Re: Clustered deployments of Stratos

Hi Shaheed,

Thanks for the clarification! Maybe the problem is with the MySQL 
active-passive configuration.

I understand that you are switching the same OpenStack volume from the active 
node to the passive node (when the passive node becomes active), so technically 
it should work. Maybe we need to investigate this problem further by analysing 
whether data is persisted properly in the active node before the passive node 
becomes active.

Thanks

On Tue, May 5, 2015 at 4:22 PM, Shaheedur Haque (shahhaqu) 
<shahh...@cisco.com> wrote:
The data is not synchronised between the active and passive nodes. For clarity, 
this is the HA model we had, much as described in the blog:


•        2 nodes, with Pacemaker in active-passive mode.

•        Under Pacemaker control:

o   We run MySQL in active-passive mode, using a single OpenStack volume which 
we attach/reattach as the active role moves around nodes.

o   As Pacemaker moves the volume, and thus MySQL, around on node failures, 
ActiveMQ and Stratos are moved around too.

o   Thus, everything operates in active-passive mode.

Even in this model, as the active Stratos 4.0 is moved around (i.e. the Stratos 
JVM on the old active node has gone with the node, and Pacemaker starts up a 
new Stratos JVM on what used to be the passive node), we found that the 
Cartridge Definition objects were missing and, as a clumsy workaround [1], we 
had to replay the stored copies of them into Stratos using the REST API.

With Stratos 4.1, using the new object names, early indications are that 
Deployment Policies and Application Deployment policies are lost as the active 
fails over to the passive. If anything, these objects are more likely to hit 
the problems of [1], since Stratos 4.1 expects these to be tweaked on the fly 
(min/max etc).

Thanks, Shaheed

[1] Clearly, this loses any changes that were not in the stored copies.
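
For illustration only, the replay workaround could look roughly like the sketch below. It assumes the stored copies are kept as one JSON file per definition in a directory; the function name replay_definitions, the file names and the "type" field are hypothetical. The actual REST call is passed in as post, since the endpoint used for the replay is not given in this thread.

```python
import json
import tempfile
from pathlib import Path


def replay_definitions(directory, post):
    """Send every stored *.json definition in `directory` via `post`.

    Returns the file names that were sent, in sorted order.
    """
    sent = []
    for path in sorted(Path(directory).glob("*.json")):
        definition = json.loads(path.read_text())
        post(definition)  # caller supplies the real REST call
        sent.append(path.name)
    return sent


# Demo with a stand-in for the REST call: collect definitions in a list.
with tempfile.TemporaryDirectory() as stored:
    Path(stored, "cisco-sample-vm.json").write_text(
        json.dumps({"type": "cisco-sample-vm"}))
    Path(stored, "cartridge-proxy.json").write_text(
        json.dumps({"type": "cartridge-proxy"}))
    posted = []
    replayed = replay_definitions(stored, posted.append)

print(replayed)
```

As noted in [1], a replay like this can only restore what is in the stored copies, so anything changed on the fly since they were written is still lost.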

From: Imesh Gunaratne [mailto:im...@apache.org]
Sent: 03 May 2015 06:43
To: dev@stratos.apache.org
Subject: Re: Clustered deployments of Stratos

Hi Shaheed,

Thanks for taking time to test this!

Just to clarify the exact problem, do you mean that data is not synchronized 
between the active and passive nodes, or that it is not persisted in the active 
node?

Thanks

On Sunday, May 3, 2015, Shaheedur Haque (shahhaqu) 
<shahh...@cisco.com> wrote:

I have been looking into our use of Linux HA to set up an Active-Passive 
configuration. Testing indicates that in 4.1 (beta1), several objects seem not 
to be persisted properly. This includes at least:

- Cartridges
- Deployment policies

Am I missing something? Is it safe to work around this by replaying those 
objects?
________________________________
From: Imesh Gunaratne [im...@apache.org]
Sent: 23 April 2015 10:47
To: dev
Subject: Re: Clustered deployments of Stratos

Hi Shaheed,

Currently, N-way clustering is still not possible with CC, AS & SM. We completed 
the initial phase of this feature, but the feature as a whole was not finished. 
You could refer to the mail thread "[Discuss] Clustering Feature Implementation 
for 4.1.0-Alpha Release" for details.

However, at present [1] is valid. We could use Linux HA and deploy CC, AS and SM 
in Active-Passive mode.

Thanks



On Thu, Apr 23, 2015 at 2:41 PM, Shaheedur Haque (shahhaqu) 
<shahh...@cisco.com> wrote:
Hi,

We currently try to achieve HA with Stratos using something so unpleasant that 
I won’t even describe it here ☺. It has been suggested that Stratos has, for a 
while now, supported a clustered mode of deployment where, given N servers:


•        The SM, AS and MB operate in an N-way clustered mode

•        The CEP operates in an N-way loadsharing mode

•        The Cartridge Agents can react to a failure in one of the N CEPs by 
failing over to one of the other N-1 remaining servers

In looking for documentation on how to set this up, I came across these two 
write-ups [1] and [2]. Questions:


•        Both these documents mention only using N=2. Is that still correct?

•        [1] Seems recently written, and [2] is a little older but not much. 
Are both documents still regarded as current?

Also, I’d love to hear any other experiences people have of running 
configurations like this.

Thanks, Shaheed

[1] 
https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Configuring+HA+Using+Pacemaker+and+Heartbeat
[2] http://blog.lasindu.com/2014/08/wso2-private-paas-supporting.html






--
Imesh Gunaratne

Technical Lead, WSO2
Committer & PMC Member, Apache Stratos


--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos




