The latter; we never have both Stratos instances running.

From: Imesh Gunaratne [mailto:im...@apache.org]
Sent: 15 May 2015 16:17
To: dev
Cc: Ryan Du Plessis (rdupless); Luca Martini (lmartini)
Subject: Re: Clustered deployments of Stratos

Hi Shaheed,

Do you have both active and passive Stratos nodes running at the same time or do you start the passive node once the active node goes down?

Thanks

On Fri, May 15, 2015 at 6:31 PM, Shaheedur Haque (shahhaqu) <shahh...@cisco.com> wrote:

Hi Imesh,

I finally got round to a proper series of tests, and here are the conclusions:

• In Stratos 4.0, after a Pacemaker driven failover, the newly Active Stratos has lost all Cartridge Definitions.
• In current [1] Stratos 4.1, after a Pacemaker driven failover, the newly Active Stratos:
  o Has lost all Deployment Policies.
  o Has lost contact with the Cartridge Agents, and all VMs are stuck with whatever state they had before the failover.
• Note: I have not verified if Cartridge Groups are lost or not.

I include the test results below at [2] and [3]. I am concerned as to whether 4.1 is ready for GA on this basis, so though more testing is no doubt possible (e.g. Cartridge Groups) I wanted to get this info to the list ASAP.

Thanks, Shaheed

[1] A recent build somewhere between beta 1 and beta 2, but I don’t think any relevant fixes have been made in master.

[2] Persistence test output from Stratos 4.1. Note:

1. In the build I have, the CLI is broken for a couple of commands; these are supplemented by direct “curl” commands further down.
2. I’ve used one of our commands to show the instances and their state for a given application since there is not a compact JSON or convenient Stratos CLI for that.

PERSISTENCE TEST, BEFORE FAILOVER
================================
stratos> list-tenants
Tenants:
+-----------------------+-----------+------------------+--------+------------------------------+
| Domain                | Tenant ID | Email            | State  | Created Date                 |
+-----------------------+-----------+------------------+--------+------------------------------+
| cloud1.qmog.cisco.com | 1         | clo...@cisco.com | Active | Fri May 15 04:46:58 MDT 2015 |
+-----------------------+-----------+------------------+--------+------------------------------+
stratos> list-network-partitions
Network partitions found:
+----------------------+----------------------+
| Network Partition ID | Number of Partitions |
+----------------------+----------------------+
| RegionOne            | 1                    |
+----------------------+----------------------+
stratos> list-deployment-policies
Deployment policies found:
+-------------------+---------------+
| ID                | Accessibility |
+-------------------+---------------+
| static-2-ha       | 1             |
+-------------------+---------------+
| autoscale-2-10-ha | 1             |
+-------------------+---------------+
| autoscale-1-5     | 1             |
+-------------------+---------------+
| static-1          | 1             |
+-------------------+---------------+
stratos> list-application-policies
Error in listing application policies
No application policies found
stratos> list-autoscaling-policies
Error in listing autoscaling policies
No autoscaling policies found
stratos> list-cartridges
Cartridges found:
+------------------+-------------+------------------+----------------------------+---------+--------------+
| Type             | Category    | Name             | Description                | Version | Multi-Tenant |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
stratos> list-applications
Applications found:
+-----------------+-----------------+----------+
| Application ID  | Alias           | Status   |
+-----------------+-----------------+----------+
| cartridge-proxy | cartridge-proxy | Deployed |
+-----------------+-----------------+----------+
| cisco-sample-vm | cisco-sample-vm | Deployed |
+-----------------+-----------------+----------+
$ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/autoscalingPolicies
[{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]
$ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/applicationPolicies
[{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]

PERSISTENCE TEST, AFTER FAILOVER
===============================
stratos> list-tenants
Tenants:
+-----------------------+-----------+------------------+--------+------------------------------+
| Domain                | Tenant ID | Email            | State  | Created Date                 |
+-----------------------+-----------+------------------+--------+------------------------------+
| cloud1.qmog.cisco.com | 1         | clo...@cisco.com | Active | Fri May 15 05:26:52 MDT 2015 |
+-----------------------+-----------+------------------+--------+------------------------------+
stratos> list-network-partitions
Network partitions found:
+----------------------+----------------------+
| Network Partition ID | Number of Partitions |
+----------------------+----------------------+
| RegionOne            | 1                    |
+----------------------+----------------------+
stratos> list-deployment-policies
No deployment policies found
stratos> list-application-policies
Error in listing application policies
No application policies found
stratos> list-autoscaling-policies
Error in listing autoscaling policies
No autoscaling policies found
stratos> list-cartridges
Cartridges found:
+------------------+-------------+------------------+----------------------------+---------+--------------+
| Type             | Category    | Name             | Description                | Version | Multi-Tenant |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
stratos> list-applications
Applications found:
+-----------------+-----------------+----------+
| Application ID  | Alias           | Status   |
+-----------------+-----------------+----------+
| cartridge-proxy | cartridge-proxy | Deployed |
+-----------------+-----------------+----------+
| cisco-sample-vm | cisco-sample-vm | Deployed |
+-----------------+-----------------+----------+
$ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/autoscalingPolicies
[{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]
$ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/applicationPolicies
[{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]

[3] Cartridge test output from Stratos 4.1. Note:

1. We do not use a VIP for Stratos, either for 4.0 or 4.1.
2. We expect the Cartridge Agent to use a DNS lookup when it ends up reconnecting, and this worked just fine in Stratos 4.0.

CARTRIDGE TEST, BEFORE FAILOVER
==============================
$ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
cisco-sample-vm: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active

CARTRIDGE TEST, AFTER FAILOVER
=============================
$ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
cisco-sample-vm: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active

CARTRIDGE TEST, AFTER FAILOVER WAIT 5 MINUTES, THEN KILL INSTANCE, THEN WAIT 2 MINUTES
===================================================================================
$ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
cisco-sample-vm: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active

From: Imesh Gunaratne [mailto:im...@apache.org]
Sent: 14 May 2015 20:34
To: dev
Subject: Re: Clustered deployments of Stratos

It would be better to use the REST API to query and see whether the relevant entities are persisted. Since data is stored in binary format in the registry it would be difficult to query the database and verify this.

On Thu, May 14, 2015 at 10:47 PM, Shaheedur Haque (shahhaqu) <shahh...@cisco.com> wrote:

I looked at REG_RESOURCEs (as well as a few others) but I’m afraid I am going to need more specifics. For example, what query would you recommend to look at, say, deployment policies and cartridge definitions?

From: Imesh Gunaratne [mailto:im...@apache.org]
Sent: 09 May 2015 09:08
To: dev
Subject: Re: Clustered deployments of Stratos

Yes, you could refer to the tables that have the prefix "REG_".

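The suggestion earlier in the thread, to verify persistence through the REST API rather than the registry tables, could be automated roughly as follows. This is only a sketch: it assumes the base URL, the admin credentials and the two listing paths that appear in the curl commands in this thread, and it assumes every listing returns a JSON array of objects carrying an "id" field, as the two sample responses do. The function names are hypothetical, and the paths for other entity types (deployment policies, cartridges) are not given in the thread and would have to be added.

```python
import base64
import json
import ssl
import urllib.request

# Listing paths taken from the curl commands in this thread. Paths for
# other entity types are not shown there and are deliberately left out.
ENDPOINTS = ["/api/autoscalingPolicies", "/api/applicationPolicies"]


def fetch_ids(base_url, path, user, password):
    """GET one listing and return the set of entity ids it contains.

    Assumes the response is a JSON array of objects with an "id" key.
    """
    request = urllib.request.Request(base_url + path)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    request.add_header("Content-type", "application/json")
    # Equivalent of "curl -k": skip certificate checks (test rigs only).
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return {item["id"] for item in json.load(response)}


def snapshot(base_url, user, password, endpoints=ENDPOINTS):
    """Return {path: set of ids} for every listing path."""
    return {p: fetch_ids(base_url, p, user, password) for p in endpoints}


def lost_entities(before, after):
    """Return {path: sorted ids} for ids present before but not after."""
    lost = {}
    for path, ids in before.items():
        missing = sorted(ids - after.get(path, set()))
        if missing:
            lost[path] = missing
    return lost
```

One way to use it: call snapshot("https://localhost:9443", "admin", "admin") on the active node before the failover, keep the result, call it again once Pacemaker has brought Stratos up on the other node, and pass both results to lost_entities. An empty dictionary would mean that nothing visible through those listings was lost.
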
On Sat, May 9, 2015 at 4:11 AM, Shaheedur Haque (shahhaqu) <shahh...@cisco.com> wrote:

Can you suggest what tables to look at?

From: Imesh Gunaratne [mailto:im...@apache.org]
Sent: 07 May 2015 18:00
To: dev
Subject: Re: Clustered deployments of Stratos

Hi Shaheed,

Thanks for the clarification! Maybe the problem is with the MySQL active-passive configuration. I understand that you are switching the same OpenStack volume from the active node to the passive node (when the passive node becomes active), so technically it should work. Maybe we need to investigate this problem further by analysing whether data is persisted properly in the active node before the passive node becomes active.

Thanks

On Tue, May 5, 2015 at 4:22 PM, Shaheedur Haque (shahhaqu) <shahh...@cisco.com> wrote:

The data is not synchronised between the active and passive nodes. For clarity, this is the HA model we had, much as described in the blog:

• 2 nodes, with Pacemaker in active-passive mode.
• Under Pacemaker control:
  o We run MySQL in active-passive mode, using a single OpenStack volume which we attach/reattach as the active role moves around nodes.
  o As Pacemaker moves the volume, and thus MySQL, around on node failures, ActiveMQ and Stratos are moved around too.
  o Thus, everything operates in active-passive mode.

Even in this model, as the active Stratos 4.0 is moved around (i.e. the Stratos JVM on the old active node has gone with the node, and Pacemaker starts up a new Stratos JVM on what used to be the passive node), we found that the Cartridge Definition objects were missing and, as a clumsy workaround [1], we had to replay the stored copies of them into Stratos using the REST API.

With Stratos 4.1, using the new object names, early indications are that Deployment Policies and Application Deployment policies are lost as the active fails over to the passive. If anything, these objects are more likely to hit the problems of [1], since Stratos 4.1 expects these to be tweaked on the fly (min/max etc).

Thanks, Shaheed

[1] Clearly, this loses any changes that were not in the stored copies.

From: Imesh Gunaratne [mailto:im...@apache.org]
Sent: 03 May 2015 06:43
To: dev@stratos.apache.org
Subject: Re: Clustered deployments of Stratos

Hi Shaheed,

Thanks for taking time to test this! Just to clarify the exact problem, do you mean that data is not synchronized between the active and passive nodes, or that it is not persisted in the active node?

Thanks

On Sunday, May 3, 2015, Shaheedur Haque (shahhaqu) <shahh...@cisco.com> wrote:

I have been looking into our use of Linux HA to set up an Active-Passive configuration. Testing indicates that in 4.1 (beta1), several objects seem not to be persisted properly. This includes at least:

- Cartridges
- Deployment policies

Am I missing something? Is it safe to work around this by replaying those objects?

________________________________
From: Imesh Gunaratne [im...@apache.org]
Sent: 23 April 2015 10:47
To: dev
Subject: Re: Clustered deployments of Stratos

Hi Shaheed,

Currently N-way clustering is still not possible with CC, AS & SM. We completed the initial phase of this feature; however, it was not completed. You could refer to the mail thread "[Discuss] Clustering Feature Implementation for 4.1.0-Alpha Release" for details.

However, at present [1] is valid. We could use Linux HA and deploy CC, AS and SM in Active-Passive mode.

Thanks

On Thu, Apr 23, 2015 at 2:41 PM, Shaheedur Haque (shahhaqu) <shahh...@cisco.com> wrote:

Hi,

We currently try to achieve HA with Stratos using something so unpleasant that I won’t even describe it here ☺.
It has been suggested that Stratos has, for a while now, supported a clustered mode of deployment where, given N servers:

• The SM, AS and MB operate in an N-way clustered mode
• The CEP operates in an N-way loadsharing mode
• The Cartridge Agents can react to a failure in one of the N CEPs by failing over to one of the other N-1 remaining servers

In looking for documentation on how to set this up, I came across these two write-ups [1] and [2]. Questions:

• Both these documents mention only using N=2. Is that still correct?
• [1] seems recently written, and [2] is a little older but not much. Are both documents still regarded as current?

Also, I’d love to hear any other experiences people have of running configurations like this.

Thanks, Shaheed

[1] https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Configuring+HA+Using+Pacemaker+and+Heartbeat
[2] http://blog.lasindu.com/2014/08/wso2-private-paas-supporting.html

--
Imesh Gunaratne
Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

--
Imesh Gunaratne
Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos