Hi Shaheed,

Thanks for the detailed explanation of the tests. I will do a test round with an HA setup and post an update.
Yes, we need this to work for GA.

Thanks

On Friday, May 15, 2015, Shaheedur Haque (shahhaqu) <shahh...@cisco.com> wrote:

> Hi Imesh,
>
> I finally got round to a proper series of tests, and here are the
> conclusions:
>
> · In Stratos 4.0, after a Pacemaker-driven failover, the newly Active
>   Stratos has lost all Cartridge Definitions.
> · In current [1] Stratos 4.1, after a Pacemaker-driven failover, the
>   newly Active Stratos:
>     o Has lost all Deployment Policies.
>     o Has lost contact with the Cartridge Agents, and all VMs are stuck
>       with whatever state they had before the failover.
> · Note: I have not verified whether Cartridge Groups are lost or not.
>
> I include the test results below at [2] and [3]. I am concerned as to
> whether 4.1 is ready for GA on this basis, so though more testing is no
> doubt possible (e.g. Cartridge Groups) I wanted to get this info to the
> list ASAP.
>
> Thanks, Shaheed
>
> [1] A recent build somewhere between beta 1 and beta 2, but I don’t think
> any relevant fixes have been made in master.
>
> [2] Persistence test output from Stratos 4.1. Note:
>
> 1. In the build I have, the CLI is broken for a couple of commands;
>    these are supplemented by direct “curl” commands further down.
> 2. I’ve used one of our commands to show the instances and their state
>    for a given application since there is not a compact JSON or
>    convenient Stratos CLI for that.
>
> *PERSISTENCE TEST, BEFORE FAILOVER*
> *================================*
>
> stratos> list-tenants
> Tenants:
> +-----------------------+-----------+------------------+--------+------------------------------+
> | Domain                | Tenant ID | Email            | State  | Created Date                 |
> +-----------------------+-----------+------------------+--------+------------------------------+
> | cloud1.qmog.cisco.com | 1         | clo...@cisco.com | Active | Fri May 15 04:46:58 MDT 2015 |
> +-----------------------+-----------+------------------+--------+------------------------------+
>
> stratos> list-network-partitions
> Network partitions found:
> +----------------------+----------------------+
> | Network Partition ID | Number of Partitions |
> +----------------------+----------------------+
> | RegionOne            | 1                    |
> +----------------------+----------------------+
>
> stratos> list-deployment-policies
> Deployment policies found:
> +-------------------+---------------+
> | ID                | Accessibility |
> +-------------------+---------------+
> | static-2-ha       | 1             |
> +-------------------+---------------+
> | autoscale-2-10-ha | 1             |
> +-------------------+---------------+
> | autoscale-1-5     | 1             |
> +-------------------+---------------+
> | static-1          | 1             |
> +-------------------+---------------+
>
> stratos> list-application-policies
> Error in listing application policies
> No application policies found
>
> stratos> list-autoscaling-policies
> Error in listing autoscaling policies
> No autoscaling policies found
>
> stratos> list-cartridges
> Cartridges found:
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | Type             | Category    | Name             | Description                | Version | Multi-Tenant |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
> stratos> list-applications
> Applications found:
> +-----------------+-----------------+----------+
> | Application ID  | Alias           | Status   |
> +-----------------+-----------------+----------+
> | cartridge-proxy | cartridge-proxy | Deployed |
> +-----------------+-----------------+----------+
> | cisco-sample-vm | cisco-sample-vm | Deployed |
> +-----------------+-----------------+----------+
>
> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/autoscalingPolicies
> [{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]
>
> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/applicationPolicies
> [{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]
>
> *PERSISTENCE TEST, AFTER FAILOVER*
> *===============================*
>
> stratos> list-tenants
> Tenants:
> +-----------------------+-----------+------------------+--------+------------------------------+
> | Domain                | Tenant ID | Email            | State  | Created Date                 |
> +-----------------------+-----------+------------------+--------+------------------------------+
> | cloud1.qmog.cisco.com | 1         | clo...@cisco.com | Active | Fri May 15 05:26:52 MDT 2015 |
> +-----------------------+-----------+------------------+--------+------------------------------+
>
> stratos> list-network-partitions
> Network partitions found:
> +----------------------+----------------------+
> | Network Partition ID | Number of Partitions |
> +----------------------+----------------------+
> | RegionOne            | 1                    |
> +----------------------+----------------------+
>
> stratos> list-deployment-policies
> No deployment policies found
>
> stratos> list-application-policies
> Error in listing application policies
> No application policies found
>
> stratos> list-autoscaling-policies
> Error in listing autoscaling policies
> No autoscaling policies found
>
> stratos> list-cartridges
> Cartridges found:
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | Type             | Category    | Name             | Description                | Version | Multi-Tenant |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
> stratos> list-applications
> Applications found:
> +-----------------+-----------------+----------+
> | Application ID  | Alias           | Status   |
> +-----------------+-----------------+----------+
> | cartridge-proxy | cartridge-proxy | Deployed |
> +-----------------+-----------------+----------+
> | cisco-sample-vm | cisco-sample-vm | Deployed |
> +-----------------+-----------------+----------+
>
> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/autoscalingPolicies
> [{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]
>
> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/applicationPolicies
> [{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]
>
> [3] Cartridge test output from Stratos 4.1. Note:
>
> 1. We do not use a VIP for Stratos, either for 4.0 or 4.1.
> 2. We expect the Cartridge Agent to use a DNS lookup when it ends up
>    reconnecting, and this worked just fine in Stratos 4.0.
>
> *CARTRIDGE TEST, BEFORE FAILOVER*
> *==============================*
>
> $ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
> cisco-sample-vm: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
> cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
>
> *CARTRIDGE TEST, AFTER FAILOVER*
> *=============================*
>
> $ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
> cisco-sample-vm: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
> cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
>
> *CARTRIDGE TEST, AFTER FAILOVER WAIT 5 MINUTES, THEN KILL INSTANCE, THEN WAIT 2 MINUTES*
> *===================================================================================*
>
> $ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
> cisco-sample-vm: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
> cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
>
> *From:* Imesh Gunaratne [mailto:im...@apache.org]
> *Sent:* 14 May 2015 20:34
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
> It would be better to use the REST API to query and see whether the
> relevant entities are persisted. Since data is stored in binary format in
> the registry it would be difficult to query the database and verify this.
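
As an illustration of the REST-based check suggested in the message above, here is a minimal Python sketch (not part of the original thread) that compares the IDs returned by the REST listings before and after a failover. The `/api/autoscalingPolicies` and `/api/applicationPolicies` paths are the ones used in the curl commands in this thread; the `/api/deploymentPolicies` path, the function names and the trimmed sample response bodies are assumptions made for illustration only.

```python
import json


def ids_from_listing(body):
    """Return the set of "id" values from a JSON array response body."""
    return {item["id"] for item in json.loads(body)}


def lost_entities(before, after):
    """Map each API path to the sorted IDs present before but missing after."""
    lost = {}
    for path, body in before.items():
        # A path with no "after" response is treated as an empty listing.
        missing = ids_from_listing(body) - ids_from_listing(after.get(path, "[]"))
        if missing:
            lost[path] = sorted(missing)
    return lost


# Hypothetical response bodies, trimmed to the "id" field. In a real run
# these would be the saved outputs of the curl commands, captured once
# before and once after the failover.
before = {
    "/api/autoscalingPolicies": '[{"id": "economyPolicy"}]',
    "/api/applicationPolicies": '[{"id": "default-iaas"}]',
    # Assumed path and response shape; not shown anywhere in the thread.
    "/api/deploymentPolicies": '[{"id": "static-1"}, {"id": "autoscale-1-5"}]',
}
after = {
    "/api/autoscalingPolicies": '[{"id": "economyPolicy"}]',
    "/api/applicationPolicies": '[{"id": "default-iaas"}]',
    "/api/deploymentPolicies": "[]",
}

print(lost_entities(before, after))
```

Comparing only the IDs keeps the check independent of how the registry stores the objects, which is the point of going through the REST API rather than the REG_ tables.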
>
> On Thu, May 14, 2015 at 10:47 PM, Shaheedur Haque (shahhaqu) <shahh...@cisco.com> wrote:
>
> I looked at REG_RESOURCEs (as well as a few others) but I’m afraid I am
> going to need more specifics.
>
> For example, what query would you recommend to look at, say, deployment
> policies and cartridge definitions?
>
> *From:* Imesh Gunaratne [mailto:im...@apache.org]
> *Sent:* 09 May 2015 09:08
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
> Yes, you could refer to the tables that have the prefix "REG_".
>
> On Sat, May 9, 2015 at 4:11 AM, Shaheedur Haque (shahhaqu) <shahh...@cisco.com> wrote:
>
> Can you suggest what tables to look at?
>
> *From:* Imesh Gunaratne [mailto:im...@apache.org]
> *Sent:* 07 May 2015 18:00
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
> Hi Shaheed,
>
> Thanks for the clarification! Maybe the problem is with the MySQL
> active-passive configuration.
>
> I understand that you are switching the same OpenStack volume from the
> active node to the passive node (when the passive node becomes active),
> so technically it should work. Maybe we need to investigate this problem
> further by analysing whether data is persisted properly in the active node
> before the passive node becomes active.
>
> Thanks
>
> On Tue, May 5, 2015 at 4:22 PM, Shaheedur Haque (shahhaqu) <shahh...@cisco.com> wrote:
>
> The data is not synchronised between the active and passive nodes. For
> clarity, this is the HA model we had, much as described in the blog:
>
> · 2 nodes, with Pacemaker in active-passive mode.
> · Under Pacemaker control:
>     o We run MySQL in active-passive mode, using a single OpenStack volume
>       which we attach/reattach as the active role moves around nodes.
>     o As Pacemaker moves the volume, and thus MySQL, around on node
>       failures, ActiveMQ and Stratos are moved around too.
>     o Thus, everything operates in active-passive mode.
>
> Even in this model, as the active Stratos 4.0 is moved around (i.e. the
> Stratos JVM on the old active node has gone with the node, and Pacemaker
> starts up a new Stratos JVM on what used to be the passive node), we found
> that the Cartridge Definition objects were missing and, as a clumsy
> workaround [1], we had to replay the stored copies of them into Stratos
> using the REST API.
>
> With Stratos 4.1, using the new object names, early indications are that
> *Deployment Policies* and *Application Deployment* policies are lost as
> the active fails over to the passive. If anything, these objects are more
> likely to hit the problems of [1], since Stratos 4.1 expects these to be
> tweaked on the fly (min/max etc).
>
> Thanks, Shaheed
>
> [1] Clearly, this loses any changes that were not in the stored copies.
>
> *From:* Imesh Gunaratne [mailto:im...@apache.org]
> *Sent:* 03 May 2015 06:43
> *To:* dev@stratos.apache.org
> *Subject:* Re: Clustered deployments of Stratos
>
> Hi Shaheed,
>
> Thanks for taking time to test this!
>
> Just to clarify the exact problem, do you mean that data is not
> synchronized between the active and passive nodes, or that it is not
> persisted in the active node?
>
> Thanks
>
> On Sunday, May 3, 2015, Shaheedur Haque (shahhaqu) <shahh...@cisco.com> wrote:
>
> I have been looking into our use of Linux HA to set up an Active-Passive
> configuration. Testing indicates that in 4.1 (beta1), several objects seem
> not to be persisted properly. This includes at least:
>
> - Cartridges
> - Deployment policies
>
> Am I missing something? Is it safe to work around this by replaying those
> objects?
> ------------------------------
> *From:* Imesh Gunaratne [im...@apache.org]
> *Sent:* 23 April 2015 10:47
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
> Hi Shaheed,
>
> Currently N-way clustering is still not possible with CC, AS & SM. We
> completed the initial phase of this feature; however, the feature itself
> was not completed. You could refer to the mail thread "[Discuss] Clustering
> Feature Implementation for 4.1.0-Alpha Release" for details.
>
> However at present [1] is valid. We could use Linux HA and deploy CC, AS
> and SM in Active-Passive mode.
>
> Thanks
>
> On Thu, Apr 23, 2015 at 2:41 PM, Shaheedur Haque (shahhaqu) <shahh...@cisco.com> wrote:
>
> Hi,
>
> We currently try to achieve HA with Stratos using something so unpleasant
> that I won’t even describe it here :-). It has been suggested that Stratos
> has, for a while now, supported a clustered mode of deployment where, given
> N servers:
>
> · The SM, AS and MB operate in an N-way clustered mode
> · The CEP operates in an N-way loadsharing mode
> · The Cartridge Agents can react to a failure in one of the N CEPs
>   by failing over to one of the other N-1 remaining servers
>
> In looking for documentation on how to set this up, I came across these
> two write-ups [1] and [2]. Questions:
>
> · Both these documents mention only using N=2. Is that still correct?
> · [1] seems recently written, and [2] is a little older but not much.
>   Are both documents still regarded as current?
>
> Also, I’d love to hear any other experiences people have of running
> configurations like this.
>
> Thanks, Shaheed
>
> [1] https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Configuring+HA+Using+Pacemaker+and+Heartbeat
> [2] http://blog.lasindu.com/2014/08/wso2-private-paas-supporting.html
>
> --
> Imesh Gunaratne
>
> Technical Lead, WSO2
> Committer & PMC Member, Apache Stratos
>
> --
> Imesh Gunaratne
>
> Senior Technical Lead, WSO2
> Committer & PMC Member, Apache Stratos

--
Sent from Gmail Mobile