Hi Shaheed,

We have been using member fault detection to test the termination behavior, both in beta2 and again about a week ago. So I believe that it will work in the latest master. However, we will also verify this again.

Thanks,
Reka

On Mon, May 18, 2015 at 7:53 PM, Imesh Gunaratne <im...@apache.org> wrote:

> Thanks Shaheed! I will verify the second problem where Stratos is not
> detecting manually terminated members.
>
> Thanks
>
> On Mon, May 18, 2015 at 3:39 PM, Shaheedur Haque (shahhaqu) <
> shahh...@cisco.com> wrote:
>
>> Ack. We are just in the middle of getting sync’d up again to
>> master, and it sounds like that might fix the persistence issue.
>>
>> I guess that leaves the Cartridge Agent reconnect side of the problem…
>>
>> *From:* Lahiru Sandaruwan [mailto:lahi...@wso2.com]
>> *Sent:* 17 May 2015 03:06
>> *To:* dev
>> *Cc:* Ryan Du Plessis (rdupless); Luca Martini (lmartini)
>> *Subject:* Re: Clustered deployments of Stratos
>>
>> Hi Shaheed,
>>
>> Similarly, it would be a great help if you could verify all these issues in
>> the latest code, since we have been fixing a lot of issues in recent days as a
>> result of RC1 testing.
>>
>> Thanks.
>>
>> On Fri, May 15, 2015 at 9:42 PM, Imesh Gunaratne <im...@apache.org>
>> wrote:
>>
>> Hi Shaheed,
>>
>> Thanks for the quick response. After analyzing the results you
>> provided again, it looks like only the deployment policies are missing
>> after the failover. We have fixed this issue in commit
>> revision: 0c515aa013850575ddcfa2e299da5f0ec250ebc3
>>
>> http://mail-archives.apache.org/mod_mbox/incubator-stratos-commits/201504.mbox/%3c22eed4e8639c401a8fda637fa6bb4...@git.apache.org%3E
>>
>> Would you mind verifying whether this is there in your runtime?
>>
>> Thanks
>>
>> On Fri, May 15, 2015 at 9:02 PM, Shaheedur Haque (shahhaqu) <
>> shahh...@cisco.com> wrote:
>>
>> The latter; we never have both Stratos instances running.
>>
>> *From:* Imesh Gunaratne [mailto:im...@apache.org]
>> *Sent:* 15 May 2015 16:17
>> *To:* dev
>> *Cc:* Ryan Du Plessis (rdupless); Luca Martini (lmartini)
>> *Subject:* Re: Clustered deployments of Stratos
>>
>> Hi Shaheed,
>>
>> Do you have both active and passive Stratos nodes running at the same
>> time or do you start the passive node once the active node goes down?
>>
>> Thanks
>>
>> On Fri, May 15, 2015 at 6:31 PM, Shaheedur Haque (shahhaqu) <
>> shahh...@cisco.com> wrote:
>>
>> Hi Imesh,
>>
>> I finally got round to a proper series of tests, and here are the
>> conclusions:
>>
>> · In Stratos 4.0, after a Pacemaker driven failover, the newly
>> Active Stratos has lost all Cartridge Definitions.
>>
>> · In current [1] Stratos 4.1, after a Pacemaker driven failover,
>> the newly Active Stratos:
>>
>>    o Has lost all Deployment Policies.
>>
>>    o Has lost contact with the Cartridge Agents, and all VMs are stuck
>>    with whatever state they had before the failover.
>>
>> · Note: I have not verified if Cartridge Groups are lost or not.
>>
>> I include the test results below at [2] and [3]. I am concerned as to
>> whether 4.1 is ready for GA on this basis, so though more testing is no
>> doubt possible (e.g. Cartridge Groups) I wanted to get this info to the
>> list ASAP.
>>
>> Thanks, Shaheed
>>
>> [1] A recent build somewhere between beta 1 and beta 2, but I don’t think
>> any relevant fixes have been made in master.
>>
>> [2] Persistence test output from Stratos 4.1. Note:
>>
>> 1. In the build I have, the CLI is broken for a couple of commands;
>> these are supplemented by direct “curl” commands further down.
>>
>> 2. I’ve used one of our commands to show the instances and their
>> state for a given application since there is not a compact JSON or
>> convenient Stratos CLI for that.
>>
>> *PERSISTENCE TEST, BEFORE FAILOVER*
>> *================================*
>>
>> stratos> list-tenants
>> Tenants:
>> +-----------------------+-----------+------------------+--------+------------------------------+
>> | Domain                | Tenant ID | Email            | State  | Created Date                 |
>> +-----------------------+-----------+------------------+--------+------------------------------+
>> | cloud1.qmog.cisco.com | 1         | clo...@cisco.com | Active | Fri May 15 04:46:58 MDT 2015 |
>> +-----------------------+-----------+------------------+--------+------------------------------+
>>
>> stratos> list-network-partitions
>> Network partitions found:
>> +----------------------+----------------------+
>> | Network Partition ID | Number of Partitions |
>> +----------------------+----------------------+
>> | RegionOne            | 1                    |
>> +----------------------+----------------------+
>>
>> stratos> list-deployment-policies
>> Deployment policies found:
>> +-------------------+---------------+
>> | ID                | Accessibility |
>> +-------------------+---------------+
>> | static-2-ha       | 1             |
>> +-------------------+---------------+
>> | autoscale-2-10-ha | 1             |
>> +-------------------+---------------+
>> | autoscale-1-5     | 1             |
>> +-------------------+---------------+
>> | static-1          | 1             |
>> +-------------------+---------------+
>>
>> stratos> list-application-policies
>> Error in listing application policies
>> No application policies found
>>
>> stratos> list-autoscaling-policies
>> Error in listing autoscaling policies
>> No autoscaling policies found
>>
>> stratos> list-cartridges
>> Cartridges found:
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>> | Type             | Category    | Name             | Description                | Version | Multi-Tenant |
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>> | cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>> | cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>> | cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>> | cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>> | cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>> | cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>
>> stratos> list-applications
>> Applications found:
>> +-----------------+-----------------+----------+
>> | Application ID  | Alias           | Status   |
>> +-----------------+-----------------+----------+
>> | cartridge-proxy | cartridge-proxy | Deployed |
>> +-----------------+-----------------+----------+
>> | cisco-sample-vm | cisco-sample-vm | Deployed |
>> +-----------------+-----------------+----------+
>>
>> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/autoscalingPolicies
>> [{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]
>>
>> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/applicationPolicies
>> [{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]
>>
>> *PERSISTENCE TEST, AFTER FAILOVER*
>> *===============================*
>>
>> stratos> list-tenants
>> Tenants:
>> +-----------------------+-----------+------------------+--------+------------------------------+
>> | Domain                | Tenant ID | Email            | State  | Created Date                 |
>> +-----------------------+-----------+------------------+--------+------------------------------+
>> | cloud1.qmog.cisco.com | 1         | clo...@cisco.com | Active | Fri May 15 05:26:52 MDT 2015 |
>> +-----------------------+-----------+------------------+--------+------------------------------+
>>
>> stratos> list-network-partitions
>> Network partitions found:
>> +----------------------+----------------------+
>> | Network Partition ID | Number of Partitions |
>> +----------------------+----------------------+
>> | RegionOne            | 1                    |
>> +----------------------+----------------------+
>>
>> stratos> list-deployment-policies
>> No deployment policies found
>>
>> stratos> list-application-policies
>> Error in listing application policies
>> No application policies found
>>
>> stratos> list-autoscaling-policies
>> Error in listing autoscaling policies
>> No autoscaling policies found
>>
>> stratos> list-cartridges
>> Cartridges found:
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>> | Type             | Category    | Name             | Description                | Version | Multi-Tenant |
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>> | cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>> | cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>> | cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>> | cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>> | cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>> | cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>
>> stratos> list-applications
>> Applications found:
>> +-----------------+-----------------+----------+
>> | Application ID  | Alias           | Status   |
>> +-----------------+-----------------+----------+
>> | cartridge-proxy | cartridge-proxy | Deployed |
>> +-----------------+-----------------+----------+
>> | cisco-sample-vm | cisco-sample-vm | Deployed |
>> +-----------------+-----------------+----------+
>>
>> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/autoscalingPolicies
>> [{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]
>>
>> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/applicationPolicies
>> [{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]
>>
>> [3] Cartridge test output from Stratos 4.1. Note:
>>
>> 1. We do not use a VIP for Stratos, either for 4.0 or 4.1.
>>
>> 2. We expect the Cartridge Agent to use a DNS lookup when it ends
>> up reconnecting, and this worked just fine in Stratos 4.0.
>>
>> *CARTRIDGE TEST, BEFORE FAILOVER*
>> *==============================*
>>
>> $ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
>> cisco-sample-vm: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
>> cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
>>
>> *CARTRIDGE TEST, AFTER FAILOVER*
>> *=============================*
>>
>> $ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
>> cisco-sample-vm: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
>> cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
>>
>> *CARTRIDGE TEST, AFTER FAILOVER WAIT 5 MINUTES, THEN KILL INSTANCE, THEN WAIT 2 MINUTES*
>> *===================================================================================*
>>
>> $ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
>> cisco-sample-vm: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
>> cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
>>
>> *From:* Imesh Gunaratne [mailto:im...@apache.org]
>> *Sent:* 14 May 2015 20:34
>> *To:* dev
>> *Subject:* Re: Clustered deployments of Stratos
>>
>> It would be better to use the
>> REST API to query and see whether the
>> relevant entities are persisted. Since data is stored in binary format in
>> the registry it would be difficult to query the database and verify this.
>>
>> On Thu, May 14, 2015 at 10:47 PM, Shaheedur Haque (shahhaqu) <
>> shahh...@cisco.com> wrote:
>>
>> I looked at REG_RESOURCE (as well as a few others) but I’m afraid I am
>> going to need more specifics.
>>
>> For example, what query would you recommend to look at, say, deployment
>> policies and cartridge definitions?
>>
>> *From:* Imesh Gunaratne [mailto:im...@apache.org]
>> *Sent:* 09 May 2015 09:08
>> *To:* dev
>> *Subject:* Re: Clustered deployments of Stratos
>>
>> Yes, you could refer to the tables that have the prefix "REG_".
>>
>> On Sat, May 9, 2015 at 4:11 AM, Shaheedur Haque (shahhaqu) <
>> shahh...@cisco.com> wrote:
>>
>> Can you suggest what tables to look at?
>>
>> *From:* Imesh Gunaratne [mailto:im...@apache.org]
>> *Sent:* 07 May 2015 18:00
>> *To:* dev
>> *Subject:* Re: Clustered deployments of Stratos
>>
>> Hi Shaheed,
>>
>> Thanks for the clarification! Maybe the problem is with the MySQL
>> active-passive configuration.
>>
>> I understand that you are switching the same OpenStack volume from the active
>> node to the passive node (when the passive node becomes active), therefore
>> technically it should work. Maybe we need to investigate this problem
>> further by analysing whether data is persisted properly in the active node
>> before the passive node becomes active.
>>
>> Thanks
>>
>> On Tue, May 5, 2015 at 4:22 PM, Shaheedur Haque (shahhaqu) <
>> shahh...@cisco.com> wrote:
>>
>> The data is not synchronised between the active and passive nodes. For
>> clarity, this is the HA model we had, much as described in the blog:
>>
>> · 2 nodes, with Pacemaker in active-passive mode.
>>
>> · Under Pacemaker control:
>>
>>    o We run MySQL in active-passive mode, using a single OpenStack volume
>>    which we attach/reattach as the active role moves around nodes.
>>
>>    o As Pacemaker moves the volume, and thus MySQL, around on node
>>    failures, ActiveMQ and Stratos are moved around too.
>>
>>    o Thus, everything operates in active-passive mode.
>>
>> Even in this model, as the active Stratos 4.0 is moved around (i.e. the
>> Stratos JVM on the old active node has gone with the node, and Pacemaker
>> starts up a new Stratos JVM on what used to be the passive node), we found
>> that the Cartridge Definition objects were missing and, as a
>> clumsy workaround [1], we had to replay the stored copies of them into
>> Stratos using the REST API.
>>
>> With Stratos 4.1, using the new object names, early indications are that
>> *Deployment Policies* and *Application Deployment* policies are lost as the
>> active fails over to the passive. If anything, these objects are more likely to
>> hit the problems of [1], since Stratos 4.1 expects these to be tweaked on
>> the fly (min/max etc).
>>
>> Thanks, Shaheed
>>
>> [1] Clearly, this loses any changes that were not in the stored copies.
>>
>> *From:* Imesh Gunaratne [mailto:im...@apache.org]
>> *Sent:* 03 May 2015 06:43
>> *To:* dev@stratos.apache.org
>> *Subject:* Re: Clustered deployments of Stratos
>>
>> Hi Shaheed,
>>
>> Thanks for taking time to test this!
>>
>> Just to clarify the exact problem, do you mean that data is not
>> synchronized between the active and passive nodes, or that it is not
>> persisted in the active node?
>>
>> Thanks
>>
>> On Sunday, May 3, 2015, Shaheedur Haque (shahhaqu) <shahh...@cisco.com>
>> wrote:
>>
>> I have been looking into our use of Linux HA to set up an Active-Passive
>> configuration. Testing indicates that in 4.1 (beta1), several objects seem
>> not to be persisted properly. This includes at least:
>>
>> - Cartridges
>> - Deployment policies
>>
>> Am I missing something? Is it safe to work around this by replaying those
>> objects?
>> ------------------------------
>>
>> *From:* Imesh Gunaratne [im...@apache.org]
>> *Sent:* 23 April 2015 10:47
>> *To:* dev
>> *Subject:* Re: Clustered deployments of Stratos
>>
>> Hi Shaheed,
>>
>> Currently N-way clustering is still not possible with CC, AS & SM. We
>> completed the initial phase of this feature, however the feature itself
>> was not completed. You could refer to the mail thread "[Discuss] Clustering
>> Feature Implementation for 4.1.0-Alpha Release" for details.
>>
>> However at present [1] is valid. We could use Linux HA and deploy CC, AS
>> and SM in Active-Passive mode.
>>
>> Thanks
>>
>> On Thu, Apr 23, 2015 at 2:41 PM, Shaheedur Haque (shahhaqu) <
>> shahh...@cisco.com> wrote:
>>
>> Hi,
>>
>> We currently try to achieve HA with Stratos using something so unpleasant
>> that I won’t even describe it here :-). It has been suggested that Stratos
>> has, for a while now, supported a clustered mode of deployment where, given
>> N servers:
>>
>> · The SM, AS and MB operate in an N-way clustered mode
>>
>> · The CEP operates in an N-way loadsharing mode
>>
>> · The Cartridge Agents can react to a failure in one of the N
>> CEPs by failing over to one of the other N-1 remaining servers
>>
>> In looking for documentation on how to set this up, I came across these
>> two write-ups [1] and [2]. Questions:
>>
>> · Both these documents mention only using N=2. Is that still
>> correct?
>>
>> · [1] Seems recently written, and [2] is a little older but not
>> much. Are both documents still regarded as current?
>>
>> Also, I’d love to hear any other experiences people have of running
>> configurations like this.
>>
>> Thanks, Shaheed
>>
>> [1] https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Configuring+HA+Using+Pacemaker+and+Heartbeat
>>
>> [2] http://blog.lasindu.com/2014/08/wso2-private-paas-supporting.html
>>
>> --
>> Imesh Gunaratne
>>
>> Technical Lead, WSO2
>> Committer & PMC Member, Apache Stratos
>>
>> --
>> Lahiru Sandaruwan
>> Committer and PMC member, Apache Stratos,
>> Senior Software Engineer,
>> WSO2 Inc., http://wso2.com
>> lean.enterprise.middleware
>>
>> phone: +94773325954
>> email: lahi...@wso2.com blog: http://lahiruwrites.blogspot.com/
>> linked-in: http://lk.linkedin.com/pub/lahiru-sandaruwan/16/153/146
>
> --
> Imesh Gunaratne
>
> Senior Technical Lead, WSO2
> Committer & PMC Member, Apache Stratos

--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007
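
For anyone repeating the persistence test, the REST-based check suggested in the thread could be scripted roughly as follows. This is only a sketch under stated assumptions: the two endpoint paths, the admin:admin credentials and the skipped certificate check are copied from the curl commands quoted in the thread, while the function names are hypothetical and the code assumes every listed entity carries an "id" field, as the quoted JSON responses do.

```python
import base64
import json
import ssl
import urllib.request

# Paths taken from the curl commands in the thread. Paths for other
# entity types (e.g. deployment policies) are not shown there and
# would have to be confirmed before adding them here.
ENDPOINTS = ["/api/autoscalingPolicies", "/api/applicationPolicies"]


def fetch_ids(base_url, path, user="admin", password="admin"):
    """Return the set of "id" values listed at one REST endpoint."""
    request = urllib.request.Request(base_url + path)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    request.add_header("Content-Type", "application/json")
    # Equivalent of "curl -k": skip certificate checks (test rigs only).
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        entities = json.load(response)
    return {entity["id"] for entity in entities}


def snapshot(base_url):
    """Collect the ids behind every endpoint in ENDPOINTS."""
    return {path: fetch_ids(base_url, path) for path in ENDPOINTS}


def lost_entities(before, after):
    """Map each endpoint to the sorted ids present before but not after."""
    lost = {}
    for path, ids in before.items():
        missing = sorted(set(ids) - set(after.get(path, ())))
        if missing:
            lost[path] = missing
    return lost
```

Calling snapshot("https://localhost:9443") once before and once after the failover, then passing both results to lost_entities(), would list whatever disappeared; an empty result means nothing was lost at the endpoints that were checked.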