Re: [ClusterLabs] (Live) Migration failure results in a stop operation
On 2018-02-20 12:07 AM, Digimer wrote:
> Hi all,
>
> Is there a way to tell pacemaker that, if a migration operation fails,
> to just leave the service on the host node? The service being hosted is
> a VM, and a migration failure that triggers a shutdown and reboot is
> very disruptive. I'd rather just leave it alone (and let a human fix the
> underlying problem).
>
> Thanks!

I should mention: I tried setting 'on-fail' for the 'migrate_to' and
'migrate_from' operations;

pcs resource create srv01-c7 ocf:alteeve:server name="srv01-c7" \
    meta allow-migrate="true" op monitor interval="60" \
    op stop on-fail="block" op migrate_to on-fail="ignore" \
    op migrate_from on-fail="ignore" \
    meta allow-migrate="true" failure-timeout="75"

[root@m3-a02n01 ~]# pcs config
Cluster Name: m3-anvil-02
Corosync Nodes:
 m3-a02n01.alteeve.com m3-a02n02.alteeve.com
Pacemaker Nodes:
 m3-a02n01.alteeve.com m3-a02n02.alteeve.com

Resources:
 Clone: hypervisor-clone
  Meta Attrs: clone-max=2 notify=false
  Resource: hypervisor (class=systemd type=libvirtd)
   Operations: monitor interval=60 (hypervisor-monitor-interval-60)
               start interval=0s timeout=100 (hypervisor-start-interval-0s)
               stop interval=0s timeout=100 (hypervisor-stop-interval-0s)
 Resource: srv01-c7 (class=ocf provider=alteeve type=server)
  Attributes: name=srv01-c7
  Meta Attrs: allow-migrate=true failure-timeout=75
  Operations: migrate_from interval=0s on-fail=ignore (srv01-c7-migrate_from-interval-0s)
              migrate_to interval=0s on-fail=ignore (srv01-c7-migrate_to-interval-0s)
              monitor interval=60 (srv01-c7-monitor-interval-60)
              start interval=0s timeout=30 (srv01-c7-start-interval-0s)
              stop interval=0s on-fail=block (srv01-c7-stop-interval-0s)

Stonith Devices:
 Resource: virsh_node1 (class=stonith type=fence_virsh)
  Attributes: delay=15 ipaddr=10.255.255.250 login=root passwd="secret" pcmk_host_list=m3-a02n01.alteeve.com port=m3-a02n01
  Operations: monitor interval=60 (virsh_node1-monitor-interval-60)
 Resource: virsh_node2 (class=stonith type=fence_virsh)
  Attributes: ipaddr=10.255.255.250 login=root passwd="secret" pcmk_host_list=m3-a02n02.alteeve.com port=m3-a02n02
  Operations: monitor interval=60 (virsh_node2-monitor-interval-60)
Fencing Levels:

Location Constraints:
  Resource: srv01-c7
    Enabled on: m3-a02n02.alteeve.com (score:50) (id:location-srv01-c7-m3-a02n02.alteeve.com-50)
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: m3-anvil-02
 dc-version: 1.1.16-12.el7_4.7-94ff4df
 have-watchdog: false
 last-lrm-refresh: 1518584295

Quorum:
  Options:

When I tried to migrate (with the RA set to fail on purpose), I got:

Node 1:
Feb 20 07:06:40 m3-a02n01.alteeve.com crmd[1865]: notice: Result of migrate_to operation for srv01-c7 on m3-a02n01.alteeve.com: 1 (unknown error)
Feb 20 07:06:40 m3-a02n01.alteeve.com ocf:alteeve:server[3440]: 167; ocf:alteeve:server invoked.
Feb 20 07:06:40 m3-a02n01.alteeve.com ocf:alteeve:server[3442]: 1360; Command line switch: [stop] -> [#!SET!#]

Node 2:
Feb 20 07:05:37 m3-a02n02.alteeve.com crmd[2394]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
Feb 20 07:06:33 m3-a02n02.alteeve.com crmd[2394]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Feb 20 07:06:33 m3-a02n02.alteeve.com pengine[2393]: notice:  * Migrate srv01-c7 ( m3-a02n01.alteeve.com -> m3-a02n02.alteeve.com )
Feb 20 07:06:33 m3-a02n02.alteeve.com pengine[2393]: notice: Calculated transition 756, saving inputs in /var/lib/pacemaker/pengine/pe-input-172.bz2
Feb 20 07:06:33 m3-a02n02.alteeve.com crmd[2394]: notice: Initiating migrate_to operation srv01-c7_migrate_to_0 on m3-a02n01.alteeve.com
Feb 20 07:06:34 m3-a02n02.alteeve.com crmd[2394]: warning: Action 22 (srv01-c7_migrate_to_0) on m3-a02n01.alteeve.com failed (target: 0 vs. rc: 1): Error
Feb 20 07:06:34 m3-a02n02.alteeve.com crmd[2394]: warning: Action 22 (srv01-c7_migrate_to_0) on m3-a02n01.alteeve.com failed (target: 0 vs. rc: 1): Error
Feb 20 07:06:34 m3-a02n02.alteeve.com crmd[2394]: notice: Initiating migrate_from operation srv01-c7_migrate_from_0 locally on m3-a02n02.alteeve.com
Feb 20 07:06:34 m3-a02n02.alteeve.com ocf:alteeve:server[3396]: 167; ocf:alteeve:server invoked.
Feb 20 07:06:34 m3-a02n02.alteeve.com ocf:alteeve:server[3398]: 1360; Command line switch: [migrate_from] -> [#!SET!#]
Feb 20 07:06:34 m3-a02n02.alteeve.com crmd[2394]: notice: Result of migrate_from operation for srv01-c7 on m3-a02n02.alteeve.com: 1 (unknown error)
Feb 20 07:06:34 m3-a02n02.alteeve.com crmd[2394]: warning: Action 23 (srv01-c7_migrate_from_0) on m3-a02n02.alteeve.com failed (target: 0 vs. rc: 1): Error
Feb 20 07:06:34 m3-a02n02.alteeve.com crmd[2394]: warning:
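(As a side note, the "Calculated transition 756, saving inputs in
/var/lib/pacemaker/pengine/pe-input-172.bz2" line above gives a way to see
exactly why the cluster chose a full stop/start after the failed migration:
the saved policy-engine input can be replayed offline. A minimal sketch,
assuming the file is still present on the DC and that crm_simulate from the
same pacemaker version is used:

    # replay the transition that followed the failed migrate_to and show
    # the actions (and, with --show-scores, the placement scores) the
    # policy engine computed
    crm_simulate --simulate --xml-file /var/lib/pacemaker/pengine/pe-input-172.bz2
    crm_simulate --simulate --show-scores --xml-file /var/lib/pacemaker/pengine/pe-input-172.bz2

The output lists the recovery actions the cluster decided on for srv01-c7,
which is easier to reason about than the live logs.)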
[ClusterLabs] (Live) Migration failure results in a stop operation
Hi all,

  Is there a way to tell pacemaker that, if a migration operation fails, to
just leave the service on the host node? The service being hosted is a VM,
and a migration failure that triggers a shutdown and reboot is very
disruptive. I'd rather just leave it alone (and let a human fix the
underlying problem).

Thanks!

--
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of Einstein's
brain than in the near certainty that people of equal talent have lived and
died in cotton fields and sweatshops." - Stephen Jay Gould
[ClusterLabs] Monitor being called repeatedly for Master/Slave resource despite monitor returning failure
Hi,

I have configured a wildfly resource in master/slave mode on a 6-VM cluster
with stonith disabled and no-quorum-policy set to ignore.

We are observing that on either a master or a slave resource failure,
pacemaker keeps calling stateful_monitor for wildfly repeatedly, despite us
returning the appropriate failure return codes on monitor failure for both
master (rc=OCF_MASTER_FAILED) and slave (rc=OCF_NOT_RUNNING). This continues
until failure-timeout is reached, after which the resource gets demoted and
stopped in the case of a master monitor failure, and stopped in the case of
a slave monitor failure.

# pacemakerd --version
Pacemaker 1.1.16
Written by Andrew Beekhof

# corosync -v
Corosync Cluster Engine, version '2.4.2'
Copyright (c) 2006-2009 Red Hat, Inc.

Below is my configuration:

node 1: VM-0
node 2: VM-1
node 3: VM-2
node 4: VM-3
node 5: VM-4
node 6: VM-5
primitive stateful_wildfly ocf:pacemaker:wildfly \
        op start timeout=200s interval=0 \
        op promote timeout=300s interval=0 \
        op monitor interval=90s role=Master timeout=90s \
        op monitor interval=80s role=Slave timeout=100s \
        meta resource-stickiness=100 migration-threshold=3 failure-timeout=240s
ms wildfly_MS stateful_wildfly
location stateful_wildfly_rule_2 wildfly_MS \
        rule -inf: #uname eq VM-2
location stateful_wildfly_rule_3 wildfly_MS \
        rule -inf: #uname eq VM-3
location stateful_wildfly_rule_4 wildfly_MS \
        rule -inf: #uname eq VM-4
location stateful_wildfly_rule_5 wildfly_MS \
        rule -inf: #uname eq VM-5
property cib-bootstrap-options: \
        stonith-enabled=false \
        no-quorum-policy=ignore \
        cluster-recheck-interval=30s \
        start-failure-is-fatal=false \
        stop-all-resources=false \
        have-watchdog=false \
        dc-version=1.1.16-94ff4df51a \
        cluster-infrastructure=corosync \
        cluster-name=hacluster-0

Could you please help us understand this behavior and how to fix it?

Thanks!
Samarth J
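(For what it's worth, when chasing this kind of repeated-monitor behaviour
it helps to look at what the cluster has actually recorded for the resource.
A rough sketch using standard pacemaker CLI tools and the resource name from
the configuration above:

    # one-shot status including per-resource fail counts and failed actions
    crm_mon -1 --failcounts

    # once the cause is understood, clear the failure history so the
    # resource is re-evaluated immediately instead of waiting for
    # failure-timeout to expire
    crm_resource --cleanup --resource stateful_wildfly

The fail count shown there, compared against migration-threshold=3 from the
configuration, is what determines when the cluster gives up on keeping the
resource on a node.)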
Re: [ClusterLabs] Monitor being called repeatedly for Master/Slave resource despite monitor returning failure
On Mon, 2018-02-19 at 16:48 +0530, Pankaj wrote:
> Hi,
>
> I have configured wildfly resource in master slave mode on a 6 VM
> cluster with stonith disabled and no quorum policy set to ignore.

To some of us that sounds like "I'm driving a car with no brakes ..." :-)

Without stonith or quorum, there's a high risk of split-brain. Any node that
gets cut off from the others will start all the resources.

> We are observing that on either of master or slave resource failure,
> pacemaker keeps on calling stateful_monitor for wildfly repeatedly,
> despite us returning appropriate failure return codes on monitor
> failure for both master (failure rc=OCF_MASTER_FAILED) and slave
> (failure rc=OCF_NOT_RUNNING).

With your configuration, after the first monitor failure, it should try to
stop the resource, start it again, then monitor it.

One of the nodes at any time is elected the DC. This node will run the
policy engine to make decisions about what needs to be done. The logs from
that node will be most helpful. Look for the time the failure occurred;
once the cluster detects the failure, there should be a bunch of lines from
"pengine" ending in "Calculated transition" -- these will show what actions
were decided. After that, there will be lines from "crmd" showing
"Initiating" and "Result of" those actions.

> This continues till failure-timeout is reached after which the
> resource gets demoted and stopped in case of master monitor failure,
> and stopped in case of slave monitor failure.
>
> Could you please help me understand:
> Why doesn't pacemaker demote or stop the resource immediately after
> the first failure, instead of repeatedly calling monitor?
>
> [...]
>
> Regards,
> Pankaj
--
Ken Gaillot
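(A rough illustration of what to look for, assuming a syslog-style setup;
the log file name varies by distribution, so /var/log/messages below is only
an example and /var/log/cluster/corosync.log or the pacemaker detail log may
apply instead:

    # on the current DC, around the time of the monitor failure:
    grep -E 'pengine.*(Calculated transition|Recover|Demote|Stop)' /var/log/messages
    grep -E 'crmd.*(Initiating|Result of)' /var/log/messages

The "Calculated transition NNN, saving inputs in ..." lines also name a
pe-input file that can be replayed with crm_simulate later if the log lines
alone aren't conclusive.)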
Re: [ClusterLabs] Antw: Pacemaker 2.0.0-rc1 now available
On Mon, 2018-02-19 at 10:23 +0100, Ulrich Windl wrote:
> >>> Ken Gaillot wrote on 16.02.2018 at 22:06 in message
> >>> <1518815166.31176.22.ca...@redhat.com>:
> [...]
> > It is recommended to run "cibadmin --upgrade" (or the equivalent in
> > your higher-level tool of choice) both before and after the
> > upgrade.
> [...]
> Playing with it (older version), I found two possible improvements.
> Consider:
>
> h01:~ # cibadmin --upgrade
> The supplied command is considered dangerous. To prevent accidental
> destruction of the cluster, the --force flag is required in order to
> proceed.
> h01:~ # cibadmin --upgrade --force
> Call cib_upgrade failed (-211): Schema is already the latest
> available
>
> First, cibadmin should check whether the CIB version is up-to-date
> already. If so, there is no need to insist on using --force, and
> secondly, if the CIB is already up-to-date, there should not be a
> failure, but a success.

Good point. I'll change it so it prints the following message and exits 0
in such a case:

    Upgrade unnecessary: Schema is already the latest available

Avoiding the need for --force in such a case is a bigger project and will
have to go on the to-do list. (With the current design, we can't know it's
not needed until after we try to upgrade it.)

> If a status is needed to detect an out-of-date CIB, a different
> option would be the better solution IMHO.
>
> Regards,
> Ulrich
--
Ken Gaillot
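(In the meantime, the schema the CIB currently validates against is recorded
in the validate-with attribute of the cib element, so whether an upgrade
would change anything can be checked by hand. A sketch, not a polished
one-liner:

    # show the schema version recorded in the live CIB
    cibadmin --query | head -n 1 | grep -o 'validate-with="[^"]*"'

Comparing that against the newest schema shipped with the installed
pacemaker tells you whether --upgrade would actually do anything.)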
Re: [ClusterLabs] Antw: Pacemaker 2.0.0-rc1 now available
On 19/02/18 10:39 +0100, Ulrich Windl wrote:
> >>> Ken Gaillot wrote on 16.02.2018 at 22:06 in message
> >>> <1518815166.31176.22.ca...@redhat.com>:
> [...]
>> * The master XML tag is deprecated (though still supported) in favor of
>
> XML guys!
>
> Everybody is using (and liking?) XML, but please also learn the
> correct names: There are no "tags" in XML (unless we talk about the
> syntax of XML, which is not the case here), only "elements" and
> "attributes".
>
> "master" element

True, but perhaps there is no need to be so harsh about terms widely
understood as synonyms (from the pre-XML era; remember HTML tutorials?),
especially when the target audience of the XML level of configuration keeps
shifting towards real power users (as in those pursuing the maximum level of
control) rather than the wider audience served with more abstract means
thanks to crm/pcs.

If anything, I would rather have expected you to point out the paradox that
CIB schema 1.3+ does support actual "tag" elements ;-)

>> using the standard clone tag with a new "promotable" meta-attribute set
>
> "clone" element
>
> ``"promotable" meta-attribute'' --> ``"nvpair" element with name attribute
> "promotable"''
>
>> to true. The "master-max" and "master-node-max" master meta-attributes
>> are deprecated in favor of new "promoted-max" and "promoted-node-max"
>> clone meta-attributes. Documentation now refers to these as promotable
>> clones rather than master/slave, stateful or multistate clones.
>>
>> * The record-pending option now defaults to true, which means pending
>> actions will be shown in status displays.
>>
>> * Three minor regressions introduced in 1.1.18, and one introduced in
>> 1.1.17, have been fixed.
>
> I know what you are talking about, but you need quite some
> background to find out that in "master XML tag" the emphasis is not
> on XML (even if in capitals), but on master ;-)

I see; that weakens the message delivery, something I run into all the time
when proof-reading my own sentences (especially since I am not a native
speaker), which is why I employ various tricks such as basic "meta escape"
quoting or joining by dashes.

Happy Monday
--
Poki
Re: [ClusterLabs] Issues with DB2 HADR Resource Agent
Hello Ondrej,

   I am still having issues with my DB2 HADR on Pacemaker. When I do a
db2_kill on the Primary for testing, initially it does a restart of DB2 on
the same node. But if I let it run for some days and then try the same test,
it goes into fencing and then reboots the Primary node. I am not sure how
exactly it should behave in case my DB2 crashes on the Primary.

   Also, if I crash Node 1 (the node itself, not only DB2), it promotes
Node 2 to Primary, but once Pacemaker is started again on Node 1, the DB on
Node 1 is also promoted to Primary. Is that expected behaviour?

Regards,

Dileep V Nair
Senior AIX Administrator
Cloud Managed Services Delivery (MSD), India
IBM Cloud
E-mail: dilen...@in.ibm.com
Outer Ring Road, Embassy Manya
Bangalore, KA 560045, India

From:    Ondrej Famera
To:      Dileep V Nair
Cc:      Cluster Labs - All topics related to open-source clustering welcomed
Date:    02/12/2018 11:46 AM
Subject: Re: [ClusterLabs] Issues with DB2 HADR Resource Agent

On 02/01/2018 07:24 PM, Dileep V Nair wrote:
> Thanks Ondrej for the response. I have set the PEER_WINDOW to 1000, which
> I guess is a reasonable value. What I am noticing is that it does not wait
> for the PEER_WINDOW. Before that, the DB goes into a
> REMOTE_CATCHUP_PENDING state and Pacemaker gives an error saying a DB in
> STANDBY/REMOTE_CATCHUP_PENDING/DISCONNECTED can never be promoted.
>
> Regards,
>
> Dileep V Nair

Hi Dileep,

sorry for the late response. DB2 should not get into the 'REMOTE_CATCHUP'
phase, or the DB2 resource agent will indeed not promote it. From my
experience, it usually gets into that state when the DB2 on the standby was
restarted during or after the PEER_WINDOW timeout.

When the primary DB2 fails, the standby should end up in a state that
matches the one on line 770 of the DB2 resource agent, and the promote
operation is then attempted:

  770  STANDBY/*PEER/DISCONNECTED|Standby/DisconnectedPeer)

https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/db2#L770

The DB2 on the standby can get restarted when the 'promote' operation times
out, so you can try increasing the 'promote' timeout to something higher if
that was the case. So if you see that DB2 was restarted after the Primary
failed, increase the promote timeout. If DB2 was not restarted, then the
question is why DB2 decided to change the status in this way.

Let me know if the above helped.

--
Ondrej Faměra
@Red Hat
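(When debugging this it can help to look at the same HADR state information
the resource agent keys off, independently of Pacemaker. A rough sketch; the
instance owner "db2inst1" and database name "MYDB" are only examples,
substitute your own:

    # as the DB2 instance owner, show the current HADR role/state/connect status
    su - db2inst1 -c "db2pd -db MYDB -hadr"

If the standby already shows something like REMOTE_CATCHUP_PENDING together
with DISCONNECTED at the moment the cluster tries to promote it, the agent's
refusal quoted above is expected, and the question becomes what restarted or
disconnected the standby, e.g. a promote timeout as Ondrej suggests.)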
[ClusterLabs] Antw: Monitor being called repeatedly for Master/Slave resource despite monitor returning failure
>>> Pankaj wrote on 19.02.2018 at 12:18 in message:
> Hi,
>
> I have configured wildfly resource in master slave mode on a 6 VM cluster
> with stonith disabled and no quorum policy set to ignore.
>
> We are observing that on either of master or slave resource failure,
> pacemaker keeps on calling stateful_monitor for wildfly repeatedly, despite
> us returning appropriate failure return codes on monitor failure for both
> master (failure rc=OCF_MASTER_FAILED) and slave (failure
> rc=OCF_NOT_RUNNING).
>
> This continues till failure-timeout is reached after which the resource
> gets demoted and stopped in case of master monitor failure, and stopped in
> case of slave monitor failure.
>
> Could you please help me understand:
> Why doesn't pacemaker demote or stop the resource immediately after the
> first failure, instead of repeatedly calling monitor?
>
> [...]
>
> Could you please help us in understanding this behavior and how to fix
> this?

What does the cluster log say?

> Regards,
> Pankaj
[ClusterLabs] Monitor being called repeatedly for Master/Slave resource despite monitor returning failure
Hi,

I have configured a wildfly resource in master/slave mode on a 6-VM cluster
with stonith disabled and no-quorum-policy set to ignore.

We are observing that on either a master or a slave resource failure,
pacemaker keeps calling stateful_monitor for wildfly repeatedly, despite us
returning the appropriate failure return codes on monitor failure for both
master (rc=OCF_MASTER_FAILED) and slave (rc=OCF_NOT_RUNNING).

This continues until failure-timeout is reached, after which the resource
gets demoted and stopped in the case of a master monitor failure, and
stopped in the case of a slave monitor failure.

Could you please help me understand:
Why doesn't pacemaker demote or stop the resource immediately after the
first failure, instead of repeatedly calling monitor?

# pacemakerd --version
Pacemaker 1.1.16
Written by Andrew Beekhof

# corosync -v
Corosync Cluster Engine, version '2.4.2'
Copyright (c) 2006-2009 Red Hat, Inc.

Below is my configuration:

node 1: VM-0
node 2: VM-1
node 3: VM-2
node 4: VM-3
node 5: VM-4
node 6: VM-5
primitive stateful_wildfly ocf:pacemaker:wildfly \
        op start timeout=200s interval=0 \
        op promote timeout=300s interval=0 \
        op monitor interval=90s role=Master timeout=90s \
        op monitor interval=80s role=Slave timeout=100s \
        meta resource-stickiness=100 migration-threshold=3 failure-timeout=240s
ms wildfly_MS stateful_wildfly
location stateful_wildfly_rule_2 wildfly_MS \
        rule -inf: #uname eq VM-2
location stateful_wildfly_rule_3 wildfly_MS \
        rule -inf: #uname eq VM-3
location stateful_wildfly_rule_4 wildfly_MS \
        rule -inf: #uname eq VM-4
location stateful_wildfly_rule_5 wildfly_MS \
        rule -inf: #uname eq VM-5
property cib-bootstrap-options: \
        stonith-enabled=false \
        no-quorum-policy=ignore \
        cluster-recheck-interval=30s \
        start-failure-is-fatal=false \
        stop-all-resources=false \
        have-watchdog=false \
        dc-version=1.1.16-94ff4df51a \
        cluster-infrastructure=corosync \
        cluster-name=hacluster-0

Could you please help us understand this behavior and how to fix it?

Regards,
Pankaj
[ClusterLabs] Antw: Pacemaker 2.0.0-rc1 now available
>>> Ken Gaillot wrote on 16.02.2018 at 22:06 in message
>>> <1518815166.31176.22.ca...@redhat.com>:
[...]
> * The master XML tag is deprecated (though still supported) in favor of

XML guys!

Everybody is using (and liking?) XML, but please also learn the correct
names: There are no "tags" in XML (unless we talk about the syntax of XML,
which is not the case here), only "elements" and "attributes".

"master" element

> using the standard clone tag with a new "promotable" meta-attribute set

"clone" element

``"promotable" meta-attribute'' --> ``"nvpair" element with name attribute
"promotable"''

> to true. The "master-max" and "master-node-max" master meta-attributes
> are deprecated in favor of new "promoted-max" and "promoted-node-max"
> clone meta-attributes. Documentation now refers to these as promotable
> clones rather than master/slave, stateful or multistate clones.
>
> * The record-pending option now defaults to true, which means pending
> actions will be shown in status displays.
>
> * Three minor regressions introduced in 1.1.18, and one introduced in
> 1.1.17, have been fixed.

I know what you are talking about, but you need quite some background to
find out that in "master XML tag" the emphasis is not on XML (even if in
capitals), but on master ;-)

Regards,
Ulrich
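(To make the element-level difference concrete, the change announced above
amounts to roughly the following in the CIB. The ids and the Stateful
primitive are made up for illustration, and the exact attribute set may
differ from what the final 2.0.0 schema accepts:

    <!-- Pacemaker 1.x style, now deprecated: a "master" element -->
    <master id="my-stateful-master">
      <primitive id="my-stateful" class="ocf" provider="pacemaker" type="Stateful"/>
      <meta_attributes id="my-stateful-master-meta">
        <nvpair id="ms-max" name="master-max" value="1"/>
      </meta_attributes>
    </master>

    <!-- Pacemaker 2.0 style: a "clone" element with a "promotable" nvpair -->
    <clone id="my-stateful-clone">
      <primitive id="my-stateful" class="ocf" provider="pacemaker" type="Stateful"/>
      <meta_attributes id="my-stateful-clone-meta">
        <nvpair id="clone-promotable" name="promotable" value="true"/>
        <nvpair id="clone-promoted-max" name="promoted-max" value="1"/>
      </meta_attributes>
    </clone>
)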
[ClusterLabs] Antw: Pacemaker 2.0.0-rc1 now available
>>> Ken Gaillot wrote on 16.02.2018 at 22:06 in message
>>> <1518815166.31176.22.ca...@redhat.com>:
[...]
> It is recommended to run "cibadmin --upgrade" (or the equivalent in
> your higher-level tool of choice) both before and after the upgrade.
[...]

Playing with it (older version), I found two possible improvements.
Consider:

h01:~ # cibadmin --upgrade
The supplied command is considered dangerous. To prevent accidental
destruction of the cluster, the --force flag is required in order to
proceed.
h01:~ # cibadmin --upgrade --force
Call cib_upgrade failed (-211): Schema is already the latest available

First, cibadmin should check whether the CIB version is up-to-date already.
If so, there is no need to insist on using --force; and secondly, if the CIB
is already up-to-date, there should not be a failure, but a success.

If a status is needed to detect an out-of-date CIB, a different option would
be the better solution IMHO.

Regards,
Ulrich
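(For reference, the "latest available" schema that cib_upgrade compares
against comes from the RNG files shipped with pacemaker, so it can also be
checked outside cibadmin; the path below is the typical default, adjust for
your build:

    # schemas the installed pacemaker knows about; the highest version here
    # is what --upgrade would move the CIB to
    ls /usr/share/pacemaker/pacemaker-*.rng

Comparing that against the validate-with attribute on the cib element shows
whether an upgrade is actually pending.)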
Re: [ClusterLabs] Pacemaker 2.0.0-rc1 now available
Ken Gaillot wrote:
> On Fri, 2018-02-16 at 15:06 -0600, Ken Gaillot wrote:
> > Source code for the first release candidate for Pacemaker version 2.0.0
> > is now available at:
> >
> > https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.0.0-rc1
> >
> > The main goal of the change from Pacemaker 1 to 2 is to drop support
> > for deprecated legacy usage, in order to make the code base more
> > maintainable going into the future. As such, this release involves a
> > net drop of more than 20,000 lines of code!
> >
> > Rolling (live) upgrades are possible only from Pacemaker 1.1.11 or
> > later, on top of corosync 2. Other setups can be upgraded with the
> > cluster stopped.
> >
> > It is recommended to run "cibadmin --upgrade" (or the equivalent in
> > your higher-level tool of choice) both before and after the upgrade.
> > The final 2.0.0 release will automatically transform most of the
> > dropped older syntax to the newer form. However, this functionality is
> > not yet complete in rc1.
> >
> > The most significant changes in this release include:
> >
> > * Support has been dropped for heartbeat and corosync 1 (whether using
> > CMAN or plugin), and many legacy aliases for cluster options (including
> > default-resource-stickiness, which should be set as resource-stickiness
> > in rsc_defaults instead).
> >
> > * The default location of the Pacemaker detail log is now
> > /var/log/pacemaker/pacemaker.log, and Pacemaker will no longer use
> > Corosync's logging preferences. Options are available in the configure
> > script to change the default log locations.

Thank you a lot!

> > * The master XML tag is deprecated (though still supported) in favor of
> > using the standard clone tag with a new "promotable" meta-attribute set
> > to true. The "master-max" and "master-node-max" master meta-attributes
> > are deprecated in favor of new "promoted-max" and "promoted-node-max"
> > clone meta-attributes. Documentation now refers to these as promotable
> > clones rather than master/slave, stateful or multistate clones.
> >
> > * The record-pending option now defaults to true, which means pending
> > actions will be shown in status displays.
> >
> > * Three minor regressions introduced in 1.1.18, and one introduced in
> > 1.1.17, have been fixed.
> >
> > More details are available in the change log:
> >
> > https://github.com/ClusterLabs/pacemaker/blob/2.0/ChangeLog
> >
> > and in a special wiki page for the 2.0 release:
> >
> > https://wiki.clusterlabs.org/wiki/Pacemaker_2.0_Changes
> >
> > Everyone is encouraged to download, compile and test the new release.
> > We do many regression tests and simulations, but we can't cover all
> > possible use cases, so your feedback is important and appreciated.
> >
> > Many thanks to all contributors of source code to this release,
> > including
>
> Whoops, hit send too soon :)
>
> Andrew Beekhof, Bin Liu, Gao,Yan, Jan Pokorný, and Ken Gaillot