[ClusterLabs] Coming in Pacemaker 2.0.5: finer control over resource and operation defaults
Hi all,

Pacemaker 2.0.4 is barely out the door, and we're already looking ahead to 2.0.5, expected at the end of this year. One of the new features, already available in the master branch, will be finer-grained control over resource and operation defaults.

Currently, you can set meta-attribute values in the CIB's rsc_defaults section to apply to all resources, and in the op_defaults section to apply to all operations. Rules can be used to apply defaults only during certain times. For example, you can set a default stickiness of INFINITY during business hours and 0 outside those hours; a sketch of such a rule follows this message.

But what if you want to change the default stickiness of just pgsql databases? Or the default timeout of only start operations? 2.0.5 will add new rule expressions, rsc_expression and op_expression, for this purpose; examples are sketched after this message as well.

You can combine rsc_expression and op_expression in op_defaults rules, if for example you want to set a default stop timeout for all ocf:heartbeat:docker resources. This obviously can be convenient if you have many resources of the same type, but it has one other trick up its sleeve: it is the only way to affect the meta-attributes of the resources that Pacemaker implicitly creates for bundles.

When you configure a bundle, Pacemaker implicitly creates container resources (ocf:heartbeat:docker, ocf:heartbeat:rkt, or ocf:heartbeat:podman) and, if appropriate, IP resources (ocf:heartbeat:IPaddr2). Previously, there was no way to directly affect these resources, but with these new expressions you can at least configure defaults that apply to them, without having to use those same defaults for all your resources.

--
Ken Gaillot
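The sketches below follow the rule syntax documented in Pacemaker Explained; the ids and the specific hours, agents, and timeout values are illustrative placeholders, not taken from a real cluster.

A time-based default: stickiness INFINITY during business hours, 0 otherwise.

<rsc_defaults>
  <meta_attributes id="rsc-defaults-business-hours">
    <rule id="rsc-defaults-business-hours-rule" score="INFINITY">
      <!-- matches 9:00 through 16:59, Monday through Friday -->
      <date_expression id="rsc-defaults-business-hours-expr" operation="date_spec">
        <date_spec id="rsc-defaults-business-hours-spec" hours="9-16" weekdays="1-5"/>
      </date_expression>
    </rule>
    <nvpair id="rsc-defaults-stickiness-high" name="resource-stickiness" value="INFINITY"/>
  </meta_attributes>
  <!-- fallback block, used whenever the rule above does not match -->
  <meta_attributes id="rsc-defaults-off-hours">
    <nvpair id="rsc-defaults-stickiness-low" name="resource-stickiness" value="0"/>
  </meta_attributes>
</rsc_defaults>

A stickiness default for only pgsql resources, using the new rsc_expression:

<rsc_defaults>
  <meta_attributes id="pgsql-defaults">
    <rule id="pgsql-defaults-rule" score="INFINITY">
      <rsc_expression id="pgsql-defaults-expr" class="ocf" provider="heartbeat" type="pgsql"/>
    </rule>
    <nvpair id="pgsql-defaults-stickiness" name="resource-stickiness" value="INFINITY"/>
  </meta_attributes>
</rsc_defaults>

A timeout default for only start operations, using the new op_expression:

<op_defaults>
  <meta_attributes id="start-defaults">
    <rule id="start-defaults-rule" score="INFINITY">
      <op_expression id="start-defaults-expr" name="start"/>
    </rule>
    <nvpair id="start-defaults-timeout" name="timeout" value="60s"/>
  </meta_attributes>
</op_defaults>

And the combined case mentioned above, a default stop timeout for all ocf:heartbeat:docker resources (expressions within a rule are ANDed by default):

<op_defaults>
  <meta_attributes id="docker-stop-defaults">
    <rule id="docker-stop-defaults-rule" score="INFINITY">
      <rsc_expression id="docker-stop-defaults-rsc" class="ocf" provider="heartbeat" type="docker"/>
      <op_expression id="docker-stop-defaults-op" name="stop"/>
    </rule>
    <nvpair id="docker-stop-defaults-timeout" name="timeout" value="120s"/>
  </meta_attributes>
</op_defaults>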
Re: [ClusterLabs] pacemaker systemd resource
Thanks Andrei, and all of you, for your time, I appreciate it! Yeah, it's very sad to see that. It looks like the bug described here:

https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1869751
https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1881762

Well, there is no other way for me but to change the OS from Ubuntu to something else, because I'm very disappointed that such critical bugs exist :(

>Wednesday, 22 July 2020, 22:57 +05:00 from Andrei Borzenkov:
>
>On 22.07.2020 12:46, Хиль Эдуард wrote:
>>
>> Hey, Andrei! Thanks for your time!
>> A-a-and is there no chance to do something about it? :(
>> The pacemaker log is below.
>>
>
>The resource was started:
>
>...
>> Jul 22 12:38:36 node2.local pacemaker-execd [1721] (log_execute) info: executing - rsc:dummy.service action:start call_id:76
>> Jul 22 12:38:36 node2.local pacemaker-based [1719] (cib_perform_op) info: Diff: --- 0.131.4 2
>> Jul 22 12:38:36 node2.local pacemaker-based [1719] (cib_perform_op) info: Diff: +++ 0.131.5 (null)
>> Jul 22 12:38:36 node2.local pacemaker-based [1719] (cib_perform_op) info: + /cib: @num_updates=5
>> Jul 22 12:38:36 node2.local pacemaker-based [1719] (cib_perform_op) info: + /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='dummy.service']/lrm_rsc_op[@id='dummy.service_last_0']: @operation_key=dummy.service_start_0, @operation=start, @transition-key=164:23:0:76f4932e-716b-45b8-8fed-a20c3806df8a, @transition-magic=-1:193;164:23:0:76f4932e-716b-45b8-8fed-a20c3806df8a, @call-id=-1, @rc-code=193, @op-status=-1, @last-rc-change=1595410716, @last-run=1595410716, @e
>> Jul 22 12:38:36 node2.local pacemaker-based [1719] (cib_process_request) info: Completed cib_modify operation for section status: OK (rc=0, origin=node2.local/crmd/62, version=0.131.5)
>> Jul 22 12:38:36 node2.local pacemaker-execd [1721] (systemd_exec_result) info: Call to start passed: /org/freedesktop/systemd1/job/703
>> Jul 22 12:38:38 node2.local pacemaker-controld [1724] (process_lrm_event) notice: Result of start operation for dummy.service on node2.local: 0 (ok) | call=76 key=dummy.service_start_0 confirmed=true cib-update=63
>
>So the start operation at least completed successfully.
>
>> Jul 22 12:38:38 node2.local pacemaker-based [1719] (cib_process_request) info: Forwarding cib_modify operation for section status to all (origin=local/crmd/63)
>> Jul 22 12:38:38 node2.local pacemaker-based [1719] (cib_perform_op) info: Diff: --- 0.131.5 2
>> Jul 22 12:38:38 node2.local pacemaker-based [1719] (cib_perform_op) info: Diff: +++ 0.131.6 (null)
>> Jul 22 12:38:38 node2.local pacemaker-based [1719] (cib_perform_op) info: + /cib: @num_updates=6
>> Jul 22 12:38:38 node2.local pacemaker-based [1719] (cib_perform_op) info: + /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='dummy.service']/lrm_rsc_op[@id='dummy.service_last_0']: @transition-magic=0:0;164:23:0:76f4932e-716b-45b8-8fed-a20c3806df8a, @call-id=76, @rc-code=0, @op-status=0, @last-rc-change=1986, @last-run=1986, @exec-time=-587720, @queue-time=59
>> Jul 22 12:38:38 node2.local pacemaker-based [1719] (cib_process_request) info: Completed cib_modify operation for section status: OK (rc=0, origin=node2.local/crmd/63, version=0.131.6)
>> Jul 22 12:38:38 node2.local pacemaker-controld [1724] (do_lrm_rsc_op) info: Performing key=165:23:0:76f4932e-716b-45b8-8fed-a20c3806df8a op=dummy.service_monitor_6
>> Jul 22 12:38:38 node2.local pacemaker-based [1719] (cib_process_request) info: Forwarding cib_modify operation for section status to all (origin=local/crmd/64)
>> Jul 22 12:38:38 node2.local pacemaker-based [1719] (cib_perform_op) info: Diff: --- 0.131.6 2
>> Jul 22 12:38:38 node2.local pacemaker-based [1719] (cib_perform_op) info: Diff: +++ 0.131.7 (null)
>> Jul 22 12:38:38 node2.local pacemaker-based [1719] (cib_perform_op) info: + /cib: @num_updates=7
>> Jul 22 12:38:38 node2.local pacemaker-based [1719] (cib_perform_op) info: ++ /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='dummy.service']: operation_key="dummy.service_monitor_6" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.2.0" transition-key="165:23:0:76f4932e-716b-45b8-8fed-a20c3806df8a" transition-magic="-1:193;165:23:0:76f4932e-716b-45b8-8fed-a20c3806df8a" exit-reason="" on_
>> Jul 22 12:38:38 node2.local pacemaker-based [1719] (cib_process_request) info: Completed cib_modify operation for section status: OK (rc=0, origin=node2.local/crmd/64, version=0.131.7)
>> Jul 22 12:38:38 node2.local pacemaker-controld [1724] (process_lrm_event) notice: Result of monitor operation for dummy.service on
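For reference, a systemd unit such as the dummy.service discussed in this thread is managed by defining a primitive of class "systemd". A minimal sketch; the 60-second monitor interval is an assumed value, not taken from the thread:

<primitive id="dummy.service" class="systemd" type="dummy">
  <operations>
    <!-- recurring monitor so the cluster notices if the unit stops -->
    <op id="dummy.service-monitor-60s" name="monitor" interval="60s"/>
  </operations>
</primitive>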
[ClusterLabs] Antw: [EXT] Re: Pacemaker Shutdown
>>> Harvey Shepherd wrote on 22.07.2020 at 23:43:
> Thanks for your response Reid. What you say makes sense, and under normal
> circumstances if a resource failed, I'd want all of its dependents to be
> stopped cleanly before restarting the failed resource. However, if pacemaker
> is shutting down on a node (e.g. due to a restart request), then I just want
> to fail over as fast as possible, so an unclean kill is fine. At the moment
> the shutdown process is taking 2 mins. I was just wondering if there was a
> way to do this.

Hi!

I think you are mixing two concepts: a shutdown request is always an attempt to stop things cleanly, while a node failure (which will be followed by a fencing operation) definitely cannot do a clean shutdown, as the node is considered dead already. Also remember that even STONITH (fencing) will take some time, so it may generally be better to try a stop with a timeout (which will fence only if the timeout expires). And of course: HA software is not there to make stop operations faster ;-)

Regards,
Ulrich

> Regards,
> Harvey
>
> From: Users on behalf of Reid Wahl
> Sent: 23 July 2020 08:05
> To: Cluster Labs - All topics related to open-source clustering welcomed
> Subject: EXTERNAL: Re: [ClusterLabs] Pacemaker Shutdown
>
> On Tue, Jul 21, 2020 at 11:42 PM Harvey Shepherd
> <harvey.sheph...@aviatnet.com> wrote:
>
> Hi All,
>
> I'm running Pacemaker 2.0.3 on a two-node cluster, controlling 40+ resources
> which are a mixture of clones and other resources that are colocated with the
> master instance of certain clones. I've noticed that if I terminate pacemaker
> on the node that is hosting the master instances of the clones, Pacemaker
> focuses on stopping resources on that node BEFORE failing over to the other
> node, leading to a longer outage than necessary. Is there a way to change
> this behaviour?
>
> Hi, Harvey.
>
> As you likely know, a given active/passive resource will have to stop on one
> node before it can start on another node, and the same goes for a promoted
> clone instance having to demote on one node before it can promote on another.
> There are exceptions for clone instances and for promotable clones with
> promoted-max > 1 ("allow more than one master instance"). A resource that's
> configured to run on one node at a time should not try to run on two nodes
> during failover.
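To make the demote-before-promote point concrete, here is a minimal sketch of a promotable clone. The names and the ocf:pacemaker:Stateful agent are illustrative; promoted-max=1 is the default value, and the 30s stop timeout is an arbitrary example that ties in Ulrich's point above about bounded stops:

<clone id="stateful-clone">
  <primitive id="stateful" class="ocf" provider="pacemaker" type="Stateful">
    <operations>
      <!-- a stop that exceeds its timeout counts as failed and, with
           STONITH enabled, escalates to fencing by default -->
      <op id="stateful-stop" name="stop" interval="0s" timeout="30s"/>
    </operations>
  </primitive>
  <meta_attributes id="stateful-clone-meta">
    <nvpair id="stateful-clone-promotable" name="promotable" value="true"/>
    <!-- with promoted-max=1, the promoted instance must demote on the old
         node before an instance can promote on the new one -->
    <nvpair id="stateful-clone-promoted-max" name="promoted-max" value="1"/>
  </meta_attributes>
</clone>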
> With that in mind, what exactly are you wanting to happen? Is the problem
> that all resources are stopping on node 1 before any of them start on node 2?
> Or that you want Pacemaker shutdown to kill the processes on node 1 instead
> of cleanly shutting them down? Or something different?
>
> These are the actions and logs I saw during the test:
>
> Ack. This seems like it's just telling us that Pacemaker is going through a
> graceful shutdown. The info more relevant to the resource stop/start order
> would be in /var/log/pacemaker/pacemaker.log (or less detailed in
> /var/log/messages) on the DC.
>
> # /etc/init.d/pacemaker stop
> Signaling Pacemaker Cluster Manager to terminate
> Waiting for cluster services to unload..sending signal 9 to procs
>
> 2020 Jul 22 06:16:50.581 Chassis2 daemon.notice CTR8740 pacemaker. Signaling Pacemaker Cluster Manager to terminate
> 2020 Jul 22 06:16:50.599 Chassis2 daemon.notice CTR8740 pacemaker. Waiting for cluster services to unload
> 2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140 warning: new_event_notification (6140-6141-9): Broken pipe (32)
> 2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140 warning: Notification of client stonithd/665bde82-cb28-40f7-9132-8321dc2f1992 failed
> 2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140 warning: new_event_notification (6140-6143-8): Broken pipe (32)
> 2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140 warning: Notification of client attrd/a26ca273-3422-4ebe-8cb7-95849b8ff130 failed
> 2020 Jul 22 06:18:03.320 Chassis1 daemon.warning CTR8740 pacemaker-schedulerd.6240 warning: Blind faith: not fencing unseen nodes
> 2020 Jul 22 06:18:58.941 Chassis2 user.crit CTR8740 supervisor. pacemaker is inactive (3).
>
> Regards,
> Harvey
>
> --
> Regards,
>
> Reid Wahl, RHCA
> Software Maintenance Engineer, Red Hat
> CEE - Platform Support Delivery - ClusterHA

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/