[ClusterLabs] Coming in Pacemaker 2.0.5: finer control over resource and operation defaults

2020-07-23 Thread Ken Gaillot
Hi all,

Pacemaker 2.0.4 is barely out the door, and we're already looking ahead
to 2.0.5, expected at the end of this year.

One of the new features, already available in the master branch, will
be finer-grained control over resource and operation defaults.

Currently, you can set meta-attribute values in the CIB's rsc_defaults
section to apply to all resources, and op_defaults to apply to all
operations. Rules can be used to apply defaults only during certain
times. For example, to set a default stickiness of INFINITY during
business hours and 0 outside those hours:
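
A configuration along these lines might look roughly like this (the id values
and the exact hours/weekdays in the date_spec are illustrative):

   <rsc_defaults>
     <meta_attributes id="core-hours" score="2">
       <rule id="core-hours-rule" score="0">
         <date_expression id="nine-to-five-Mon-to-Fri" operation="date_spec">
           <date_spec id="nine-to-five-Mon-to-Fri-spec" hours="9-16" weekdays="1-5"/>
         </date_expression>
       </rule>
       <nvpair id="core-hours-stickiness" name="resource-stickiness" value="INFINITY"/>
     </meta_attributes>
     <meta_attributes id="after-hours" score="1">
       <nvpair id="after-hours-stickiness" name="resource-stickiness" value="0"/>
     </meta_attributes>
   </rsc_defaults>

The higher-scored meta_attributes block wins whenever its rule is satisfied,
so the INFINITY stickiness applies during the defined hours and 0 otherwise.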

But what if you want to change the default stickiness of just pgsql
databases? Or the default timeout of only start operations?

2.0.5 will add new rule expressions for this purpose. Examples:
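
For example, to raise the default stickiness only for ocf:heartbeat:pgsql
resources, and the default timeout only for start operations, something like
the following (id values and the particular stickiness/timeout values are
illustrative):

   <rsc_defaults>
     <meta_attributes id="pgsql-defaults">
       <rule id="pgsql-defaults-rule" score="INFINITY">
         <rsc_expression id="pgsql-defaults-rsc" class="ocf" provider="heartbeat" type="pgsql"/>
       </rule>
       <nvpair id="pgsql-defaults-stickiness" name="resource-stickiness" value="INFINITY"/>
     </meta_attributes>
   </rsc_defaults>

   <op_defaults>
     <meta_attributes id="start-defaults">
       <rule id="start-defaults-rule" score="INFINITY">
         <op_expression id="start-defaults-op" name="start"/>
       </rule>
       <nvpair id="start-defaults-timeout" name="timeout" value="180s"/>
     </meta_attributes>
   </op_defaults>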

You can combine rsc_expression and op_expression in op_defaults rules,
if for example you want to set a default stop timeout for all
ocf:heartbeat:docker resources.
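
A sketch of such a combined rule (with illustrative ids and timeout value),
setting a longer default stop timeout only for ocf:heartbeat:docker resources:

   <op_defaults>
     <meta_attributes id="docker-stop-defaults">
       <rule id="docker-stop-rule" score="INFINITY" boolean-op="and">
         <rsc_expression id="docker-stop-rsc" class="ocf" provider="heartbeat" type="docker"/>
         <op_expression id="docker-stop-op" name="stop"/>
       </rule>
       <nvpair id="docker-stop-timeout" name="timeout" value="300s"/>
     </meta_attributes>
   </op_defaults>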

This obviously can be convenient if you have many resources of the same
type, but it has one other trick up its sleeve: this is the only way
you can affect the meta-attributes of resources implicitly created by
Pacemaker for bundles.

When you configure a bundle, Pacemaker will implicitly create container
resources (ocf:heartbeat:docker, ocf:heartbeat:rkt, or
ocf:heartbeat:podman) and if appropriate, IP resources
(ocf:heartbeat:IPaddr2). Previously, there was no way to directly
affect these resources, but with these new expressions you can at least
configure defaults that apply to them, without having to use those same
defaults for all your resources.
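
For instance, to give just the implicit bundle IP resources a meta-attribute
such as priority, something like this could be used (id values and the
priority value are illustrative):

   <rsc_defaults>
     <meta_attributes id="bundle-ip-defaults">
       <rule id="bundle-ip-rule" score="INFINITY">
         <rsc_expression id="bundle-ip-rsc" class="ocf" provider="heartbeat" type="IPaddr2"/>
       </rule>
       <nvpair id="bundle-ip-priority" name="priority" value="10"/>
     </meta_attributes>
   </rsc_defaults>

Note that such a rule matches every ocf:heartbeat:IPaddr2 resource in the
cluster, not only the bundle-created ones.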
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker systemd resource

2020-07-23 Thread Хиль Эдуард


Thanks Andrei, and to all of you guys for your time, I appreciate that!

Yeah, it’s very sad to see that. It looks like the bugs described here:
https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1869751
https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1881762
Well, for me there is no other way but to change the OS from Ubuntu to
something else, as I am very disappointed that there are such critical bugs :(
  
>Wednesday, 22 July 2020, 22:57 +05:00 from Andrei Borzenkov :
> 
>22.07.2020 12:46, Хиль Эдуард writes:
>>
>> Hey, Andrei! Thanks for your time!
>> And there is no chance to do something about it? :(
>> The pacemaker log is below.
>>  
>
>Resource was started:
>
>...
>> Jul 22 12:38:36 node2.local pacemaker-execd     [1721] (log_execute)     
>> info: executing - rsc:dummy.service action:start call_id:76
>> Jul 22 12:38:36 node2.local pacemaker-based     [1719] (cib_perform_op)     
>> info: Diff: --- 0.131.4 2
>> Jul 22 12:38:36 node2.local pacemaker-based     [1719] (cib_perform_op)     
>> info: Diff: +++ 0.131.5 (null)
>> Jul 22 12:38:36 node2.local pacemaker-based     [1719] (cib_perform_op)     
>> info: +  /cib:  @num_updates=5
>> Jul 22 12:38:36 node2.local pacemaker-based     [1719] (cib_perform_op)     
>> info: +  
>> /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='dummy.service']/lrm_rsc_op[@id='dummy.service_last_0']:
>>  @operation_key=dummy.service_start_0, @operation=start, 
>> @transition-key=164:23:0:76f4932e-716b-45b8-8fed-a20c3806df8a, 
>> @transition-magic=-1:193;164:23:0:76f4932e-716b-45b8-8fed-a20c3806df8a, 
>> @call-id=-1, @rc-code=193, @op-status=-1, @last-rc-change=1595410716, 
>> @last-run=1595410716, @e
>> Jul 22 12:38:36 node2.local pacemaker-based     [1719] (cib_process_request) 
>>     info: Completed cib_modify operation for section status: OK (rc=0, 
>> origin=node2.local/crmd/62, version=0.131.5)
>> Jul 22 12:38:36 node2.local pacemaker-execd     [1721] (systemd_exec_result) 
>>     info: Call to start passed: /org/freedesktop/systemd1/job/703
>> Jul 22 12:38:38 node2.local pacemaker-controld  [1724] (process_lrm_event)   
>>   notice: Result of start operation for dummy.service on node2.local: 0 (ok) 
>> | call=76 key=dummy.service_start_0 confirmed=true cib-update=63
>
>So the start operation at least completed successfully.
>
>> Jul 22 12:38:38 node2.local pacemaker-based     [1719] (cib_process_request) 
>>     info: Forwarding cib_modify operation for section status to all 
>> (origin=local/crmd/63)
>> Jul 22 12:38:38 node2.local pacemaker-based     [1719] (cib_perform_op)     
>> info: Diff: --- 0.131.5 2
>> Jul 22 12:38:38 node2.local pacemaker-based     [1719] (cib_perform_op)     
>> info: Diff: +++ 0.131.6 (null)
>> Jul 22 12:38:38 node2.local pacemaker-based     [1719] (cib_perform_op)     
>> info: +  /cib:  @num_updates=6
>> Jul 22 12:38:38 node2.local pacemaker-based     [1719] (cib_perform_op)     
>> info: +  
>> /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='dummy.service']/lrm_rsc_op[@id='dummy.service_last_0']:
>>   @transition-magic=0:0;164:23:0:76f4932e-716b-45b8-8fed-a20c3806df8a, 
>> @call-id=76, @rc-code=0, @op-status=0, @last-rc-change=1986, @last-run=1986, 
>> @exec-time=-587720, @queue-time=59
>> Jul 22 12:38:38 node2.local pacemaker-based     [1719] (cib_process_request) 
>>     info: Completed cib_modify operation for section status: OK (rc=0, 
>> origin=node2.local/crmd/63, version=0.131.6)
>> Jul 22 12:38:38 node2.local pacemaker-controld  [1724] (do_lrm_rsc_op)     
>> info: Performing key=165:23:0:76f4932e-716b-45b8-8fed-a20c3806df8a 
>> op=dummy.service_monitor_6
>> Jul 22 12:38:38 node2.local pacemaker-based     [1719] (cib_process_request) 
>>     info: Forwarding cib_modify operation for section status to all 
>> (origin=local/crmd/64)
>> Jul 22 12:38:38 node2.local pacemaker-based     [1719] (cib_perform_op)     
>> info: Diff: --- 0.131.6 2
>> Jul 22 12:38:38 node2.local pacemaker-based     [1719] (cib_perform_op)     
>> info: Diff: +++ 0.131.7 (null)
>> Jul 22 12:38:38 node2.local pacemaker-based     [1719] (cib_perform_op)     
>> info: +  /cib:  @num_updates=7
>> Jul 22 12:38:38 node2.local pacemaker-based     [1719] (cib_perform_op)     
>> info: ++ 
>> /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='dummy.service']:
>>   > operation_key="dummy.service_monitor_6" operation="monitor" 
>> crm-debug-origin="do_update_resource" crm_feature_set="3.2.0" 
>> transition-key="165:23:0:76f4932e-716b-45b8-8fed-a20c3806df8a" 
>> transition-magic="-1:193;165:23:0:76f4932e-716b-45b8-8fed-a20c3806df8a" 
>> exit-reason="" on_
>> Jul 22 12:38:38 node2.local pacemaker-based     [1719] (cib_process_request) 
>>     info: Completed cib_modify operation for section status: OK (rc=0, 
>> origin=node2.local/crmd/64, version=0.131.7)
>> Jul 22 12:38:38 node2.local pacemaker-controld  [1724] (process_lrm_event)   
>>   notice: Result of monitor operation for dummy.service on 

[ClusterLabs] Antw: [EXT] Re: Pacemaker Shutdown

2020-07-23 Thread Ulrich Windl
>>> Harvey Shepherd  wrote on 22.07.2020 at 23:43
in message
:
> Thanks for your response Reid. What you say makes sense, and under normal 
> circumstances if a resource failed, I'd want all of its dependents to be 
> stopped cleanly before restarting the failed resource. However if pacemaker 
> is shutting down on a node (e.g. due to a restart request), then I just want 
> to failover as fast as possible, so an unclean kill is fine. At the moment 
> the shutdown process is taking 2 mins. I was just wondering if there was a 
> way to do this.

Hi!

I think you are mixing two concepts: a shutdown request is an attempt to stop
things cleanly every time, while a node failure (which will be followed by a
fencing operation) definitely cannot end in a clean shutdown, as the node
is considered to be dead already.
Also remember that even STONITH (fencing) will take some time, and maybe it's
generally better to try a stop with a timeout (which will then fence if the
timeout expires).

And of course: HA software is not there to make any stop operation faster ;-)
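
As a rough sketch (resource name, agent and timeout are only placeholders), an
explicit stop timeout with escalation to fencing can be configured per
operation like this:

   <primitive id="my-db" class="ocf" provider="heartbeat" type="pgsql">
     <operations>
       <!-- if the stop does not finish within 60s, escalate to fencing -->
       <op id="my-db-stop" name="stop" interval="0s" timeout="60s" on-fail="fence"/>
     </operations>
   </primitive>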

Regards,
Ulrich

> 
> Regards,
> Harvey
> 
> 
> From: Users  on behalf of Reid Wahl 
> 
> Sent: 23 July 2020 08:05
> To: Cluster Labs ‑ All topics related to open‑source clustering welcomed 
> 
> Subject: EXTERNAL: Re: [ClusterLabs] Pacemaker Shutdown
> 
> 
> On Tue, Jul 21, 2020 at 11:42 PM Harvey Shepherd 
> <harvey.sheph...@aviatnet.com> wrote:
> Hi All,
> 
> I'm running Pacemaker 2.0.3 on a two‑node cluster, controlling 40+ resources 
> which are a mixture of clones and other resources that are colocated with the 
> master instance of certain clones. I've noticed that if I terminate pacemaker 
> on the node that is hosting the master instances of the clones, Pacemaker 
> focuses on stopping resources on that node BEFORE failing over to the other 
> node, leading to a longer outage than necessary. Is there a way to change 
> this behaviour?
> 
> Hi, Harvey.
> 
> As you likely know, a given active/passive resource will have to stop on one 
> node before it can start on another node, and the same goes for a promoted 
> clone instance having to demote on one node before it can promote on another. 
> There are exceptions for clone instances and for promotable clones with 
> promoted‑max > 1 ("allow more than one master instance"). A resource that's 
> configured to run on one node at a time should not try to run on two nodes 
> during failover.
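
(As a sketch, with purely illustrative resource names: a promotable clone that
allows two promoted instances would carry meta-attributes along these lines.)

   <clone id="db-clone">
     <primitive id="db" class="ocf" provider="heartbeat" type="pgsql"/>
     <meta_attributes id="db-clone-meta">
       <nvpair id="db-clone-promotable" name="promotable" value="true"/>
       <nvpair id="db-clone-promoted-max" name="promoted-max" value="2"/>
     </meta_attributes>
   </clone>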
> 
> With that in mind, what exactly are you wanting to happen? Is the problem 
> that all resources are stopping on node 1 before any of them start on node 2? 
> Or that you want Pacemaker shutdown to kill the processes on node 1 instead 
> of cleanly shutting them down? Or something different?
> 
> These are the actions and logs I saw during the test:
> 
> Ack. This seems like it's just telling us that Pacemaker is going through a 
> graceful shutdown. The info more relevant to the resource stop/start order 
> would be in /var/log/pacemaker/pacemaker.log (or less detailed in 
> /var/log/messages) on the DC.
> 
> # /etc/init.d/pacemaker stop
> Signaling Pacemaker Cluster Manager to terminate
> 
> Waiting for cluster services to unload..sending signal 9 to procs
> 
> 
> 2020 Jul 22 06:16:50.581 Chassis2 daemon.notice CTR8740 pacemaker. Signaling 
> Pacemaker Cluster Manager to terminate
> 2020 Jul 22 06:16:50.599 Chassis2 daemon.notice CTR8740 pacemaker. Waiting 
> for cluster services to unload
> 2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker‑based.6140 
>  warning: new_event_notification (6140‑6141‑9): Broken pipe (32)
> 2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker‑based.6140 
>  warning: Notification of client stonithd/665bde82‑cb28‑40f7‑9132‑8321dc2f1992 failed
> 2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker‑based.6140 
>  warning: new_event_notification (6140‑6143‑8): Broken pipe (32)
> 2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker‑based.6140 
>  warning: Notification of client attrd/a26ca273‑3422‑4ebe‑8cb7‑95849b8ff130 failed
> 2020 Jul 22 06:18:03.320 Chassis1 daemon.warning CTR8740 
> pacemaker‑schedulerd.6240  warning: Blind faith: not fencing unseen nodes
> 2020 Jul 22 06:18:58.941 Chassis2 user.crit CTR8740 supervisor. pacemaker is 
> inactive (3).
> 
> Regards,
> Harvey
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users 
> 
> ClusterLabs home: https://www.clusterlabs.org/ 
> 
> 
> ‑‑
> Regards,
> 
> Reid Wahl, RHCA
> Software Maintenance Engineer, Red Hat
> CEE ‑ Platform Support Delivery ‑ ClusterHA



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/