Re: [ClusterLabs] pacemaker startup problem

2020-07-24 Thread Ken Gaillot
On Fri, 2020-07-24 at 18:34 +0200, Gabriele Bulfon wrote:
> Hello,
>  
> after a long time I'm back to running heartbeat/pacemaker/corosync on our
> XStreamOS/illumos distro.
> I rebuilt the original components I built in 2016 on our latest release
> (probably a bit outdated, but I want to start from where I left off).
> Looks like pacemaker is having trouble starting up, showing these logs:
> 
> Set r/w permissions for uid=401, gid=401 on /var/log/pacemaker.log
> Set r/w permissions for uid=401, gid=401 on /var/log/pacemaker.log
> Jul 24 18:21:32 [971] crmd: info: crm_log_init: Changed active
> directory to /sonicle/var/cluster/lib/pacemaker/cores
> Jul 24 18:21:32 [971] crmd: info: main: CRM Git Version: 1.1.15
> (e174ec8)
> Jul 24 18:21:32 [971] crmd: info: do_log: Input I_STARTUP received in
> state S_STARTING from crmd_init
> Jul 24 18:21:32 [969] lrmd: info: crm_log_init: Changed active
> directory to /sonicle/var/cluster/lib/pacemaker/cores
> Jul 24 18:21:32 [968] stonith-ng: info: crm_log_init: Changed active
> directory to /sonicle/var/cluster/lib/pacemaker/cores
> Jul 24 18:21:32 [968] stonith-ng: info: get_cluster_type: Verifying
> cluster type: 'heartbeat'
> Jul 24 18:21:32 [968] stonith-ng: info: get_cluster_type: Assuming an
> active 'heartbeat' cluster
> Jul 24 18:21:32 [968] stonith-ng: notice: crm_cluster_connect:
> Connecting to cluster infrastructure: heartbeat


> Jul 24 18:21:32 [969] lrmd: error: mainloop_add_ipc_server: Could not
> start lrmd IPC server: Operation not supported (-48)

This is repeated for all the subdaemons. The error is coming from
qb_ipcs_run(), and it looks like the cause is a PCMK_ipc_type that is
invalid on illumos. If you set it to "socket", it should work.
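
A minimal sketch of applying that setting, assuming the daemons pick up PCMK_* options from an environment-style file of KEY=value lines. The helper name set_env_option and the sample file contents are hypothetical, and the location of such a file on illumos is not stated in this thread; how the variable actually reaches the subdaemons depends on how they are launched.

```python
def set_env_option(text: str, key: str, value: str) -> str:
    """Return `text` with `key=value` present exactly once.

    Any existing assignment of `key` (commented out or not) is replaced
    by the first occurrence; if none exists, the assignment is appended.
    """
    new_line = f"{key}={value}"
    out, replaced = [], False
    for line in text.splitlines():
        # Treat "# KEY=..." and "KEY=..." alike when looking for the key.
        stripped = line.lstrip("# \t")
        if stripped.startswith(key + "="):
            if not replaced:
                out.append(new_line)
                replaced = True
            continue  # drop any further duplicates of the key
        out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    sample = "# PCMK_ipc_type=shared-mem\nPCMK_debug=no\n"
    print(set_env_option(sample, "PCMK_ipc_type", "socket"), end="")
```

Working on the text rather than the file keeps the change idempotent and easy to review before anything is written back.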


> Jul 24 18:21:32 [969] lrmd: error: main: Failed to create IPC server:
> shutting down and inhibiting respawn
> Jul 24 18:21:32 [969] lrmd: info: crm_xml_cleanup: Cleaning up memory
> from libxml2
> Jul 24 18:21:32 [971] crmd: info: get_cluster_type: Verifying cluster
> type: 'heartbeat'
> Jul 24 18:21:32 [971] crmd: info: get_cluster_type: Assuming an
> active 'heartbeat' cluster
> Jul 24 18:21:32 [971] crmd: info: start_subsystem: Starting sub-
> system "pengine"
> Jul 24 18:21:32 [968] stonith-ng: info: crm_get_peer: Created entry
> 25bc5492-a49e-40d7-ae60-fd8f975a294a/80886f0 for node xstorage1/0 (1
> total)
> Jul 24 18:21:32 [968] stonith-ng: info: crm_get_peer: Node 0 has uuid
> d426a730-5229-6758-853a-99d4d491514a
> Jul 24 18:21:32 [968] stonith-ng: info: register_heartbeat_conn:
> Hostname: xstorage1
> Jul 24 18:21:32 [968] stonith-ng: info: register_heartbeat_conn:
> UUID: d426a730-5229-6758-853a-99d4d491514a
> Jul 24 18:21:32 [970] attrd: notice: crm_cluster_connect: Connecting
> to cluster infrastructure: heartbeat
> Jul 24 18:21:32 [970] attrd: error: mainloop_add_ipc_server: Could
> not start attrd IPC server: Operation not supported (-48)
> Jul 24 18:21:32 [970] attrd: error: attrd_ipc_server_init: Failed to
> create attrd servers: exiting and inhibiting respawn.
> Jul 24 18:21:32 [970] attrd: warning: attrd_ipc_server_init: Verify
> pacemaker and pacemaker_remote are not both enabled.
> Jul 24 18:21:32 [972] pengine: info: crm_log_init: Changed active
> directory to /sonicle/var/cluster/lib/pacemaker/cores
> Jul 24 18:21:32 [972] pengine: error: mainloop_add_ipc_server: Could
> not start pengine IPC server: Operation not supported (-48)
> Jul 24 18:21:32 [972] pengine: error: main: Failed to create IPC
> server: shutting down and inhibiting respawn
> Jul 24 18:21:32 [972] pengine: info: crm_xml_cleanup: Cleaning up
> memory from libxml2
> Jul 24 18:21:33 [971] crmd: info: do_cib_control: Could not connect
> to the CIB service: Transport endpoint is not connected
> Jul 24 18:21:33 [971] crmd: warning: do_cib_control: Couldn't
> complete CIB registration 1 times... pause and retry
> Jul 24 18:21:33 [971] crmd: error: crmd_child_exit: Child process
> pengine exited (pid=972, rc=100)
> Jul 24 18:21:35 [971] crmd: info: crm_timer_popped: Wait Timer
> (I_NULL) just popped (2000ms)
> Jul 24 18:21:36 [971] crmd: info: do_cib_control: Could not connect
> to the CIB service: Transport endpoint is not connected
> Jul 24 18:21:36 [971] crmd: warning: do_cib_control: Couldn't
> complete CIB registration 2 times... pause and retry
> Jul 24 18:21:38 [971] crmd: info: crm_timer_popped: Wait Timer
> (I_NULL) just popped (2000ms)
> Jul 24 18:21:39 [971] crmd: info: do_cib_control: Could not connect
> to the CIB service: Transport endpoint is not connected
> Jul 24 18:21:39 [971] crmd: warning: do_cib_control: Couldn't
> complete CIB registration 3 times... pause and retry
> Jul 24 18:21:41 [971] crmd: info: crm_timer_popped: Wait Timer
> (I_NULL) just popped (2000ms)
> Jul 24 18:21:42 [971] crmd: info: do_cib_control: Could not connect
> to the CIB service: Transport endpoint is not connected
> Jul 24 18:21:42 [971] crmd: warning: do_cib_control: Couldn't
> complete CIB registration 4 times... pause and retry
> Jul 24 

[ClusterLabs] pacemaker startup problem

2020-07-24 Thread Gabriele Bulfon
Hello,
 
after a long time I'm back to running heartbeat/pacemaker/corosync on our 
XStreamOS/illumos distro.
I rebuilt the original components I built in 2016 on our latest release (probably 
a bit outdated, but I want to start from where I left off).
Looks like pacemaker is having trouble starting up, showing these logs:
Set r/w permissions for uid=401, gid=401 on /var/log/pacemaker.log
Set r/w permissions for uid=401, gid=401 on /var/log/pacemaker.log
Jul 24 18:21:32 [971] crmd: info: crm_log_init: Changed active directory to 
/sonicle/var/cluster/lib/pacemaker/cores
Jul 24 18:21:32 [971] crmd: info: main: CRM Git Version: 1.1.15 (e174ec8)
Jul 24 18:21:32 [971] crmd: info: do_log: Input I_STARTUP received in state 
S_STARTING from crmd_init
Jul 24 18:21:32 [969] lrmd: info: crm_log_init: Changed active directory to 
/sonicle/var/cluster/lib/pacemaker/cores
Jul 24 18:21:32 [968] stonith-ng: info: crm_log_init: Changed active directory 
to /sonicle/var/cluster/lib/pacemaker/cores
Jul 24 18:21:32 [968] stonith-ng: info: get_cluster_type: Verifying cluster 
type: 'heartbeat'
Jul 24 18:21:32 [968] stonith-ng: info: get_cluster_type: Assuming an active 
'heartbeat' cluster
Jul 24 18:21:32 [968] stonith-ng: notice: crm_cluster_connect: Connecting to 
cluster infrastructure: heartbeat
Jul 24 18:21:32 [969] lrmd: error: mainloop_add_ipc_server: Could not start 
lrmd IPC server: Operation not supported (-48)
Jul 24 18:21:32 [969] lrmd: error: main: Failed to create IPC server: shutting 
down and inhibiting respawn
Jul 24 18:21:32 [969] lrmd: info: crm_xml_cleanup: Cleaning up memory from 
libxml2
Jul 24 18:21:32 [971] crmd: info: get_cluster_type: Verifying cluster type: 
'heartbeat'
Jul 24 18:21:32 [971] crmd: info: get_cluster_type: Assuming an active 
'heartbeat' cluster
Jul 24 18:21:32 [971] crmd: info: start_subsystem: Starting sub-system "pengine"
Jul 24 18:21:32 [968] stonith-ng: info: crm_get_peer: Created entry 
25bc5492-a49e-40d7-ae60-fd8f975a294a/80886f0 for node xstorage1/0 (1 total)
Jul 24 18:21:32 [968] stonith-ng: info: crm_get_peer: Node 0 has uuid 
d426a730-5229-6758-853a-99d4d491514a
Jul 24 18:21:32 [968] stonith-ng: info: register_heartbeat_conn: Hostname: 
xstorage1
Jul 24 18:21:32 [968] stonith-ng: info: register_heartbeat_conn: UUID: 
d426a730-5229-6758-853a-99d4d491514a
Jul 24 18:21:32 [970] attrd: notice: crm_cluster_connect: Connecting to cluster 
infrastructure: heartbeat
Jul 24 18:21:32 [970] attrd: error: mainloop_add_ipc_server: Could not start 
attrd IPC server: Operation not supported (-48)
Jul 24 18:21:32 [970] attrd: error: attrd_ipc_server_init: Failed to create 
attrd servers: exiting and inhibiting respawn.
Jul 24 18:21:32 [970] attrd: warning: attrd_ipc_server_init: Verify pacemaker 
and pacemaker_remote are not both enabled.
Jul 24 18:21:32 [972] pengine: info: crm_log_init: Changed active directory to 
/sonicle/var/cluster/lib/pacemaker/cores
Jul 24 18:21:32 [972] pengine: error: mainloop_add_ipc_server: Could not start 
pengine IPC server: Operation not supported (-48)
Jul 24 18:21:32 [972] pengine: error: main: Failed to create IPC server: 
shutting down and inhibiting respawn
Jul 24 18:21:32 [972] pengine: info: crm_xml_cleanup: Cleaning up memory from 
libxml2
Jul 24 18:21:33 [971] crmd: info: do_cib_control: Could not connect to the CIB 
service: Transport endpoint is not connected
Jul 24 18:21:33 [971] crmd: warning: do_cib_control: Couldn't complete CIB 
registration 1 times... pause and retry
Jul 24 18:21:33 [971] crmd: error: crmd_child_exit: Child process pengine 
exited (pid=972, rc=100)
Jul 24 18:21:35 [971] crmd: info: crm_timer_popped: Wait Timer (I_NULL) just 
popped (2000ms)
Jul 24 18:21:36 [971] crmd: info: do_cib_control: Could not connect to the CIB 
service: Transport endpoint is not connected
Jul 24 18:21:36 [971] crmd: warning: do_cib_control: Couldn't complete CIB 
registration 2 times... pause and retry
Jul 24 18:21:38 [971] crmd: info: crm_timer_popped: Wait Timer (I_NULL) just 
popped (2000ms)
Jul 24 18:21:39 [971] crmd: info: do_cib_control: Could not connect to the CIB 
service: Transport endpoint is not connected
Jul 24 18:21:39 [971] crmd: warning: do_cib_control: Couldn't complete CIB 
registration 3 times... pause and retry
Jul 24 18:21:41 [971] crmd: info: crm_timer_popped: Wait Timer (I_NULL) just 
popped (2000ms)
Jul 24 18:21:42 [971] crmd: info: do_cib_control: Could not connect to the CIB 
service: Transport endpoint is not connected
Jul 24 18:21:42 [971] crmd: warning: do_cib_control: Couldn't complete CIB 
registration 4 times... pause and retry
Jul 24 18:21:42 [968] stonith-ng: error: setup_cib: Could not connect to the 
CIB service: Transport endpoint is not connected (-134)
Jul 24 18:21:42 [968] stonith-ng: error: mainloop_add_ipc_server: Could not 
start stonith-ng IPC server: Operation not supported (-48)
Jul 24 18:21:42 [968] stonith-ng: error: stonith_ipc_server_init: Failed to 
create stonith-ng servers: exiting and inhibiting respawn.
Jul 

[ClusterLabs] Antw: [EXT] Coming in Pacemaker 2.0.5: finer control over resource and operation defaults

2020-07-24 Thread Ulrich Windl
>>> Ken Gaillot wrote on 23.07.2020 at 23:54 in message
<99c11c73d59560fccd472d09c3b76073dab1b73e.ca...@redhat.com>:
> Hi all,
> 
> Pacemaker 2.0.4 is barely out the door, and we're already looking ahead
> to 2.0.5, expected at the end of this year.
> 
> One of the new features, already available in the master branch, will
> be finer-grained control over resource and operation defaults.
> 
> Currently, you can set meta-attribute values in the CIB's rsc_defaults
> section to apply to all resources, and op_defaults to apply to all
> operations. Rules can be used to apply defaults only during certain
> times. For example, to set a default stickiness of INFINITY during
> business hours and 0 outside those hours:
> 
> [XML example not preserved in the archive; only the fragment value="INFINITY"/> survives]
> 
> But what if you want to change the default stickiness of just pgsql
> databases? Or the default timeout of only start operations?

We are using a rather similar scenario for stickiness. However, we
distinguish between productive and "not so productive" (test, development)
resources. First we release the stickiness of non-essential resources so that
they can be rebalanced if needed. Later, when the productive resources are
released, the nodes may already be balanced using the non-essential resources.

At the moment we have copied the rules to each resource, which is not nice, of
course.

I'd appreciate it if:
- a date_spec could be defined once and reused often
- a rule could be defined once and reused often

> 
> 2.0.5 will add new rule expressions for this purpose. Examples:
> 
> [XML examples not preserved in the archive; only the fragments provider="heartbeat" type="pgsqlms"/> and value="INFINITY"/> survive]
> 
> You can combine rsc_expression and op_expression in op_defaults rules,
> if for example you want to set a default stop timeout for all
> ocf:heartbeat:docker resources.
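
The combination described in the quoted paragraph above (a default stop timeout for all ocf:heartbeat:docker resources) might look roughly like the XML this sketch generates. The element names follow the thread (op_defaults, rsc_expression, op_expression); every id, the 90s timeout, and the exact attribute set are assumptions for illustration, not taken from the original message, so check them against the schema of your Pacemaker version.

```python
import xml.etree.ElementTree as ET


def docker_stop_timeout_defaults(timeout: str) -> ET.Element:
    """Build an op_defaults section whose rule matches stop operations
    of ocf:heartbeat:docker resources. All ids are hypothetical."""
    op_defaults = ET.Element("op_defaults")
    meta = ET.SubElement(op_defaults, "meta_attributes",
                         {"id": "op-docker-stop"})
    # Both expressions must match for the rule to apply.
    rule = ET.SubElement(meta, "rule", {
        "id": "op-docker-stop-rule",
        "score": "INFINITY",
        "boolean-op": "and",
    })
    ET.SubElement(rule, "rsc_expression", {
        "id": "op-docker-stop-rsc",
        "class": "ocf", "provider": "heartbeat", "type": "docker",
    })
    ET.SubElement(rule, "op_expression", {
        "id": "op-docker-stop-op",
        "name": "stop",
    })
    # The default that applies when the rule matches.
    ET.SubElement(meta, "nvpair", {
        "id": "op-docker-stop-timeout",
        "name": "timeout", "value": timeout,
    })
    return op_defaults


if __name__ == "__main__":
    root = docker_stop_timeout_defaults("90s")  # assumed value
    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9+
        ET.indent(root)
    print(ET.tostring(root, encoding="unicode"))
```

Generating the fragment this way guarantees it is well-formed XML; it does not validate it against the CIB schema.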
> 
> This obviously can be convenient if you have many resources of the same
> type, but it has one other trick up its sleeve: this is the only way
> you can affect the meta-attributes of resources implicitly created by
> Pacemaker for bundles.
> 
> When you configure a bundle, Pacemaker will implicitly create container
> resources (ocf:heartbeat:docker, ocf:heartbeat:rkt, or
> ocf:heartbeat:podman) and if appropriate, IP resources
> (ocf:heartbeat:IPaddr2). Previously, there was no way to directly
> affect these resources, but with these new expressions you can at least
> configure defaults that apply to them, without having to use those same
> defaults for all your resources.
> -- 
> Ken Gaillot 
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users 
> 
> ClusterLabs home: https://www.clusterlabs.org/ 


