Re: [ClusterLabs] pacemaker startup problem
On Fri, 2020-07-24 at 18:34 +0200, Gabriele Bulfon wrote:
> Hello,
>
> after a long time I'm back to running heartbeat/pacemaker/corosync on our
> XStreamOS/illumos distro.
> I rebuilt the original components I built in 2016 on our latest release
> (probably a bit outdated, but I want to start from where I left off).
> Looks like pacemaker is having trouble starting up, showing these logs:
>
> Set r/w permissions for uid=401, gid=401 on /var/log/pacemaker.log
> Set r/w permissions for uid=401, gid=401 on /var/log/pacemaker.log
> Jul 24 18:21:32 [971] crmd: info: crm_log_init: Changed active directory to /sonicle/var/cluster/lib/pacemaker/cores
> Jul 24 18:21:32 [971] crmd: info: main: CRM Git Version: 1.1.15 (e174ec8)
> Jul 24 18:21:32 [971] crmd: info: do_log: Input I_STARTUP received in state S_STARTING from crmd_init
> Jul 24 18:21:32 [969] lrmd: info: crm_log_init: Changed active directory to /sonicle/var/cluster/lib/pacemaker/cores
> Jul 24 18:21:32 [968] stonith-ng: info: crm_log_init: Changed active directory to /sonicle/var/cluster/lib/pacemaker/cores
> Jul 24 18:21:32 [968] stonith-ng: info: get_cluster_type: Verifying cluster type: 'heartbeat'
> Jul 24 18:21:32 [968] stonith-ng: info: get_cluster_type: Assuming an active 'heartbeat' cluster
> Jul 24 18:21:32 [968] stonith-ng: notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
> Jul 24 18:21:32 [969] lrmd: error: mainloop_add_ipc_server: Could not start lrmd IPC server: Operation not supported (-48)

This is repeated for all the subdaemons. The error comes from qb_ipcs_run(), which suggests the issue is an invalid PCMK_ipc_type for illumos. If you set it to "socket", it should work.
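Ken's suggestion amounts to making PCMK_ipc_type=socket part of the environment the Pacemaker daemons start with. A minimal Python sketch, assuming that environment comes from a sysconfig-style file of KEY=value lines (like /etc/sysconfig/pacemaker on RHEL-family Linux; whether and where XStreamOS/illumos provides such a file is an assumption). The hypothetical helper set_env_assignment() rewrites the file's text so that exactly one active PCMK_ipc_type line remains:

```python
import re


def set_env_assignment(text: str, key: str, value: str) -> str:
    """Return `text` with exactly one active KEY=value line for `key`.

    Hypothetical helper. Existing assignments of `key`, active or commented
    out, collapse into one new line at the position of the first of them;
    if there is none, the assignment is appended. Other lines are unchanged.
    """
    pattern = re.compile(rf"^\s*#?\s*{re.escape(key)}=")
    new_line = f"{key}={value}"
    out, placed = [], False
    for line in text.splitlines():
        if pattern.match(line):
            if not placed:
                out.append(new_line)
                placed = True
            # Drop later duplicates so they cannot override the new value.
            continue
        out.append(line)
    if not placed:
        out.append(new_line)
    return "\n".join(out) + "\n"


# Hypothetical file contents, used only to demonstrate the rewrite.
example = "# PCMK_ipc_type=shared-mem\nPCMK_debug=no\n"
print(set_env_assignment(example, "PCMK_ipc_type", "socket"), end="")
```

The helper only transforms text; writing the result back and restarting the cluster stack so that the daemons actually see the variable are separate steps.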
> Jul 24 18:21:32 [969] lrmd: error: main: Failed to create IPC server: shutting down and inhibiting respawn
> Jul 24 18:21:32 [969] lrmd: info: crm_xml_cleanup: Cleaning up memory from libxml2
> Jul 24 18:21:32 [971] crmd: info: get_cluster_type: Verifying cluster type: 'heartbeat'
> Jul 24 18:21:32 [971] crmd: info: get_cluster_type: Assuming an active 'heartbeat' cluster
> Jul 24 18:21:32 [971] crmd: info: start_subsystem: Starting sub-system "pengine"
> Jul 24 18:21:32 [968] stonith-ng: info: crm_get_peer: Created entry 25bc5492-a49e-40d7-ae60-fd8f975a294a/80886f0 for node xstorage1/0 (1 total)
> Jul 24 18:21:32 [968] stonith-ng: info: crm_get_peer: Node 0 has uuid d426a730-5229-6758-853a-99d4d491514a
> Jul 24 18:21:32 [968] stonith-ng: info: register_heartbeat_conn: Hostname: xstorage1
> Jul 24 18:21:32 [968] stonith-ng: info: register_heartbeat_conn: UUID: d426a730-5229-6758-853a-99d4d491514a
> Jul 24 18:21:32 [970] attrd: notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
> Jul 24 18:21:32 [970] attrd: error: mainloop_add_ipc_server: Could not start attrd IPC server: Operation not supported (-48)
> Jul 24 18:21:32 [970] attrd: error: attrd_ipc_server_init: Failed to create attrd servers: exiting and inhibiting respawn.
> Jul 24 18:21:32 [970] attrd: warning: attrd_ipc_server_init: Verify pacemaker and pacemaker_remote are not both enabled.
> Jul 24 18:21:32 [972] pengine: info: crm_log_init: Changed active directory to /sonicle/var/cluster/lib/pacemaker/cores
> Jul 24 18:21:32 [972] pengine: error: mainloop_add_ipc_server: Could not start pengine IPC server: Operation not supported (-48)
> Jul 24 18:21:32 [972] pengine: error: main: Failed to create IPC server: shutting down and inhibiting respawn
> Jul 24 18:21:32 [972] pengine: info: crm_xml_cleanup: Cleaning up memory from libxml2
> Jul 24 18:21:33 [971] crmd: info: do_cib_control: Could not connect to the CIB service: Transport endpoint is not connected
> Jul 24 18:21:33 [971] crmd: warning: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
> Jul 24 18:21:33 [971] crmd: error: crmd_child_exit: Child process pengine exited (pid=972, rc=100)
> Jul 24 18:21:35 [971] crmd: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
> Jul 24 18:21:36 [971] crmd: info: do_cib_control: Could not connect to the CIB service: Transport endpoint is not connected
> Jul 24 18:21:36 [971] crmd: warning: do_cib_control: Couldn't complete CIB registration 2 times... pause and retry
> Jul 24 18:21:38 [971] crmd: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
> Jul 24 18:21:39 [971] crmd: info: do_cib_control: Could not connect to the CIB service: Transport endpoint is not connected
> Jul 24 18:21:39 [971] crmd: warning: do_cib_control: Couldn't complete CIB registration 3 times... pause and retry
> Jul 24 18:21:41 [971] crmd: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
> Jul 24 18:21:42 [971] crmd: info: do_cib_control: Could not connect to the CIB service: Transport endpoint is not connected
> Jul 24 18:21:42 [971] crmd: warning: do_cib_control: Couldn't complete CIB registration 4 times... pause and retry
> Jul 24
[ClusterLabs] pacemaker startup problem
Hello,

after a long time I'm back to running heartbeat/pacemaker/corosync on our XStreamOS/illumos distro.
I rebuilt the original components I built in 2016 on our latest release (probably a bit outdated, but I want to start from where I left off).
Looks like pacemaker is having trouble starting up, showing these logs:

Set r/w permissions for uid=401, gid=401 on /var/log/pacemaker.log
Set r/w permissions for uid=401, gid=401 on /var/log/pacemaker.log
Jul 24 18:21:32 [971] crmd: info: crm_log_init: Changed active directory to /sonicle/var/cluster/lib/pacemaker/cores
Jul 24 18:21:32 [971] crmd: info: main: CRM Git Version: 1.1.15 (e174ec8)
Jul 24 18:21:32 [971] crmd: info: do_log: Input I_STARTUP received in state S_STARTING from crmd_init
Jul 24 18:21:32 [969] lrmd: info: crm_log_init: Changed active directory to /sonicle/var/cluster/lib/pacemaker/cores
Jul 24 18:21:32 [968] stonith-ng: info: crm_log_init: Changed active directory to /sonicle/var/cluster/lib/pacemaker/cores
Jul 24 18:21:32 [968] stonith-ng: info: get_cluster_type: Verifying cluster type: 'heartbeat'
Jul 24 18:21:32 [968] stonith-ng: info: get_cluster_type: Assuming an active 'heartbeat' cluster
Jul 24 18:21:32 [968] stonith-ng: notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
Jul 24 18:21:32 [969] lrmd: error: mainloop_add_ipc_server: Could not start lrmd IPC server: Operation not supported (-48)
Jul 24 18:21:32 [969] lrmd: error: main: Failed to create IPC server: shutting down and inhibiting respawn
Jul 24 18:21:32 [969] lrmd: info: crm_xml_cleanup: Cleaning up memory from libxml2
Jul 24 18:21:32 [971] crmd: info: get_cluster_type: Verifying cluster type: 'heartbeat'
Jul 24 18:21:32 [971] crmd: info: get_cluster_type: Assuming an active 'heartbeat' cluster
Jul 24 18:21:32 [971] crmd: info: start_subsystem: Starting sub-system "pengine"
Jul 24 18:21:32 [968] stonith-ng: info: crm_get_peer: Created entry 25bc5492-a49e-40d7-ae60-fd8f975a294a/80886f0 for node xstorage1/0 (1 total)
Jul 24 18:21:32 [968] stonith-ng: info: crm_get_peer: Node 0 has uuid d426a730-5229-6758-853a-99d4d491514a
Jul 24 18:21:32 [968] stonith-ng: info: register_heartbeat_conn: Hostname: xstorage1
Jul 24 18:21:32 [968] stonith-ng: info: register_heartbeat_conn: UUID: d426a730-5229-6758-853a-99d4d491514a
Jul 24 18:21:32 [970] attrd: notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
Jul 24 18:21:32 [970] attrd: error: mainloop_add_ipc_server: Could not start attrd IPC server: Operation not supported (-48)
Jul 24 18:21:32 [970] attrd: error: attrd_ipc_server_init: Failed to create attrd servers: exiting and inhibiting respawn.
Jul 24 18:21:32 [970] attrd: warning: attrd_ipc_server_init: Verify pacemaker and pacemaker_remote are not both enabled.
Jul 24 18:21:32 [972] pengine: info: crm_log_init: Changed active directory to /sonicle/var/cluster/lib/pacemaker/cores
Jul 24 18:21:32 [972] pengine: error: mainloop_add_ipc_server: Could not start pengine IPC server: Operation not supported (-48)
Jul 24 18:21:32 [972] pengine: error: main: Failed to create IPC server: shutting down and inhibiting respawn
Jul 24 18:21:32 [972] pengine: info: crm_xml_cleanup: Cleaning up memory from libxml2
Jul 24 18:21:33 [971] crmd: info: do_cib_control: Could not connect to the CIB service: Transport endpoint is not connected
Jul 24 18:21:33 [971] crmd: warning: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
Jul 24 18:21:33 [971] crmd: error: crmd_child_exit: Child process pengine exited (pid=972, rc=100)
Jul 24 18:21:35 [971] crmd: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
Jul 24 18:21:36 [971] crmd: info: do_cib_control: Could not connect to the CIB service: Transport endpoint is not connected
Jul 24 18:21:36 [971] crmd: warning: do_cib_control: Couldn't complete CIB registration 2 times... pause and retry
Jul 24 18:21:38 [971] crmd: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
Jul 24 18:21:39 [971] crmd: info: do_cib_control: Could not connect to the CIB service: Transport endpoint is not connected
Jul 24 18:21:39 [971] crmd: warning: do_cib_control: Couldn't complete CIB registration 3 times... pause and retry
Jul 24 18:21:41 [971] crmd: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
Jul 24 18:21:42 [971] crmd: info: do_cib_control: Could not connect to the CIB service: Transport endpoint is not connected
Jul 24 18:21:42 [971] crmd: warning: do_cib_control: Couldn't complete CIB registration 4 times... pause and retry
Jul 24 18:21:42 [968] stonith-ng: error: setup_cib: Could not connect to the CIB service: Transport endpoint is not connected (-134)
Jul 24 18:21:42 [968] stonith-ng: error: mainloop_add_ipc_server: Could not start stonith-ng IPC server: Operation not supported (-48)
Jul 24 18:21:42 [968] stonith-ng: error: stonith_ipc_server_init: Failed to create stonith-ng servers: exiting and inhibiting respawn.
Jul
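Ken's observation that the failure repeats for every subdaemon can also be checked mechanically. A hedged Python sketch that pulls the mainloop_add_ipc_server failures out of log text, whether the entries sit on separate lines or are run together; the regular expression is written against the entries in this thread and is an assumption about the 1.1.15 log format in general. The -48 is a negated errno; on illumos, errno 48 is ENOTSUP, which matches the "Operation not supported" text.

```python
import re

# Written against entries such as:
#   "[969] lrmd: error: mainloop_add_ipc_server: Could not start lrmd IPC server:
#    Operation not supported (-48)"
# Assumed format; other Pacemaker versions may word this differently.
IPC_FAILURE = re.compile(
    r"\[(?P<pid>\d+)\] (?P<daemon>[\w-]+): error: mainloop_add_ipc_server: "
    r"Could not start \S+ IPC server: (?P<reason>[^(]+?) \((?P<rc>-?\d+)\)"
)


def ipc_failures(log_text: str) -> dict[str, tuple[int, str, int]]:
    """Map each daemon name to (pid, reason, return code) of its IPC failure."""
    return {
        m["daemon"]: (int(m["pid"]), m["reason"], int(m["rc"]))
        for m in IPC_FAILURE.finditer(log_text)
    }


sample = (
    "Jul 24 18:21:32 [969] lrmd: error: mainloop_add_ipc_server: Could not start "
    "lrmd IPC server: Operation not supported (-48) "
    "Jul 24 18:21:42 [968] stonith-ng: error: mainloop_add_ipc_server: Could not "
    "start stonith-ng IPC server: Operation not supported (-48)"
)
for daemon, (pid, reason, rc) in ipc_failures(sample).items():
    print(f"{daemon} (pid {pid}): {reason} ({rc})")
```

Running it over the full log above should list lrmd, attrd, pengine and stonith-ng, which is the pattern Ken points out.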
[ClusterLabs] Antw: [EXT] Coming in Pacemaker 2.0.5: finer control over resource and operation defaults
>>> Ken Gaillot wrote on 23.07.2020 at 23:54 in message <99c11c73d59560fccd472d09c3b76073dab1b73e.ca...@redhat.com>:
> Hi all,
>
> Pacemaker 2.0.4 is barely out the door, and we're already looking ahead
> to 2.0.5, expected at the end of this year.
>
> One of the new features, already available in the master branch, will
> be finer-grained control over resource and operation defaults.
>
> Currently, you can set meta-attribute values in the CIB's rsc_defaults
> section to apply to all resources, and op_defaults to apply to all
> operations. Rules can be used to apply defaults only during certain
> times. For example, to set a default stickiness of INFINITY during
> business hours and 0 outside those hours:
>
> [XML example not preserved in the archive]
>
> But what if you want to change the default stickiness of just pgsql
> databases? Or the default timeout of only start operations?

We use a rather similar scenario for stickiness. However, we distinguish
between production and "not so productive" (test, development) resources.
First we release the stickiness of the non-essential resources so that they
can be rebalanced if needed. Later, when the production resources are
released, the nodes may already be balanced using the non-essential
resources. At the moment we copy the rules to each resource, which is not
nice, of course. I'd appreciate being able to:
- define a date_spec once and reuse it often
- define a rule once and reuse it often

>
> 2.0.5 will add new rule expressions for this purpose. Examples:
>
> [XML example not preserved in the archive]
>
> You can combine rsc_expression and op_expression in op_defaults rules,
> if for example you want to set a default stop timeout for all
> ocf:heartbeat:docker resources.
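The announcement's XML examples are missing from this archive. As a hedged sketch of what such defaults could look like, the following Python builds CIB fragments with the standard xml.etree.ElementTree module: a resource stickiness of INFINITY for ocf:heartbeat:pgsqlms resources only, and a stop timeout for ocf:heartbeat:docker resources that combines rsc_expression and op_expression in an op_defaults rule. The element and attribute names follow the 2.0.5 rule syntax as described above (op_expression only belongs in op_defaults); the helper defaults_with_rule(), all ids, and the 120s timeout are hypothetical, and the output should be validated (for example with crm_verify) before it goes anywhere near a real CIB.

```python
import xml.etree.ElementTree as ET


def defaults_with_rule(section, set_id, expressions, name, value):
    """Build a <rsc_defaults> or <op_defaults> element holding one meta-attribute
    that applies only when every expression in its rule matches.

    Hypothetical helper. `expressions` is a list of (tag, attributes) pairs,
    e.g. ("rsc_expression", {"class": "ocf", ...}). All ids derive from `set_id`.
    """
    root = ET.Element(section)
    meta = ET.SubElement(root, "meta_attributes", id=set_id)
    rule = ET.SubElement(meta, "rule", {"id": f"{set_id}-rule", "boolean-op": "and"})
    for n, (tag, attrs) in enumerate(expressions):
        ET.SubElement(rule, tag, {"id": f"{set_id}-expr{n}", **attrs})
    ET.SubElement(meta, "nvpair", id=f"{set_id}-{name}", name=name, value=value)
    return root


# Default stickiness of INFINITY, but only for ocf:heartbeat:pgsqlms resources.
pgsql = defaults_with_rule(
    "rsc_defaults", "pgsql-stickiness",
    [("rsc_expression", {"class": "ocf", "provider": "heartbeat", "type": "pgsqlms"})],
    "resource-stickiness", "INFINITY",
)

# Default timeout (illustrative value) for stop operations of
# ocf:heartbeat:docker resources, including those a bundle creates implicitly.
docker_stop = defaults_with_rule(
    "op_defaults", "docker-stop-timeout",
    [("rsc_expression", {"class": "ocf", "provider": "heartbeat", "type": "docker"}),
     ("op_expression", {"name": "stop"})],
    "timeout", "120s",
)

for element in (pgsql, docker_stop):
    ET.indent(element)  # pretty-print in place; requires Python 3.9+
    print(ET.tostring(element, encoding="unicode"))
```

Because the docker rule matches on resource type rather than on a resource id, it would also cover the container resources Pacemaker creates for bundles, which is the point made in the rest of the announcement.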
>
> This obviously can be convenient if you have many resources of the same
> type, but it has one other trick up its sleeve: this is the only way
> you can affect the meta-attributes of resources implicitly created by
> Pacemaker for bundles.
>
> When you configure a bundle, Pacemaker will implicitly create container
> resources (ocf:heartbeat:docker, ocf:heartbeat:rkt, or
> ocf:heartbeat:podman) and if appropriate, IP resources
> (ocf:heartbeat:IPaddr2). Previously, there was no way to directly
> affect these resources, but with these new expressions you can at least
> configure defaults that apply to them, without having to use those same
> defaults for all your resources.
> --
> Ken Gaillot
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/