Re: [ClusterLabs] Why my node1 couldn't back to the clustering chain?
Thanks. By "cheat sheet" I meant a PDF file in which all the useful commands and parameters are collected.

On Friday, April 9, 2021, 01:12:16 PM GMT+4:30, Antony Stone wrote:

On Friday 09 April 2021 at 10:34:33, Jason Long wrote:
> Thanks.
> I meant was a Cheat sheet.

I don't understand that sentence.

> Yes, something like rendering a 3D movie or... . The Corosync and Pacemaker
> are not OK for it? What kind of clustering using for rendering? Beowulf
> cluster?

Corosync and pacemaker are for High Availability, which generally means that you have more computing resources than you need at any given time, in order that a failed machine can be efficiently replaced by a working one. If all your machines are busy, and one fails, you have no spare computing resources to take over from the failed one.

The setup you were asking about is High Performance computing, where you are trying to use the resources you have as efficiently and continuously as possible, therefore you don't have any spare capacity (since 'spare' means 'wasted' in this regard).

A Beowulf Cluster is one example of the sort of thing you're asking about; for others, see the "Implementations" section of the URL I previously provided.

Antony.

--
https://tools.ietf.org/html/rfc6890 - providing 16 million IPv4 addresses for talking to yourself.

Please reply to the list; please *don't* CC me.

___
Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/
Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.
On 4/9/21 4:04 PM, Klaus Wenninger wrote: On 4/9/21 3:45 PM, Klaus Wenninger wrote: On 4/9/21 3:36 PM, Klaus Wenninger wrote: On 4/9/21 2:37 PM, renayama19661...@ybb.ne.jp wrote: Hi Klaus, Thanks for your comment. Hmm ... is that with selinux enabled? Respectively do you see any related avc messages? Selinux is not enabled. Isn't crm_mon caused by not returning a response when pacemakerd prepares to stop? yep ... that doesn't look good. While in pcmk_shutdown_worker ipc isn't handled. Stop ... that should actually work as pcmk_shutdown_worker should exit quite quickly and proceed after mainloop dispatching when called again. Don't see anything atm that might be blocking for longer ... but let me dig into it further ... What happens is clear (thanks Ken for the hint ;-) ). When pacemakerd is shutting down - already when it shuts down the resources and not just when it starts to reap the subdaemons - crm_mon reads that state and doesn't try to connect to the cib anymore. Question is why that didn't create issue earlier. Probably I didn't test with resources that had crm_mon in their stop/monitor-actions but sbd should have run into issues. Klaus But when shutting down a node the resources should be shutdown before pacemakerd goes down. But let me have a look if it can happen that pacemakerd doesn't react to the ipc-pings before. That btw. might be lethal for sbd-scenarios (if the phase is too long and it migh actually not be defined). My idea with selinux would have been that it might block the ipc if crm_mon is issued by execd. But well forget about it as it is not enabled ;-) Klaus pgsql needs the result of crm_mon in demote processing and stop processing. crm_mon should return a response even after pacemakerd goes into a stop operation. Best Regards, Hideo Yamauchi. - Original Message - From: Klaus Wenninger To: renayama19661...@ybb.ne.jp; Cluster Labs - All topics related to open-source clustering welcomed Cc: Date: 2021/4/9, Fri 21:12 Subject: Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails. On 4/8/21 11:21 PM, renayama19661...@ybb.ne.jp wrote: Hi Ken, Hi All, In the pgsql resource, crm_mon is executed in the process of demote and stop, and the result is processed. However, pacemaker included in RHEL8.4beta fails to execute this crm_mon. - The problem also occurs on github master(c40e18f085fad9ef1d9d79f671ed8a69eb3e753f). The problem can be easily reproduced in the following ways. Step1. Modify to execute crm_mon in the stop process of the Dummy resource. dummy_stop() { mon=$(crm_mon -1) ret=$? ocf_log info "### YAMAUCHI crm_mon[${ret}] : ${mon}" dummy_monitor if [ $? = $OCF_SUCCESS ]; then rm ${OCF_RESKEY_state} fi return $OCF_SUCCESS } Step2. Configure a cluster with two nodes. [root@rh84-beta01 ~]# crm_mon -rfA1 Cluster Summary: * Stack: corosync * Current DC: rh84-beta01 (version 2.0.5-8.el8-ba59be7122) - partition with quorum * Last updated: Thu Apr 8 18:00:52 2021 * Last change: Thu Apr 8 18:00:38 2021 by root via cibadmin on rh84-beta01 * 2 nodes configured * 1 resource instance configured Node List: * Online: [ rh84-beta01 rh84-beta02 ] Full List of Resources: * dummy-1 (ocf::heartbeat:Dummy): Started rh84-beta01 Migration Summary: Step3. Stop the node where the Dummy resource is running. The resource will fail over. 
[root@rh84-beta02 ~]# crm_mon -rfA1 Cluster Summary: * Stack: corosync * Current DC: rh84-beta02 (version 2.0.5-8.el8-ba59be7122) - partition with quorum * Last updated: Thu Apr 8 18:08:56 2021 * Last change: Thu Apr 8 18:05:08 2021 by root via cibadmin on rh84-beta01 * 2 nodes configured * 1 resource instance configured Node List: * Online: [ rh84-beta02 ] * OFFLINE: [ rh84-beta01 ] Full List of Resources: * dummy-1 (ocf::heartbeat:Dummy): Started rh84-beta02 However, if you look at the log, you can see that the execution of crm_mon in the stop processing of the Dummy resource has failed. Apr 08 18:05:17 Dummy(dummy-1)[2631]: INFO: ### YAMAUCHI crm_mon[102] : Pacemaker daemons shutting down ... Apr 08 18:05:17 rh84-beta01 pacemaker-execd [2219] (log_op_output) notice: dummy-1_stop_0[2631] error output [ crm_mon: Error: cluster is not available on this node ] Hmm ... is that with selinux enabled? Respectively do you see any related avc messages? Klaus Similarly, pgsql also executes crm_mon with demote or stop, so control fails. The problem seems to be related to the next fix. * Report pacemakerd in state waiting for sbd - https://github.com/ClusterLabs/pacemaker/pull/2278 The problem does not occur with the release version of Pacemaker 2.0.5 or the Pacemaker included wit
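A note on working around this in an agent: pgsql-style agents call crm_mon during demote/stop and treat a non-zero exit as fatal. Until the pacemakerd shutdown behaviour discussed above is fixed, an agent could guard that call so a transient "daemons shutting down" answer is retried and logged instead of aborting the stop. This is only a sketch of that idea in shell; the helper name and retry policy are made up for illustration and are not part of the pgsql agent:

  # Hypothetical helper inside an OCF agent: retry crm_mon briefly and
  # log a warning instead of failing the whole stop/demote action.
  safe_crm_mon() {
      tries=3
      while [ $tries -gt 0 ]; do
          out=$(crm_mon -1 2>&1)
          rc=$?
          if [ $rc -eq 0 ]; then
              printf '%s\n' "$out"
              return 0
          fi
          ocf_log warn "crm_mon failed (rc=$rc): $out"
          tries=$((tries - 1))
          sleep 1
      done
      # Caller decides what "state unknown" means; failing only after
      # several attempts keeps a short pacemakerd shutdown window from
      # turning into a stop failure.
      return $rc
  }

Whether falling back like this is safe depends on what the agent does with the crm_mon output, so treat it as a stop-gap rather than a fix for the underlying regression.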
Re: [ClusterLabs] Fwd: Issue with resource-agents ocf:heartbeat:mariadb
Hi, I'm not very familiar with the mariadb agent, but one thing to check is that the output of "uname -n" can be used in the CHANGE MASTER command. If not, you need to set node attributes for the right names to use. I believe you have to configure and start replication manually once before the cluster can manage it automatically. On Fri, 2021-04-09 at 10:04 +0200, Olivier POUILLY wrote: > Hi team, > Thanks for this great job on those library. > I would like to know if it was possible to get some help on the > mariadb resource. > After the configuration of my cluster pcs command shows me: > root@node1:~# pcs status > Cluster name: clusterserver > Stack: corosync > Current DC: node1 (version 2.0.1-9e909a5bdd) - partition with quorum > Last updated: Thu Apr 8 15:45:35 2021 > Last change: Thu Apr 8 15:45:25 2021 by root via cibadmin on node1 > > 2 nodes configured > 2 resources configured > > Online: [ node1 node2 ] > > Full list of resources: > > Clone Set: mariadb_server-clone [mariadb_server] (promotable) > Masters: [ node1 ] > Slaves: [ node2 ] > > Daemon Status: > corosync: active/disabled > pacemaker: active/disabled > pcsd: active/enabled > > But when I go to mysql on server2 I see my slave statys off: > MariaDB [(none)]> SHOW SLAVE STATUS\G > *** 1. row *** > Slave_IO_State: >Master_Host: node1 >Master_User: replication >Master_Port: 3306 > Connect_Retry: 60 >Master_Log_File: master-bin.01 >Read_Master_Log_Pos: 463 > Relay_Log_File: master-relay-bin.02 > Relay_Log_Pos: 672 > Relay_Master_Log_File: master-bin.01 > Slave_IO_Running: No > Slave_SQL_Running: No >Replicate_Do_DB: >Replicate_Ignore_DB: > Replicate_Do_Table: > Replicate_Ignore_Table: >Replicate_Wild_Do_Table: >Replicate_Wild_Ignore_Table: > Last_Errno: 0 > Last_Error: > Skip_Counter: 0 >Exec_Master_Log_Pos: 463 >Relay_Log_Space: 2935 >Until_Condition: None > Until_Log_File: > Until_Log_Pos: 0 > Master_SSL_Allowed: No > Master_SSL_CA_File: > Master_SSL_CA_Path: >Master_SSL_Cert: > Master_SSL_Cipher: > Master_SSL_Key: > Seconds_Behind_Master: NULL > Master_SSL_Verify_Server_Cert: No > Last_IO_Errno: 0 > Last_IO_Error: > Last_SQL_Errno: 0 > Last_SQL_Error: >Replicate_Ignore_Server_Ids: > Master_Server_Id: 0 > Master_SSL_Crl: > Master_SSL_Crlpath: > Using_Gtid: Current_Pos >Gtid_IO_Pos: >Replicate_Do_Domain_Ids: >Replicate_Ignore_Domain_Ids: > Parallel_Mode: conservative > SQL_Delay: 0 >SQL_Remaining_Delay: NULL >Slave_SQL_Running_State: > Slave_DDL_Groups: 0 > Slave_Non_Transactional_Groups: 0 > Slave_Transactional_Groups: 0 > > On pacemaker log I got the following message: > Apr 08 19:26:18 node2 pacemaker-execd [6899] (operation_finished) > notice: mariadb_server_start_0:7072:stderr [ Error performing > operation: No such device or address ] > > Here is the detailed of my configuration: > - pcs : 0.10.1 > - Pacemaker 2.0.1 > - Corosync Cluster Engine, version '3.0.1' > - mariadb Ver 15.1 Distrib 10.3.27-MariaDB > - Debian 10.8 > Mysql configuration: > [server] > [mysqld] > user= mysql > pid-file= /run/mysqld/mysqld.pid > socket = /run/mysqld/mysqld.sock > basedir = /usr > datadir = /var/lib/mysql > tmpdir = /tmp > lc-messages-dir = /usr/share/mysql > bind-address= 0.0.0.0 > query_cache_size= 16M > log_error = /var/log/mysql/error.log > server-id=2 > expire_logs_days= 10 > character-set-server = utf8mb4 > collation-server = utf8mb4_general_ci > [embedded] > [mariadb] > log-bin > server-id=2 > log-basename=master > [mariadb-10.3] > > Corosync configuration: > num_updates="0" admin_epoch="0" cib-last-written="Thu Apr 8 19:26:13 > 
2021" update-origin="node1" update-client="cibadmin" update-user="root" have-quorum="1" dc-uuid="1">
> [The remainder of the quoted CIB was mangled when the message was archived; the attributes still legible are stonith-enabled="false", cluster-infrastructure="corosync", cluster-name="clusterserver" and the agent-managed property mariadb_server_REPL_INFO="node1".]
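Following up on the two suggestions at the top of this message: replication usually has to be primed by hand once, and if "uname -n" does not return a name MariaDB can use in CHANGE MASTER, the agent can be pointed at per-node attributes instead. Roughly like this; the attribute name below is a placeholder (check the agent's meta-data for what your resource-agents version actually reads) and the password is the one from the original post:

  # Hypothetical per-node attributes holding the hostnames replication should use:
  crm_attribute --type nodes --node node1 --name replication_host --update node1.example.com
  crm_attribute --type nodes --node node2 --name replication_host --update node2.example.com

  # One-time manual priming of replication on the intended slave (node2):
  mysql -u root -e "CHANGE MASTER TO MASTER_HOST='node1', \
      MASTER_USER='replication', MASTER_PASSWORD='similarly-secure-password', \
      MASTER_USE_GTID=current_pos; START SLAVE;"

Once SHOW SLAVE STATUS reports both Slave_IO_Running and Slave_SQL_Running as Yes, the agent has a working replication channel to take over.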
Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.
On 4/9/21 3:45 PM, Klaus Wenninger wrote: On 4/9/21 3:36 PM, Klaus Wenninger wrote: On 4/9/21 2:37 PM, renayama19661...@ybb.ne.jp wrote: Hi Klaus, Thanks for your comment. Hmm ... is that with selinux enabled? Respectively do you see any related avc messages? Selinux is not enabled. Isn't crm_mon caused by not returning a response when pacemakerd prepares to stop? yep ... that doesn't look good. While in pcmk_shutdown_worker ipc isn't handled. Stop ... that should actually work as pcmk_shutdown_worker should exit quite quickly and proceed after mainloop dispatching when called again. Don't see anything atm that might be blocking for longer ... but let me dig into it further ... Question is why that didn't create issue earlier. Probably I didn't test with resources that had crm_mon in their stop/monitor-actions but sbd should have run into issues. Klaus But when shutting down a node the resources should be shutdown before pacemakerd goes down. But let me have a look if it can happen that pacemakerd doesn't react to the ipc-pings before. That btw. might be lethal for sbd-scenarios (if the phase is too long and it migh actually not be defined). My idea with selinux would have been that it might block the ipc if crm_mon is issued by execd. But well forget about it as it is not enabled ;-) Klaus pgsql needs the result of crm_mon in demote processing and stop processing. crm_mon should return a response even after pacemakerd goes into a stop operation. Best Regards, Hideo Yamauchi. - Original Message - From: Klaus Wenninger To: renayama19661...@ybb.ne.jp; Cluster Labs - All topics related to open-source clustering welcomed Cc: Date: 2021/4/9, Fri 21:12 Subject: Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails. On 4/8/21 11:21 PM, renayama19661...@ybb.ne.jp wrote: Hi Ken, Hi All, In the pgsql resource, crm_mon is executed in the process of demote and stop, and the result is processed. However, pacemaker included in RHEL8.4beta fails to execute this crm_mon. - The problem also occurs on github master(c40e18f085fad9ef1d9d79f671ed8a69eb3e753f). The problem can be easily reproduced in the following ways. Step1. Modify to execute crm_mon in the stop process of the Dummy resource. dummy_stop() { mon=$(crm_mon -1) ret=$? ocf_log info "### YAMAUCHI crm_mon[${ret}] : ${mon}" dummy_monitor if [ $? = $OCF_SUCCESS ]; then rm ${OCF_RESKEY_state} fi return $OCF_SUCCESS } Step2. Configure a cluster with two nodes. [root@rh84-beta01 ~]# crm_mon -rfA1 Cluster Summary: * Stack: corosync * Current DC: rh84-beta01 (version 2.0.5-8.el8-ba59be7122) - partition with quorum * Last updated: Thu Apr 8 18:00:52 2021 * Last change: Thu Apr 8 18:00:38 2021 by root via cibadmin on rh84-beta01 * 2 nodes configured * 1 resource instance configured Node List: * Online: [ rh84-beta01 rh84-beta02 ] Full List of Resources: * dummy-1 (ocf::heartbeat:Dummy): Started rh84-beta01 Migration Summary: Step3. Stop the node where the Dummy resource is running. The resource will fail over. 
[root@rh84-beta02 ~]# crm_mon -rfA1 Cluster Summary: * Stack: corosync * Current DC: rh84-beta02 (version 2.0.5-8.el8-ba59be7122) - partition with quorum * Last updated: Thu Apr 8 18:08:56 2021 * Last change: Thu Apr 8 18:05:08 2021 by root via cibadmin on rh84-beta01 * 2 nodes configured * 1 resource instance configured Node List: * Online: [ rh84-beta02 ] * OFFLINE: [ rh84-beta01 ] Full List of Resources: * dummy-1 (ocf::heartbeat:Dummy): Started rh84-beta02 However, if you look at the log, you can see that the execution of crm_mon in the stop processing of the Dummy resource has failed. Apr 08 18:05:17 Dummy(dummy-1)[2631]: INFO: ### YAMAUCHI crm_mon[102] : Pacemaker daemons shutting down ... Apr 08 18:05:17 rh84-beta01 pacemaker-execd [2219] (log_op_output) notice: dummy-1_stop_0[2631] error output [ crm_mon: Error: cluster is not available on this node ] Hmm ... is that with selinux enabled? Respectively do you see any related avc messages? Klaus Similarly, pgsql also executes crm_mon with demote or stop, so control fails. The problem seems to be related to the next fix. * Report pacemakerd in state waiting for sbd - https://github.com/ClusterLabs/pacemaker/pull/2278 The problem does not occur with the release version of Pacemaker 2.0.5 or the Pacemaker included with RHEL8.3. This issue has a huge impact on the user. Perhaps it also affects the control of other resources that utilize crm_mon. Please improve the release version of RHEL8.4 so that it includes Pacemaker which does not cause this problem. * Distributions other than RHEL may also be a
Re: [ClusterLabs] how to setup single node cluster
On Fri, 2021-04-09 at 08:20 +0300, Andrei Borzenkov wrote:
> On 08.04.2021 09:26, d tbsky wrote:
> > Reid Wahl:
> > > I don't think we do require fencing for single-node clusters.
> > > (Anyone at Red Hat, feel free to comment.) I vaguely recall an
> > > internal mailing list or IRC conversation where we discussed this
> > > months ago, but I can't find it now. I've also checked our
> > > support policies documentation, and it's not mentioned in the
> > > "cluster size" doc or the "fencing" doc.
> >
> > Since the cluster is 100% alive or 100% dead with a single node, I
> > think fencing/quorum is not required. I am just curious what the
> > usage case is. Since RedHat supports it, it must be useful in a real
> > scenario.
>
> I do not know what "disaster recovery" configuration you have in mind,
> but if you intend to use geo clustering, fencing can speed up fail-over,
> so it is at least useful.
>
> Even in a single-node cluster, if a resource fails to stop you are stuck -
> you cannot actually do anything from that point without manual
> intervention. Depending on configuration and requirements, rebooting the
> node may be considered an attempt to automatically "reset" cluster state.

The use case for a single-node disaster recovery cluster is to have the main cluster be a full, multi-node cluster with fencing, with a single-node cluster at a remote site for disaster recovery when the main cluster is down (possibly for just the most essential resources). Fencing isn't critical for the DR site because if the DR site is being used, the main site is already down.

The DR site could be activated automatically with booth (if a third arbitrator site is available), or manually by an administrator (for example by changing the target-role resource default, or manually assigning tickets).

The advantages of using a cluster at all at a manual DR site are that administrators can use the same cluster management commands they're familiar with, and certain resources can always run at the DR site to keep it ready (e.g. shared storage or a database replica).

There are some ideas about making such a setup easier to manage, such as being able to coordinate configuration changes (each site has a separate cluster configuration), and maybe having "storage agents" to manage shared storage across clusters.
--
Ken Gaillot

___
Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/
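For the manual-activation case mentioned above, the commands are the ordinary ones; a sketch with pcs, assuming the DR copies of the essential resources are normally kept stopped and that a booth ticket named dr-ticket exists (both names are examples, not from the thread):

  # Bring the DR site's resources up by changing the target-role default
  # (newer pcs syntax; older releases use "pcs resource defaults target-role=Started"):
  pcs resource defaults update target-role=Started

  # Or, when booth with a third arbitrator site is in place, grant the
  # ticket that the DR resources are constrained to:
  pcs booth ticket grant dr-ticket

Reverting is the mirror image: set the default back to Stopped, or revoke the ticket, once the main site is healthy again.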
Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.
On 4/9/21 3:36 PM, Klaus Wenninger wrote: On 4/9/21 2:37 PM, renayama19661...@ybb.ne.jp wrote: Hi Klaus, Thanks for your comment. Hmm ... is that with selinux enabled? Respectively do you see any related avc messages? Selinux is not enabled. Isn't crm_mon caused by not returning a response when pacemakerd prepares to stop? yep ... that doesn't look good. While in pcmk_shutdown_worker ipc isn't handled. Question is why that didn't create issue earlier. Probably I didn't test with resources that had crm_mon in their stop/monitor-actions but sbd should have run into issues. Klaus But when shutting down a node the resources should be shutdown before pacemakerd goes down. But let me have a look if it can happen that pacemakerd doesn't react to the ipc-pings before. That btw. might be lethal for sbd-scenarios (if the phase is too long and it migh actually not be defined). My idea with selinux would have been that it might block the ipc if crm_mon is issued by execd. But well forget about it as it is not enabled ;-) Klaus pgsql needs the result of crm_mon in demote processing and stop processing. crm_mon should return a response even after pacemakerd goes into a stop operation. Best Regards, Hideo Yamauchi. - Original Message - From: Klaus Wenninger To: renayama19661...@ybb.ne.jp; Cluster Labs - All topics related to open-source clustering welcomed Cc: Date: 2021/4/9, Fri 21:12 Subject: Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails. On 4/8/21 11:21 PM, renayama19661...@ybb.ne.jp wrote: Hi Ken, Hi All, In the pgsql resource, crm_mon is executed in the process of demote and stop, and the result is processed. However, pacemaker included in RHEL8.4beta fails to execute this crm_mon. - The problem also occurs on github master(c40e18f085fad9ef1d9d79f671ed8a69eb3e753f). The problem can be easily reproduced in the following ways. Step1. Modify to execute crm_mon in the stop process of the Dummy resource. dummy_stop() { mon=$(crm_mon -1) ret=$? ocf_log info "### YAMAUCHI crm_mon[${ret}] : ${mon}" dummy_monitor if [ $? = $OCF_SUCCESS ]; then rm ${OCF_RESKEY_state} fi return $OCF_SUCCESS } Step2. Configure a cluster with two nodes. [root@rh84-beta01 ~]# crm_mon -rfA1 Cluster Summary: * Stack: corosync * Current DC: rh84-beta01 (version 2.0.5-8.el8-ba59be7122) - partition with quorum * Last updated: Thu Apr 8 18:00:52 2021 * Last change: Thu Apr 8 18:00:38 2021 by root via cibadmin on rh84-beta01 * 2 nodes configured * 1 resource instance configured Node List: * Online: [ rh84-beta01 rh84-beta02 ] Full List of Resources: * dummy-1 (ocf::heartbeat:Dummy): Started rh84-beta01 Migration Summary: Step3. Stop the node where the Dummy resource is running. The resource will fail over. [root@rh84-beta02 ~]# crm_mon -rfA1 Cluster Summary: * Stack: corosync * Current DC: rh84-beta02 (version 2.0.5-8.el8-ba59be7122) - partition with quorum * Last updated: Thu Apr 8 18:08:56 2021 * Last change: Thu Apr 8 18:05:08 2021 by root via cibadmin on rh84-beta01 * 2 nodes configured * 1 resource instance configured Node List: * Online: [ rh84-beta02 ] * OFFLINE: [ rh84-beta01 ] Full List of Resources: * dummy-1 (ocf::heartbeat:Dummy): Started rh84-beta02 However, if you look at the log, you can see that the execution of crm_mon in the stop processing of the Dummy resource has failed. Apr 08 18:05:17 Dummy(dummy-1)[2631]: INFO: ### YAMAUCHI crm_mon[102] : Pacemaker daemons shutting down ... 
Apr 08 18:05:17 rh84-beta01 pacemaker-execd [2219] (log_op_output) notice: dummy-1_stop_0[2631] error output [ crm_mon: Error: cluster is not available on this node ] Hmm ... is that with selinux enabled? Respectively do you see any related avc messages? Klaus Similarly, pgsql also executes crm_mon with demote or stop, so control fails. The problem seems to be related to the next fix. * Report pacemakerd in state waiting for sbd - https://github.com/ClusterLabs/pacemaker/pull/2278 The problem does not occur with the release version of Pacemaker 2.0.5 or the Pacemaker included with RHEL8.3. This issue has a huge impact on the user. Perhaps it also affects the control of other resources that utilize crm_mon. Please improve the release version of RHEL8.4 so that it includes Pacemaker which does not cause this problem. * Distributions other than RHEL may also be affected in future releases. This content is the same as the following Bugzilla. - https://bugs.clusterlabs.org/show_bug.cgi?id=5471 Best Regards, Hideo Yamauchi. ___ Manage your subscription: https://lists
Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.
On 4/9/21 2:37 PM, renayama19661...@ybb.ne.jp wrote: Hi Klaus, Thanks for your comment. Hmm ... is that with selinux enabled? Respectively do you see any related avc messages? Selinux is not enabled. Isn't crm_mon caused by not returning a response when pacemakerd prepares to stop? But when shutting down a node the resources should be shutdown before pacemakerd goes down. But let me have a look if it can happen that pacemakerd doesn't react to the ipc-pings before. That btw. might be lethal for sbd-scenarios (if the phase is too long and it migh actually not be defined). My idea with selinux would have been that it might block the ipc if crm_mon is issued by execd. But well forget about it as it is not enabled ;-) Klaus pgsql needs the result of crm_mon in demote processing and stop processing. crm_mon should return a response even after pacemakerd goes into a stop operation. Best Regards, Hideo Yamauchi. - Original Message - From: Klaus Wenninger To: renayama19661...@ybb.ne.jp; Cluster Labs - All topics related to open-source clustering welcomed Cc: Date: 2021/4/9, Fri 21:12 Subject: Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails. On 4/8/21 11:21 PM, renayama19661...@ybb.ne.jp wrote: Hi Ken, Hi All, In the pgsql resource, crm_mon is executed in the process of demote and stop, and the result is processed. However, pacemaker included in RHEL8.4beta fails to execute this crm_mon. - The problem also occurs on github master(c40e18f085fad9ef1d9d79f671ed8a69eb3e753f). The problem can be easily reproduced in the following ways. Step1. Modify to execute crm_mon in the stop process of the Dummy resource. dummy_stop() { mon=$(crm_mon -1) ret=$? ocf_log info "### YAMAUCHI crm_mon[${ret}] : ${mon}" dummy_monitor if [ $? = $OCF_SUCCESS ]; then rm ${OCF_RESKEY_state} fi return $OCF_SUCCESS } Step2. Configure a cluster with two nodes. [root@rh84-beta01 ~]# crm_mon -rfA1 Cluster Summary: * Stack: corosync * Current DC: rh84-beta01 (version 2.0.5-8.el8-ba59be7122) - partition with quorum * Last updated: Thu Apr 8 18:00:52 2021 * Last change: Thu Apr 8 18:00:38 2021 by root via cibadmin on rh84-beta01 * 2 nodes configured * 1 resource instance configured Node List: * Online: [ rh84-beta01 rh84-beta02 ] Full List of Resources: * dummy-1 (ocf::heartbeat:Dummy): Started rh84-beta01 Migration Summary: Step3. Stop the node where the Dummy resource is running. The resource will fail over. [root@rh84-beta02 ~]# crm_mon -rfA1 Cluster Summary: * Stack: corosync * Current DC: rh84-beta02 (version 2.0.5-8.el8-ba59be7122) - partition with quorum * Last updated: Thu Apr 8 18:08:56 2021 * Last change: Thu Apr 8 18:05:08 2021 by root via cibadmin on rh84-beta01 * 2 nodes configured * 1 resource instance configured Node List: * Online: [ rh84-beta02 ] * OFFLINE: [ rh84-beta01 ] Full List of Resources: * dummy-1 (ocf::heartbeat:Dummy): Started rh84-beta02 However, if you look at the log, you can see that the execution of crm_mon in the stop processing of the Dummy resource has failed. Apr 08 18:05:17 Dummy(dummy-1)[2631]: INFO: ### YAMAUCHI crm_mon[102] : Pacemaker daemons shutting down ... Apr 08 18:05:17 rh84-beta01 pacemaker-execd [2219] (log_op_output) notice: dummy-1_stop_0[2631] error output [ crm_mon: Error: cluster is not available on this node ] Hmm ... is that with selinux enabled? Respectively do you see any related avc messages? Klaus Similarly, pgsql also executes crm_mon with demote or stop, so control fails. The problem seems to be related to the next fix. 
* Report pacemakerd in state waiting for sbd - https://github.com/ClusterLabs/pacemaker/pull/2278 The problem does not occur with the release version of Pacemaker 2.0.5 or the Pacemaker included with RHEL8.3. This issue has a huge impact on the user. Perhaps it also affects the control of other resources that utilize crm_mon. Please improve the release version of RHEL8.4 so that it includes Pacemaker which does not cause this problem. * Distributions other than RHEL may also be affected in future releases. This content is the same as the following Bugzilla. - https://bugs.clusterlabs.org/show_bug.cgi?id=5471 Best Regards, Hideo Yamauchi. ___ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home: https://www.clusterlabs.org/ -- Klaus Wenninger Senior Software Engineer, EMEA ENG Base Operating Systems Red Hat kwenn...@redhat.com Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn, Commercial register: Amtsgericht Muenchen, HRB 1532
Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.
Hi Klaus, Thanks for your comment. > Hmm ... is that with selinux enabled? > Respectively do you see any related avc messages? Selinux is not enabled. Isn't crm_mon caused by not returning a response when pacemakerd prepares to stop? pgsql needs the result of crm_mon in demote processing and stop processing. crm_mon should return a response even after pacemakerd goes into a stop operation. Best Regards, Hideo Yamauchi. - Original Message - > From: Klaus Wenninger > To: renayama19661...@ybb.ne.jp; Cluster Labs - All topics related to > open-source clustering welcomed > Cc: > Date: 2021/4/9, Fri 21:12 > Subject: Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control > fails. > > On 4/8/21 11:21 PM, renayama19661...@ybb.ne.jp wrote: >> Hi Ken, >> Hi All, >> >> In the pgsql resource, crm_mon is executed in the process of demote and > stop, and the result is processed. >> >> However, pacemaker included in RHEL8.4beta fails to execute this crm_mon. >> - The problem also occurs on github > master(c40e18f085fad9ef1d9d79f671ed8a69eb3e753f). >> >> The problem can be easily reproduced in the following ways. >> >> Step1. Modify to execute crm_mon in the stop process of the Dummy resource. >> >> >> dummy_stop() { >> mon=$(crm_mon -1) >> ret=$? >> ocf_log info "### YAMAUCHI crm_mon[${ret}] : ${mon}" >> dummy_monitor >> if [ $? = $OCF_SUCCESS ]; then >> rm ${OCF_RESKEY_state} >> fi >> return $OCF_SUCCESS >> } >> >> >> Step2. Configure a cluster with two nodes. >> >> >> [root@rh84-beta01 ~]# crm_mon -rfA1 >> Cluster Summary: >> * Stack: corosync >> * Current DC: rh84-beta01 (version 2.0.5-8.el8-ba59be7122) - partition > with quorum >> * Last updated: Thu Apr 8 18:00:52 2021 >> * Last change: Thu Apr 8 18:00:38 2021 by root via cibadmin on > rh84-beta01 >> * 2 nodes configured >> * 1 resource instance configured >> >> Node List: >> * Online: [ rh84-beta01 rh84-beta02 ] >> >> Full List of Resources: >> * dummy-1 (ocf::heartbeat:Dummy): Started rh84-beta01 >> >> Migration Summary: >> >> >> Step3. Stop the node where the Dummy resource is running. The resource will > fail over. >> >> [root@rh84-beta02 ~]# crm_mon -rfA1 >> Cluster Summary: >> * Stack: corosync >> * Current DC: rh84-beta02 (version 2.0.5-8.el8-ba59be7122) - partition > with quorum >> * Last updated: Thu Apr 8 18:08:56 2021 >> * Last change: Thu Apr 8 18:05:08 2021 by root via cibadmin on > rh84-beta01 >> * 2 nodes configured >> * 1 resource instance configured >> >> Node List: >> * Online: [ rh84-beta02 ] >> * OFFLINE: [ rh84-beta01 ] >> >> Full List of Resources: >> * dummy-1 (ocf::heartbeat:Dummy): Started rh84-beta02 >> >> >> However, if you look at the log, you can see that the execution of crm_mon > in the stop processing of the Dummy resource has failed. >> >> >> Apr 08 18:05:17 Dummy(dummy-1)[2631]: INFO: ### YAMAUCHI > crm_mon[102] : Pacemaker daemons shutting down ... >> Apr 08 18:05:17 rh84-beta01 pacemaker-execd [2219] (log_op_output) > notice: dummy-1_stop_0[2631] error output [ crm_mon: Error: cluster is not > available on this node ] > Hmm ... is that with selinux enabled? > Respectively do you see any related avc messages? > > Klaus >> >> >> Similarly, pgsql also executes crm_mon with demote or stop, so control > fails. >> >> The problem seems to be related to the next fix. >> * Report pacemakerd in state waiting for sbd >> - https://github.com/ClusterLabs/pacemaker/pull/2278 >> >> The problem does not occur with the release version of Pacemaker 2.0.5 or > the Pacemaker included with RHEL8.3. 
>> >> This issue has a huge impact on the user. >> >> Perhaps it also affects the control of other resources that utilize > crm_mon. >> >> Please improve the release version of RHEL8.4 so that it includes Pacemaker > which does not cause this problem. >> * Distributions other than RHEL may also be affected in future releases. >> >> >> This content is the same as the following Bugzilla. >> - https://bugs.clusterlabs.org/show_bug.cgi?id=5471 >> >> >> Best Regards, >> Hideo Yamauchi. >> >> ___ >> Manage your subscription: >> https://lists.clusterlabs.org/mailman/listinfo/users >> >> ClusterLabs home: https://www.clusterlabs.org/ > ___ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home: https://www.clusterlabs.org/
Re: [ClusterLabs] Why my node1 couldn't back to the clustering chain?
Thanks. I meant was a Cheat sheet. Yes, something like rendering a 3D movie or... . The Corosync and Pacemaker are not OK for it? What kind of clustering using for rendering? Beowulf cluster? On Friday, April 9, 2021, 12:55:27 PM GMT+4:30, Antony Stone wrote: On Friday 09 April 2021 at 08:58:39, Jason Long wrote: > Thank you so much for your great answers. > As the final questions: Really :) ? > 1- Which commands are useful to monitoring and managing my pacemaker > cluster? Some people prefer https://crmsh.github.io/documentation/ and some people prefer https://github.com/ClusterLabs/pcs > 2- I don't know if this is a right question or not. Consider 100 PCs that > each of them have an Intel Core 2 Duo Processor (2 cores) with 4GB of RAM. > How can I merge these PCs together so that I have a system with 200 CPUs > and 400GB of RAM? The answer to that depends on what you want to do with them. As a general-purpose computing resource, you can't. The CPU on machine A has no (reasonable) access to the RAM on machine B, so no part of the system can actually work with 400GBytes RAM. For specialist purposes (generally speaking, performing the same tasks on small pieces of data all at the same time and then putting the results together at the end), you can create a very different type of "cluster" than the ones we talk about here with corosync and pacemaker. https://en.wikipedia.org/wiki/Computer_cluster A common usage for such a setup is frame rendering of computer generated films; give each of your 100 PCs one frame to render, put all the frames together in the right order at the end, and you've created your film in just over 1% of the time it would have taken on one computer (of the same type). Regards, Antony. -- Most people have more than the average number of legs. Please reply to the list; please *don't* CC me. ___ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home: https://www.clusterlabs.org/ ___ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home: https://www.clusterlabs.org/
Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.
On 4/8/21 11:21 PM, renayama19661...@ybb.ne.jp wrote: Hi Ken, Hi All, In the pgsql resource, crm_mon is executed in the process of demote and stop, and the result is processed. However, pacemaker included in RHEL8.4beta fails to execute this crm_mon. - The problem also occurs on github master(c40e18f085fad9ef1d9d79f671ed8a69eb3e753f). The problem can be easily reproduced in the following ways. Step1. Modify to execute crm_mon in the stop process of the Dummy resource. dummy_stop() { mon=$(crm_mon -1) ret=$? ocf_log info "### YAMAUCHI crm_mon[${ret}] : ${mon}" dummy_monitor if [ $? = $OCF_SUCCESS ]; then rm ${OCF_RESKEY_state} fi return $OCF_SUCCESS } Step2. Configure a cluster with two nodes. [root@rh84-beta01 ~]# crm_mon -rfA1 Cluster Summary: * Stack: corosync * Current DC: rh84-beta01 (version 2.0.5-8.el8-ba59be7122) - partition with quorum * Last updated: Thu Apr 8 18:00:52 2021 * Last change: Thu Apr 8 18:00:38 2021 by root via cibadmin on rh84-beta01 * 2 nodes configured * 1 resource instance configured Node List: * Online: [ rh84-beta01 rh84-beta02 ] Full List of Resources: * dummy-1 (ocf::heartbeat:Dummy): Started rh84-beta01 Migration Summary: Step3. Stop the node where the Dummy resource is running. The resource will fail over. [root@rh84-beta02 ~]# crm_mon -rfA1 Cluster Summary: * Stack: corosync * Current DC: rh84-beta02 (version 2.0.5-8.el8-ba59be7122) - partition with quorum * Last updated: Thu Apr 8 18:08:56 2021 * Last change: Thu Apr 8 18:05:08 2021 by root via cibadmin on rh84-beta01 * 2 nodes configured * 1 resource instance configured Node List: * Online: [ rh84-beta02 ] * OFFLINE: [ rh84-beta01 ] Full List of Resources: * dummy-1 (ocf::heartbeat:Dummy): Started rh84-beta02 However, if you look at the log, you can see that the execution of crm_mon in the stop processing of the Dummy resource has failed. Apr 08 18:05:17 Dummy(dummy-1)[2631]: INFO: ### YAMAUCHI crm_mon[102] : Pacemaker daemons shutting down ... Apr 08 18:05:17 rh84-beta01 pacemaker-execd [2219] (log_op_output) notice: dummy-1_stop_0[2631] error output [ crm_mon: Error: cluster is not available on this node ] Hmm ... is that with selinux enabled? Respectively do you see any related avc messages? Klaus Similarly, pgsql also executes crm_mon with demote or stop, so control fails. The problem seems to be related to the next fix. * Report pacemakerd in state waiting for sbd - https://github.com/ClusterLabs/pacemaker/pull/2278 The problem does not occur with the release version of Pacemaker 2.0.5 or the Pacemaker included with RHEL8.3. This issue has a huge impact on the user. Perhaps it also affects the control of other resources that utilize crm_mon. Please improve the release version of RHEL8.4 so that it includes Pacemaker which does not cause this problem. * Distributions other than RHEL may also be affected in future releases. This content is the same as the following Bugzilla. - https://bugs.clusterlabs.org/show_bug.cgi?id=5471 Best Regards, Hideo Yamauchi. ___ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home: https://www.clusterlabs.org/ ___ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home: https://www.clusterlabs.org/
Re: [ClusterLabs] Why my node1 couldn't back to the clustering chain?
Thank you so much for your great answers. As the final questions: 1- Which commands are useful to monitoring and managing my pacemaker cluster? 2- I don't know if this is a right question or not. Consider 100 PCs that each of them have an Intel Core 2 Duo Processor (2 cores) with 4GB of RAM. How can I merge these PCs together so that I have a system with 200 CPUs and 400GB of RAM? On Friday, April 9, 2021, 12:13:45 AM GMT+4:30, Antony Stone wrote: On Thursday 08 April 2021 at 21:33:48, Jason Long wrote: > Yes, I just wanted to know. In clustering, when a node is down and > go online again, then the cluster will not use it again until another node > fails. Am I right? Think of it like this: You can have as many nodes in your cluster as you think you need, and I'm going to assume that you only need the resources running on one node at any given time. Cluster management (eg: corosync / pacemaker) will ensure that the resources are running on *a* node. The resources will be moved *away* from that node if they can't run there any more, for some reason (the node going down is a good reason). However, there is almost never any concept of the resources being moved *to* a (specific) node. If they get moved away from one node, then obviously they need to be moved to another one, but the move happens because the resources have to be moved *away* from the first node, not because the cluster thinks they need to be moved *to* the second node. So, if a node is running its resources quite happily, it doesn't matter what happens to all the other nodes (provided quorum remains); the resources will stay running on that same node all the time. Antony. -- Was ist braun, liegt ins Gras, und raucht? Ein Kaminchen... Please reply to the list; please *don't* CC me. ___ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home: https://www.clusterlabs.org/ ___ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home: https://www.clusterlabs.org/
Re: [ClusterLabs] Antw: [EXT] Re: Why my node1 couldn't back to the clustering chain?
On Friday 09 April 2021 at 11:06:14, Ulrich Windl wrote: > # lscpu > CPU(s): 144 > # free -h > Mem: 754Gi Nice :) No doubt Jason would like to connect 8 of these together in a cluster... Antony. -- Numerous psychological studies over the years have demonstrated that the majority of people genuinely believe they are not like the majority of people. Please reply to the list; please *don't* CC me. ___ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home: https://www.clusterlabs.org/
[ClusterLabs] Antw: [EXT] Re: Why my node1 couldn't back to the clustering chain?
>>> Jason Long schrieb am 09.04.2021 um 08:58 in Nachricht <2055279672.56029.1617951519...@mail.yahoo.com>: > Thank you so much for your great answers. > As the final questions: > 1- Which commands are useful to monitoring and managing my pacemaker > cluster? My favorite is "crm_mon -1Arfj". > > 2- I don't know if this is a right question or not. Consider 100 PCs that > each of them have an Intel Core 2 Duo Processor (2 cores) with 4GB of RAM. > How can I merge these PCs together so that I have a system with 200 CPUs and > 400GB of RAM? If you don't just want to recycle old hardware, you could consider buying _one_ recent machine that has almost all that cores and RAM in one machine, probably saving a lot of power and space, too. Like here: # grep MHz /proc/cpuinfo | wc -l 144 # lscpu Architecture:x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian Address sizes: 46 bits physical, 48 bits virtual CPU(s): 144 On-line CPU(s) list: 0-143 Thread(s) per core: 2 Core(s) per socket: 18 Socket(s): 4 NUMA node(s):4 Vendor ID: GenuineIntel CPU family: 6 Model: 85 Model name: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz Stepping:7 CPU MHz: 1001.007 ... # free -h totalusedfree shared buff/cache available Mem: 754Gi 1.7Gi 744Gi75Mi 8.1Gi 748Gi Regards, Ulrich > > > > > > > On Friday, April 9, 2021, 12:13:45 AM GMT+4:30, Antony Stone > wrote: > > > > > > On Thursday 08 April 2021 at 21:33:48, Jason Long wrote: > >> Yes, I just wanted to know. In clustering, when a node is down and >> go online again, then the cluster will not use it again until another node >> fails. Am I right? > > Think of it like this: > > You can have as many nodes in your cluster as you think you need, and I'm > going to assume that you only need the resources running on one node at any > given time. > > Cluster management (eg: corosync / pacemaker) will ensure that the resources > > are running on *a* node. > > The resources will be moved *away* from that node if they can't run there > any > more, for some reason (the node going down is a good reason). > > However, there is almost never any concept of the resources being moved *to* > a > (specific) node. If they get moved away from one node, then obviously they > need to be moved to another one, but the move happens because the resources > have to be moved *away* from the first node, not because the cluster thinks > they need to be moved *to* the second node. > > So, if a node is running its resources quite happily, it doesn't matter what > > happens to all the other nodes (provided quorum remains); the resources will > > stay running on that same node all the time. > > > Antony. > > -- > Was ist braun, liegt ins Gras, und raucht? > Ein Kaminchen... > > > Please reply to the list; > please *don't* CC > me. > ___ > Manage your subscription: > https://lists.clusterlabs.org/mailman/listinfo/users > > ClusterLabs home: https://www.clusterlabs.org/ > ___ > Manage your subscription: > https://lists.clusterlabs.org/mailman/listinfo/users > > ClusterLabs home: https://www.clusterlabs.org/ ___ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home: https://www.clusterlabs.org/
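To round out the answer to question 1, a few commonly used status and sanity-check commands (crm_mon ships with Pacemaker itself; pcs and crmsh are the two higher-level shells mentioned earlier in the thread):

  crm_mon -1Arfj     # one-shot status: node attributes, inactive resources, fail counts, pending actions
  crm_mon            # continuously refreshing status view
  pcs status --full  # status via pcs, including node attributes
  crm status         # status via crmsh
  crm_verify -LV     # sanity-check the live CIB configuration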
[ClusterLabs] Antw: [EXT] Re: Why my node1 couldn't back to the clustering chain?
>>> Jason Long schrieb am 08.04.2021 um 21:33 in Nachricht <1151501391.584136.1617910428...@mail.yahoo.com>:
> Yes, I just wanted to know. In clustering, when a node is down and go online
> again, then the cluster will not use it again until another node fails. Am I
> right?

Hi!

Read about "stickiness", maybe setting it to zero, and see if that makes you happier. If not, you learned what stickiness is for.

Regards,
Ulrich

___
Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/
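For reference, changing stickiness cluster-wide is a one-liner in either shell; shown here as examples (0 makes resources willing to move back to a recovered node, higher values make them stay where they are):

  # crmsh:
  crm configure rsc_defaults resource-stickiness=0

  # pcs (newer releases; older ones take "pcs resource defaults resource-stickiness=0"):
  pcs resource defaults update resource-stickiness=0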
Re: [ClusterLabs] Why my node1 couldn't back to the clustering chain?
On Friday 09 April 2021 at 10:34:33, Jason Long wrote: > Thanks. > I meant was a Cheat sheet. I don't understand that sentence. > Yes, something like rendering a 3D movie or... . The Corosync and Pacemaker > are not OK for it? What kind of clustering using for rendering? Beowulf > cluster? Corosync and pacemaker are for High Availability, which generally means that you have more computing resources than you need at any given time, in order that a failed machine can be efficiently replaced by a working one. If all your machines are busy, and one fails, you have no spare computing resources to take over from the failed one. The setup you were asking about is High Performance computing, where you are trying to use the resources you have as efficiently and continuously as possible, therefore you don't have any spare capacity (since 'spare' means 'wasted' in this regard). A Beowulf Cluster is one example of the sort of thing you're asking about; for others, see the "Implementations" section of the URL I previously provided. Antony. -- https://tools.ietf.org/html/rfc6890 - providing 16 million IPv4 addresses for talking to yourself. Please reply to the list; please *don't* CC me. ___ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home: https://www.clusterlabs.org/
Re: [ClusterLabs] Why my node1 couldn't back to the clustering chain?
On Friday 09 April 2021 at 08:58:39, Jason Long wrote: > Thank you so much for your great answers. > As the final questions: Really :) ? > 1- Which commands are useful to monitoring and managing my pacemaker > cluster? Some people prefer https://crmsh.github.io/documentation/ and some people prefer https://github.com/ClusterLabs/pcs > 2- I don't know if this is a right question or not. Consider 100 PCs that > each of them have an Intel Core 2 Duo Processor (2 cores) with 4GB of RAM. > How can I merge these PCs together so that I have a system with 200 CPUs > and 400GB of RAM? The answer to that depends on what you want to do with them. As a general-purpose computing resource, you can't. The CPU on machine A has no (reasonable) access to the RAM on machine B, so no part of the system can actually work with 400GBytes RAM. For specialist purposes (generally speaking, performing the same tasks on small pieces of data all at the same time and then putting the results together at the end), you can create a very different type of "cluster" than the ones we talk about here with corosync and pacemaker. https://en.wikipedia.org/wiki/Computer_cluster A common usage for such a setup is frame rendering of computer generated films; give each of your 100 PCs one frame to render, put all the frames together in the right order at the end, and you've created your film in just over 1% of the time it would have taken on one computer (of the same type). Regards, Antony. -- Most people have more than the average number of legs. Please reply to the list; please *don't* CC me. ___ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home: https://www.clusterlabs.org/
[ClusterLabs] Fwd: Issue with resource-agents ocf:heartbeat:mariadb
Hi team, Thanks for this great job on those library. I would like to know if it was possible to get some help on the mariadb resource. After the configuration of my cluster pcs command shows me: root@node1:~# pcs status Cluster name: clusterserver Stack: corosync Current DC: node1 (version 2.0.1-9e909a5bdd) - partition with quorum Last updated: Thu Apr 8 15:45:35 2021 Last change: Thu Apr 8 15:45:25 2021 by root via cibadmin on node1 2 nodes configured 2 resources configured Online: [ node1 node2 ] Full list of resources: Clone Set: mariadb_server-clone [mariadb_server] (promotable) Masters: [ node1 ] Slaves: [ node2 ] Daemon Status: corosync: active/disabled pacemaker: active/disabled pcsd: active/enabled But when I go to mysql on server2 I see my slave statys off: MariaDB [(none)]> SHOW SLAVE STATUS\G *** 1. row *** Slave_IO_State: Master_Host: node1 Master_User: replication Master_Port: 3306 Connect_Retry: 60 Master_Log_File: master-bin.01 Read_Master_Log_Pos: 463 Relay_Log_File: master-relay-bin.02 Relay_Log_Pos: 672 Relay_Master_Log_File: master-bin.01 Slave_IO_Running: No Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 463 Relay_Log_Space: 2935 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 0 Master_SSL_Crl: Master_SSL_Crlpath: Using_Gtid: Current_Pos Gtid_IO_Pos: Replicate_Do_Domain_Ids: Replicate_Ignore_Domain_Ids: Parallel_Mode: conservative SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Slave_DDL_Groups: 0 Slave_Non_Transactional_Groups: 0 Slave_Transactional_Groups: 0 On pacemaker log I got the following message: Apr 08 19:26:18 node2 pacemaker-execd [6899] (operation_finished) notice: mariadb_server_start_0:7072:stderr [ Error performing operation: No such device or address ] Here is the detailed of my configuration: - pcs : 0.10.1 - Pacemaker 2.0.1 - Corosync Cluster Engine, version '3.0.1' - mariadb Ver 15.1 Distrib 10.3.27-MariaDB - Debian 10.8 Mysql configuration: [server] [mysqld] user = mysql pid-file = /run/mysqld/mysqld.pid socket = /run/mysqld/mysqld.sock basedir = /usr datadir = /var/lib/mysql tmpdir = /tmp lc-messages-dir = /usr/share/mysql bind-address = 0.0.0.0 query_cache_size = 16M log_error = /var/log/mysql/error.log server-id=2 expire_logs_days = 10 character-set-server = utf8mb4 collation-server = utf8mb4_general_ci [embedded] [mariadb] log-bin server-id=2 log-basename=master [mariadb-10.3] Corosync configuration: num_updates="0" admin_epoch="0" cib-last-written="Thu Apr 8 19:26:13 2021" update-origin="node1" update-client="cibadmin" update-user="root" have-quorum="1" dc-uuid="1"> name="stonith-enabled" value="false"/> name="no-quorum-policy" value="ignore"/> name="have-watchdog" value="false"/> value="2.0.1-9e909a5bdd"/> name="cluster-infrastructure" value="corosync"/> name="cluster-name" value="clusterserver"/> name="mariadb_server_REPL_INFO" value="node1"/> type="mariadb"> name="binary" value="/usr/sbin/mysqld"/> name="config" value="/etc/mysql/my.cnf"/> name="datadir" value="/var/lib/mysql"/> name="node_list" value="node1 node2"/> name="pid" value="/var/run/mysqld/mysqld.pid"/> 
id="mariadb_server-instance_attributes-replication_passwd" name="replication_passwd" value="similarly-secure-password"/> id="mariadb_server-instance_attributes-replication_user" name="replication_user" value="replication"/> name="sock