Re: [Pacemaker] Y should pacemaker be started simultaneously.
On Mon, 06 Oct 2014 10:27:49 -0400, Digimer li...@alteeve.ca wrote:

> On 06/10/14 02:11 AM, Andrei Borzenkov wrote:
>> On Mon, Oct 6, 2014 at 9:03 AM, Digimer li...@alteeve.ca wrote:
>>> If stonith was configured, after the timeout, the first node would
>>> fence the second node (unable to reach != off). Alternatively, you
>>> can set corosync to 'wait_for_all' and have the first node do nothing
>>> until it sees the peer.
>>
>> Am I right that wait_for_all is available only in corosync 2.x and not
>> in 1.x?
>
> You are correct, yes.
>
>>> To do otherwise would be to risk a split-brain. Each node needs to
>>> know the state of the peer in order to run services safely. By having
>>> both start at the same time, they know what the other is doing. By
>>> disabling quorum, you allow one node to continue to operate when the
>>> other leaves, but it needs that initial connection to know for sure
>>> what it's doing.
>>
>> Does it apply to both corosync 1.x and 2.x, or only to 2.x with
>> wait_for_all? I was actually also confused about the precise meaning
>> of disabling quorum in pacemaker (setting no-quorum-policy: ignore).
>> So if I have a two-node cluster with pacemaker 1.x and corosync 1.x,
>> with no-quorum-policy=ignore and no fencing - what happens when one
>> single node starts?
>
> Quorum tells the cluster that if a peer leaves (gracefully or was
> fenced), the remaining node is allowed to continue providing services.
> Stonith is needed to put a node that is in an unknown state into a
> known state; be it because it couldn't reach the node when starting or
> because the node stopped responding. So quorum and stonith play rather
> different roles. Without stonith, regardless of quorum, you risk
> split-brains and/or data corruption. Operating a cluster without
> stonith is to operate a cluster in an undetermined state and should
> never be done.

OK, let me try to rephrase. Is it possible to achieve the same effect as
wait_for_all in corosync 2.x with the combination of pacemaker 1.1.x and
corosync 1.x? I.e. ensure that the cluster does not come up *on the first
startup* until all nodes are present? So just make cluster nodes wait for
the others to join, instead of trying to stonith them?

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
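[For reference: in corosync 2.x the behaviour discussed above is enabled in
the quorum section of corosync.conf. A minimal sketch for a two-node
cluster; values shown are illustrative:]

```
# /etc/corosync/corosync.conf (corosync 2.x only; not available in 1.x)
quorum {
    provider: corosync_votequorum
    two_node: 1        # two-node mode; enables wait_for_all by default
    wait_for_all: 1    # on first start, wait until all nodes are seen
}
```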
Re: [Pacemaker] Y should pacemaker be started simultaneously.
On 18/10/14 12:18 AM, Andrei Borzenkov wrote:
> [...]
> OK, let me try to rephrase. Is it possible to achieve the same effect
> as wait_for_all in corosync 2.x with the combination of pacemaker 1.1.x
> and corosync 1.x? I.e. ensure that the cluster does not come up *on the
> first startup* until all nodes are present? So just make cluster nodes
> wait for the others to join, instead of trying to stonith them?

No, not that I know of. To achieve the same behaviour, I wrote my own
program[1] to do this. It is called on boot and waits for the peer to
become reachable, then it starts the cluster stack. So the same effect is
gained, but it's done outside corosync directly.

Note that I wrote it for corosync 1.x + cman + rgmanager, but the
concepts port trivially.

digimer

1. https://github.com/digimer/an-cdb/blob/master/tools/safe_anvil_start

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
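[The idea above can be sketched in a few lines of shell. This is not the
real safe_anvil_start, just a minimal illustration of the approach; the
peer name and the start commands in the usage comment are assumptions:]

```shell
#!/bin/sh
# Minimal sketch of a "wait for the peer before starting the cluster"
# boot helper, in the spirit of safe_anvil_start (not the real tool).

# Loop until a probe of the peer succeeds. The probe command is
# overridable via $PROBE so the check (ping, ssh, ...) can be swapped.
wait_for_peer() {
    peer="$1"
    probe="${PROBE:-ping -c 1 -W 1}"
    until $probe "$peer" >/dev/null 2>&1; do
        sleep 5     # peer not reachable yet; try again shortly
    done
}

# Hypothetical boot-time usage (cman-era stack, as in this thread):
#   wait_for_peer rk17 && service cman start && service pacemaker start
```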
Re: [Pacemaker] Y should pacemaker be started simultaneously.
On 06/10/14 02:11 AM, Andrei Borzenkov wrote:
> On Mon, Oct 6, 2014 at 9:03 AM, Digimer li...@alteeve.ca wrote:
>> If stonith was configured, after the timeout, the first node would
>> fence the second node (unable to reach != off). Alternatively, you can
>> set corosync to 'wait_for_all' and have the first node do nothing
>> until it sees the peer.
>
> Am I right that wait_for_all is available only in corosync 2.x and not
> in 1.x?

You are correct, yes.

>> To do otherwise would be to risk a split-brain. Each node needs to
>> know the state of the peer in order to run services safely. By having
>> both start at the same time, they know what the other is doing. By
>> disabling quorum, you allow one node to continue to operate when the
>> other leaves, but it needs that initial connection to know for sure
>> what it's doing.
>
> Does it apply to both corosync 1.x and 2.x, or only to 2.x with
> wait_for_all? I was actually also confused about the precise meaning of
> disabling quorum in pacemaker (setting no-quorum-policy: ignore). So if
> I have a two-node cluster with pacemaker 1.x and corosync 1.x, with
> no-quorum-policy=ignore and no fencing - what happens when one single
> node starts?

Quorum tells the cluster that if a peer leaves (gracefully or was
fenced), the remaining node is allowed to continue providing services.
Stonith is needed to put a node that is in an unknown state into a known
state; be it because it couldn't reach the node when starting or because
the node stopped responding. So quorum and stonith play rather different
roles. Without stonith, regardless of quorum, you risk split-brains
and/or data corruption. Operating a cluster without stonith is to operate
a cluster in an undetermined state and should never be done.

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
[Pacemaker] Y should pacemaker be started simultaneously.
Hi all,

I have had this question for a while and did not understand the logic
behind it. Why do I have to start pacemaker on both of my nodes (of a
2-node cluster) simultaneously, although I have disabled quorum in the
cluster?

It fails in the startup step of:

[root@rk16 ~]# service pacemaker start
Starting cluster:
   Checking if cluster has been disabled at boot...     [  OK  ]
   Checking Network Manager...                          [  OK  ]
   Global setup...                                      [  OK  ]
   Loading kernel modules...                            [  OK  ]
   Mounting configfs...                                 [  OK  ]
   Starting cman...                                     [  OK  ]
   Waiting for quorum... Timed-out waiting for cluster  [FAILED]
Stopping cluster:
   Leaving fence domain...                              [  OK  ]
   Stopping gfs_controld...                             [  OK  ]
   Stopping dlm_controld...                             [  OK  ]
   Stopping fenced...                                   [  OK  ]
   Stopping cman...                                     [  OK  ]
   Waiting for corosync to shutdown:.                   [  OK  ]
   Unloading kernel modules...                          [  OK  ]
   Unmounting configfs...                               [  OK  ]
Starting Pacemaker Cluster Manager:                     [  OK  ]
[root@rk16 ~]# service pacemaker status
pacemakerd dead but pid file exists
[root@rk16 ~]#

Regards,
Ravikiran N
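[For context: the "Waiting for quorum... Timed-out" step above is cman,
not pacemaker, waiting for the peer. A cman-based two-node cluster is
usually declared with two-node mode in cluster.conf; a minimal sketch,
where the peer name "rk17" and the cluster name are assumptions (only
"rk16" appears in the log). This changes quorum accounting so a lone node
can be quorate; it does not remove the need for fencing:]

```
<!-- /etc/cluster/cluster.conf fragment; names are placeholders -->
<cluster name="example" config_version="1">
  <!-- two_node="1" lets either node hold quorum alone;
       expected_votes="1" is required alongside it -->
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="rk16" nodeid="1"/>
    <clusternode name="rk17" nodeid="2"/>
  </clusternodes>
</cluster>
```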
Re: [Pacemaker] Y should pacemaker be started simultaneously.
If stonith was configured, after the timeout, the first node would fence
the second node (unable to reach != off). Alternatively, you can set
corosync to 'wait_for_all' and have the first node do nothing until it
sees the peer.

To do otherwise would be to risk a split-brain. Each node needs to know
the state of the peer in order to run services safely. By having both
start at the same time, they know what the other is doing. By disabling
quorum, you allow one node to continue to operate when the other leaves,
but it needs that initial connection to know for sure what it's doing.
Alternatively, by fencing the peer on start after timing out, it can say
for sure that the peer is off and then start services knowing it won't
cause a split-brain.

Of course, if you auto-start the cluster and don't use wait_for_all, you
risk a fence loop.

digimer

On 06/10/14 12:45 AM, N, Ravikiran wrote:
> Hi all,
>
> I have had this question for a while and did not understand the logic
> behind it. Why do I have to start pacemaker on both of my nodes (of a
> 2-node cluster) simultaneously, although I have disabled quorum in the
> cluster?
>
> It fails in the startup step of:
>
> [root@rk16 ~]# service pacemaker start
> Starting cluster:
>    Checking if cluster has been disabled at boot...     [  OK  ]
>    Checking Network Manager...                          [  OK  ]
>    Global setup...                                      [  OK  ]
>    Loading kernel modules...                            [  OK  ]
>    Mounting configfs...                                 [  OK  ]
>    Starting cman...                                     [  OK  ]
>    Waiting for quorum... Timed-out waiting for cluster  [FAILED]
> Stopping cluster:
>    Leaving fence domain...                              [  OK  ]
>    Stopping gfs_controld...                             [  OK  ]
>    Stopping dlm_controld...                             [  OK  ]
>    Stopping fenced...                                   [  OK  ]
>    Stopping cman...                                     [  OK  ]
>    Waiting for corosync to shutdown:.                   [  OK  ]
>    Unloading kernel modules...                          [  OK  ]
>    Unmounting configfs...                               [  OK  ]
> Starting Pacemaker Cluster Manager:                     [  OK  ]
> [root@rk16 ~]# service pacemaker status
> pacemakerd dead but pid file exists
> [root@rk16 ~]#
>
> Regards,
> Ravikiran N

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
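[The trade-offs discussed in this thread map to two pacemaker cluster
properties. A sketch in pacemaker 1.1 crm shell syntax; this illustrates
the settings being debated, not a recommendation:]

```
# crm configure - cluster properties discussed in this thread
property no-quorum-policy=ignore   # keep running without quorum (2-node)
property stonith-enabled=true      # keep fencing on to avoid split-brain
```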