Re: [Pacemaker] stonith and avoiding split brain in two nodes cluster
On 13-03-25 03:50 PM, Jacek Konieczny wrote:
> The first node to notice that the other is unreachable will fence
> (kill) the other, making sure it is the only one operating on the
> shared data.

Right. But with typical two-node clusters that ignore no-quorum, as soon as there is a communications breakdown both nodes will notice the other is unreachable and both will try to fence the other, entering a death match. It is entirely possible that both nodes end up killing each other, and now you have no nodes running any resources!

> Even though it is only half of the nodes, the cluster is considered
> quorate, as the other node is known not to be running any cluster
> resources. When the fenced node reboots, its cluster stack starts, but
> with no quorum until it communicates with the surviving node again. So
> no cluster services are started there until both nodes communicate
> properly and quorum is recovered.

But this requires a two-node cluster to be able to determine quorum and not be configured to ignore no-quorum, which I think is the entire point of the OP's question.

b.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
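[A common mitigation for this death match, not mentioned in the thread: give one fencing device a delay so only one node can win the race. A sketch in crm shell syntax; `pcmk_delay_max` exists only in newer Pacemaker releases, so check your version, and the node and device names are illustrative.]

```
# Sketch only: a random delay (up to 15s) before this device fires
# means one node usually shoots first instead of both shooting at once.
primitive st-nodeA stonith:external/vcenter \
        params HOSTLIST="nodeA" pcmk_delay_max=15 \
        op monitor interval=60
# Keep a fence device off the node it is meant to kill:
location st-nodeA-placement st-nodeA -inf: nodeA
```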
Re: [Pacemaker] stonith and avoiding split brain in two nodes cluster
On 25/03/13 20:50, Jacek Konieczny wrote:
> On Mon, 25 Mar 2013 20:01:28 +0100 Angel L. Mateo ama...@um.es wrote:
>>> quorum {
>>>         provider: corosync_votequorum
>>>         expected_votes: 2
>>>         two_node: 1
>>> }
>>> Corosync will then manage quorum for the two-node cluster and Pacemaker
>>> can use that.
>> I'm using corosync 1.1, which is the one provided with my distribution
>> (Ubuntu 12.04). I could also use cman.
> I don't think corosync 1.1 can do that, but I guess in this case cman
> should be able to provide this functionality.

Sorry, it's corosync 1.4, not 1.1.

>>> You still need proper fencing to enforce the quorum (both for
>>> pacemaker and the storage layer – dlm in case you use clvmd), but no
>>> extra quorum node is needed.
>> I have configured a dlm resource used with clvmd. One doubt... with
>> this configuration, how is the split-brain problem handled?
> The first node to notice that the other is unreachable will fence
> (kill) the other, making sure it is the only one operating on the
> shared data. Even though it is only half of the nodes, the cluster is
> considered quorate, as the other node is known not to be running any
> cluster resources. When the fenced node reboots, its cluster stack
> starts, but with no quorum until it communicates with the surviving
> node again. So no cluster services are started there until both nodes
> communicate properly and quorum is recovered.

But will this work with corosync 1.4? Although with corosync 1.4 I may not be able to use the quorum configuration you mentioned (I'll try), I have configured no-quorum-policy=ignore so the cluster can still run when one node fails. Could this be a problem?

-- Angel L.
Mateo Martínez
Sección de Telemática
Área de Tecnologías de la Información y las Comunicaciones Aplicadas (ATICA)
http://www.um.es/atica
Tfo: 868889150 Fax: 86337
Re: [Pacemaker] stonith and avoiding split brain in two nodes cluster
On Tue, Mar 26, 2013 at 6:30 PM, Angel L. Mateo ama...@um.es wrote:
> [...]
> But will this work with corosync 1.4? Although with corosync 1.4 I may
> not be able to use the quorum configuration you mentioned (I'll try), I
> have configured no-quorum-policy=ignore so the cluster can still run
> when one node fails. Could this be a problem?

It's essentially required for two-node clusters, as quorum makes no sense there. Without it, the cluster would stop everything (everywhere) when a node failed (because quorum was lost).
But it also tells Pacemaker it can fence failed nodes. This is a good thing, as we can't recover the services from a failed node until we're 100% sure that node is powered off.
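[For reference, the two-node combination being described here would look something like this in the crm shell — a sketch of the settings named in the thread, not a drop-in configuration:]

```
# Keep fencing enabled, but don't freeze everything when quorum is
# lost: in a two-node cluster, losing the peer always means losing
# quorum, so "ignore" plus working stonith is the usual pairing.
property stonith-enabled=true \
         no-quorum-policy=ignore
```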
[Pacemaker] stonith and avoiding split brain in two nodes cluster
Hello, I am a newbie with pacemaker (and, generally, with HA clusters). I have configured a two-node cluster. Both nodes are virtual machines (VMware ESX) and use shared storage (provided by a SAN, although access to the SAN is from the ESX infrastructure, and the VMs see it as a SCSI disk). I have configured clvm so logical volumes are only active on one of the nodes.

Now I need some help with the stonith configuration to avoid data corruption. Since I'm using ESX virtual machines, I think I won't have any problem using the external/vcenter stonith plugin to shut down virtual machines. My problem is how to avoid a split-brain situation with this configuration, without configuring a 3rd node. I have read about quorum disks, the external/sbd stonith plugin and other references, but I'm too confused by all this. For example, [1] mentions techniques to improve quorum with SCSI reserve or a quorum daemon, but it doesn't explain how to do this with pacemaker. Or [2] talks about external/sbd. Any help?

PS: I have attached my corosync.conf and "crm configure show" outputs

[1] http://techthoughts.typepad.com/managing_computers/2007/10/split-brain-quo.html
[2] http://www.gossamer-threads.com/lists/linuxha/pacemaker/78887

-- Angel L.
Mateo Martínez
Sección de Telemática
Área de Tecnologías de la Información y las Comunicaciones Aplicadas (ATICA)
http://www.um.es/atica
Tfo: 868889150 Fax: 86337

# Please read the openais.conf.5 manual page
totem {
        version: 2
        # How long before declaring a token lost (ms)
        token: 3000
        # How many token retransmits before forming a new configuration
        token_retransmits_before_loss_const: 10
        # How long to wait for join messages in the membership protocol (ms)
        join: 60
        # How long to wait for consensus to be achieved before starting
        # a new round of membership configuration (ms)
        consensus: 3600
        # Turn off the virtual synchrony filter
        vsftype: none
        # Number of messages that may be sent by one processor on receipt of the token
        max_messages: 20
        # Limit generated nodeids to 31-bits (positive signed integers)
        clear_node_high_bit: yes
        # Disable encryption
        secauth: off
        # How many threads to use for encryption/decryption
        threads: 0
        # Optionally assign a fixed node id (integer)
        # nodeid: 1234
        # This specifies the mode of redundant ring, which may be none, active, or passive.
        rrp_mode: none
        interface {
                # The following values need to be set based on your environment
                ringnumber: 0
                bindnetaddr: 155.54.211.160
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}

amf {
        mode: disabled
}

service {
        # Load the Pacemaker Cluster Resource Manager
        ver: 1
        name: pacemaker
}

aisexec {
        user: root
        group: root
}

logging {
        fileline: off
        to_stderr: yes
        to_logfile: no
        to_syslog: yes
        syslog_facility: daemon
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
                tags: enter|leave|trace1|trace2|trace3|trace4|trace6
        }
}

node myotis51
node myotis52
primitive clvm ocf:lvm2:clvmd \
        params daemon_timeout=30 \
        meta target-role=Started
primitive dlm ocf:pacemaker:controld \
        meta target-role=Started
primitive vg_users1 ocf:heartbeat:LVM \
        params volgrpname=UsersDisk exclusive=yes \
        op monitor interval=60 timeout=60
group dlm-clvm dlm clvm
clone dlm-clvm-clone dlm-clvm \
        meta interleave=true ordered=true target-role=Started
location cli-prefer-vg_users1 vg_users1 \
        rule $id=cli-prefer-rule-vg_users1 inf: #uname eq myotis52
property $id=cib-bootstrap-options \
        dc-version=1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c \
        cluster-infrastructure=openais \
        expected-quorum-votes=2 \
        stonith-enabled=false \
        no-quorum-policy=ignore \
        last-lrm-refresh=1364212376
rsc_defaults $id=rsc-options \
        resource-stickiness=100
Re: [Pacemaker] stonith and avoiding split brain in two nodes cluster
I have a production cluster, using two VMs on an ESX cluster; for stonith I'm using sbd. Everything works fine.

2013/3/25 emmanuel segura emi2f...@gmail.com
> I have a production cluster, using two VMs on an ESX cluster; for
> stonith I'm using sbd. Everything works fine.
>
> 2013/3/25 Angel L. Mateo ama...@um.es
>> [...]
>> My problem is how to avoid a split-brain situation with this
>> configuration, without configuring a 3rd node. I have read about
>> quorum disks, the external/sbd stonith plugin and other references,
>> but I'm too confused by all this. [...] Or [2] talks about
>> external/sbd. Any help?

-- 
this is my life and I live it as long as God wills
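[For reference, an sbd setup along the lines emmanuel describes usually involves initializing a small shared LUN and pointing a stonith resource at it. A sketch with an illustrative device path; daemon configuration locations vary by distribution, so check the sbd man page for your system.]

```
# Initialize the shared disk for sbd (device path is an example):
sbd -d /dev/disk/by-id/scsi-mysharedlun create
# Start the sbd daemon with the same device on both nodes (e.g. via
# SBD_DEVICE in /etc/sysconfig/sbd on SUSE-like systems), then add a
# matching fencing resource in the crm shell:
#   primitive st-sbd stonith:external/sbd \
#           params sbd_device="/dev/disk/by-id/scsi-mysharedlun"
```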
Re: [Pacemaker] stonith and avoiding split brain in two nodes cluster
I have a production cluster, using two VMs on an ESX cluster; for stonith I'm using sbd. Everything works fine.

2013/3/25 Angel L. Mateo ama...@um.es
> Hello, I am a newbie with pacemaker (and, generally, with HA clusters).
> I have configured a two-node cluster. Both nodes are virtual machines
> (VMware ESX) and use shared storage.
> [...]
> My problem is how to avoid a split-brain situation with this
> configuration, without configuring a 3rd node. I have read about quorum
> disks, the external/sbd stonith plugin and other references, but I'm
> too confused by all this. [...] Or [2] talks about external/sbd.
> Any help?
>
> [1] http://techthoughts.typepad.com/managing_computers/2007/10/split-brain-quo.html
> [2] http://www.gossamer-threads.com/lists/linuxha/pacemaker/78887

-- 
this is my life and I live it as long as God wills
Re: [Pacemaker] stonith and avoiding split brain in two nodes cluster
On Mon, 25 Mar 2013 13:54:22 +0100:
> My problem is how to avoid a split-brain situation with this
> configuration, without configuring a 3rd node. I have read about quorum
> disks, the external/sbd stonith plugin and other references, but I'm
> too confused by all this. For example, [1] mentions techniques to
> improve quorum with SCSI reserve or a quorum daemon, but it doesn't
> explain how to do this with pacemaker. Or [2] talks about external/sbd.
> Any help?

With corosync 2.2 (2.1 too, I guess) you can use, in corosync.conf:

quorum {
        provider: corosync_votequorum
        expected_votes: 2
        two_node: 1
}

Corosync will then manage quorum for the two-node cluster and Pacemaker can use that. You still need proper fencing to enforce the quorum (both for pacemaker and the storage layer – dlm in case you use clvmd), but no extra quorum node is needed.

There is one more thing, though: you need two nodes active to boot the cluster, but then when one fails (and is fenced) the other may continue, keeping quorum.

Greets,
        Jacek
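[Jacek's "one more thing" matches the documented behavior of two_node mode: as I read the votequorum documentation, two_node implicitly enables wait_for_all, so the cluster only becomes quorate once both nodes have been seen together. The same section with that spelled out — the comment is mine, not from the thread:]

```
quorum {
        provider: corosync_votequorum
        expected_votes: 2
        two_node: 1
        # two_node implies wait_for_all: after a cold start, quorum is
        # granted only once both nodes have joined; after that, a single
        # surviving node keeps quorum when its peer fails and is fenced.
}
```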
Re: [Pacemaker] stonith and avoiding split brain in two nodes cluster
Jacek Konieczny jaj...@jajcus.net wrote:
> On Mon, 25 Mar 2013 13:54:22 +0100:
>> [...]
> With corosync 2.2 (2.1 too, I guess) you can use, in corosync.conf:
> quorum {
>         provider: corosync_votequorum
>         expected_votes: 2
>         two_node: 1
> }
> Corosync will then manage quorum for the two-node cluster and Pacemaker
> can use that.

I'm using corosync 1.1, which is the one provided with my distribution (Ubuntu 12.04). I could also use cman.

> You still need proper fencing to enforce the quorum (both for
> pacemaker and the storage layer – dlm in case you use clvmd), but no
> extra quorum node is needed.

I have configured a dlm resource used with clvmd. One doubt... with this configuration, how is the split-brain problem handled?

> There is one more thing, though: you need two nodes active to boot the
> cluster, but then when one fails (and is fenced) the other may
> continue, keeping quorum.
> Greets, Jacek

-- 
Sent from my Android phone with K-9 Mail.
Re: [Pacemaker] stonith and avoiding split brain in two nodes cluster
On Mon, 25 Mar 2013 20:01:28 +0100 Angel L. Mateo ama...@um.es wrote:
>> quorum {
>>         provider: corosync_votequorum
>>         expected_votes: 2
>>         two_node: 1
>> }
>> Corosync will then manage quorum for the two-node cluster and Pacemaker
>> can use that.
> I'm using corosync 1.1, which is the one provided with my distribution
> (Ubuntu 12.04). I could also use cman.

I don't think corosync 1.1 can do that, but I guess in this case cman should be able to provide this functionality.

>> You still need proper fencing to enforce the quorum (both for
>> pacemaker and the storage layer – dlm in case you use clvmd), but no
>> extra quorum node is needed.
> I have configured a dlm resource used with clvmd. One doubt... with
> this configuration, how is the split-brain problem handled?

The first node to notice that the other is unreachable will fence (kill) the other, making sure it is the only one operating on the shared data. Even though it is only half of the nodes, the cluster is considered quorate, as the other node is known not to be running any cluster resources. When the fenced node reboots, its cluster stack starts, but with no quorum until it communicates with the surviving node again. So no cluster services are started there until both nodes communicate properly and quorum is recovered.

Greets,
        Jacek