Re: [Pacemaker] stonith and avoiding split brain in two nodes cluster

2013-03-28 Thread Brian J. Murrell
On 13-03-25 03:50 PM, Jacek Konieczny wrote:
 
 The first node to notice that the other is unreachable will fence (kill)
 the other, making sure it is the only one operating on the shared data.

Right.  But in a typical two-node cluster with no-quorum-policy=ignore,
because quorum is being ignored, as soon as there is a communications
breakdown both nodes will notice the other is unreachable and both will
try to fence the other, entering into a fencing death-match.

It is entirely possible that both nodes end up killing each other and
now you have no nodes running any resources!
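
A common way to reduce (though not eliminate) that risk is to give the
fencing operation a delay, so that one node usually wins the race instead
of both shooting at the same time.  A minimal sketch in crm syntax,
assuming a Pacemaker version new enough to honour the pcmk_delay_max
stonith parameter; the device type and values are only placeholders:

primitive st-vm stonith:external/vcenter \
        params HOSTLIST="node1=node1;node2=node2" \
        pcmk_delay_max=15s \
        op monitor interval=3600

With pcmk_delay_max each fencing request waits a random time (up to 15s
here) before firing, so in a clean network split one node normally gets
to fence the other before being fenced itself.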

 Even though it is only half of the nodes, the cluster is considered
 quorate as the other node is known not to be running any cluster
 resources.
 
 When the fenced node reboots its cluster stack starts, but with no
 quorum until it communicates with the surviving node again. So no
 cluster services are started there until both nodes communicate
 properly and the proper quorum is recovered.

But this requires a two-node cluster to be able to determine quorum and
not be configured to ignore loss of quorum, which I think is the entire
point of the OP's question.

b.




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] stonith and avoiding split brain in two nodes cluster

2013-03-26 Thread Angel L. Mateo

On 25/03/13 20:50, Jacek Konieczny wrote:

On Mon, 25 Mar 2013 20:01:28 +0100
Angel L. Mateo ama...@um.es wrote:

quorum {
provider: corosync_votequorum
expected_votes: 2
two_node: 1
}

Corosync will then manage quorum for the two-node cluster and
Pacemaker


   I'm using corosync 1.1 which is the one  provided with my
distribution (ubuntu 12.04). I could also use cman.


I don't think corosync 1.1 can do that, but I guess in this case cman
should be able to provide this functionality.


Sorry, it's corosync 1.4, not 1.1.


can use that. You still need proper fencing to enforce the quorum
(both for pacemaker and the storage layer – dlm in case you use
clvmd), but no
extra quorum node is needed.


   I have configured a dlm resource used with clvm.

   One doubt... With this configuration, how is the split brain problem
handled?


The first node to notice that the other is unreachable will fence (kill)
the other, making sure it is the only one operating on the shared data.
Even though it is only half of the nodes, the cluster is considered
quorate as the other node is known not to be running any cluster
resources.

When the fenced node reboots its cluster stack starts, but with no
quorum until it communicates with the surviving node again. So no
cluster services are started there until both nodes communicate
properly and the proper quorum is recovered.

	But, will this work with corosync 1.4? Although with corosync 1.4 I 
may not be able to use the quorum configuration you described (I'll try), I 
have configured no-quorum-policy=ignore so the cluster can still run 
in the case of one node failing. Could this be a problem?


--
Angel L. Mateo Martínez
Sección de Telemática
Área de Tecnologías de la Información
y las Comunicaciones Aplicadas (ATICA)
http://www.um.es/atica
Tfo: 868889150
Fax: 86337

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] stonith and avoiding split brain in two nodes cluster

2013-03-26 Thread Andrew Beekhof
On Tue, Mar 26, 2013 at 6:30 PM, Angel L. Mateo ama...@um.es wrote:
 On 25/03/13 20:50, Jacek Konieczny wrote:

 On Mon, 25 Mar 2013 20:01:28 +0100
 Angel L. Mateo ama...@um.es wrote:

 quorum {
 provider: corosync_votequorum
 expected_votes: 2
 two_node: 1
 }

 Corosync will then manage quorum for the two-node cluster and
 Pacemaker


I'm using corosync 1.1 which is the one  provided with my
 distribution (ubuntu 12.04). I could also use cman.


 I don't think corosync 1.1 can do that, but I guess in this case cman
 should be able to provide this functionality.

 Sorry, it's corosync 1.4, not 1.1.


 can use that. You still need proper fencing to enforce the quorum
 (both for pacemaker and the storage layer – dlm in case you use
 clvmd), but no
 extra quorum node is needed.

    I have configured a dlm resource used with clvm.

    One doubt... With this configuration, how is the split brain problem
 handled?


 The first node to notice that the other is unreachable will fence (kill)
 the other, making sure it is the only one operating on the shared data.
 Even though it is only half of the nodes, the cluster is considered
 quorate as the other node is known not to be running any cluster
 resources.

 When the fenced node reboots its cluster stack starts, but with no
 quorum until it communicates with the surviving node again. So no
 cluster services are started there until both nodes communicate
 properly and the proper quorum is recovered.

 But, will this work with corosync 1.4? Although with corosync 1.4 I
 may not be able to use the quorum configuration you described (I'll try), I have
 configured no-quorum-policy=ignore so the cluster can still run in the
 case of one node failing. Could this be a problem?

no-quorum-policy=ignore is essentially required for two-node clusters,
as quorum makes no sense there.  Without it the cluster would stop
everything (everywhere) when a node failed (because quorum was lost).

But it also tells pacemaker it can fence failed nodes (this is a good
thing, as we can't recover the services from a failed node until we're
100% sure the node is powered off)
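
In crm shell terms, and in contrast to the configuration you attached
(which still has stonith-enabled=false), that combination would look
roughly like this once a real fencing device has been defined:

crm configure property no-quorum-policy=ignore
crm configure property stonith-enabled=true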

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] stonith and avoiding split brain in two nodes cluster

2013-03-25 Thread Angel L. Mateo

Hello,

	I am a newbie with pacemaker (and, generally, with HA clusters). I have 
configured a two-node cluster. Both nodes are virtual machines (VMware 
ESX) and use shared storage (provided by a SAN, although access to the 
SAN is through the ESX infrastructure and the VMs see it as a SCSI disk). 
I have configured clvm so logical volumes are only active on one of the nodes.


	Now I need some help with the stonith configuration to avoid data 
corruption. Since I'm using ESX virtual machines, I think I won't have 
any problem using the external/vcenter stonith plugin to shut down virtual 
machines.
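
Something like the following is what I have in mind, in crm syntax. The 
vCenter address, credential store path and guest names are placeholders, 
and I still need to check the exact parameter names against 
"stonith -t external/vcenter -h" for my cluster-glue version:

primitive st-vcenter stonith:external/vcenter \
        params VI_SERVER="vcenter.example.com" \
        VI_CREDSTORE="/etc/pacemaker/vicredentials.xml" \
        HOSTLIST="myotis51=myotis51;myotis52=myotis52" \
        RESETPOWERON="0" \
        op monitor interval=3600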


	My problem is how to avoid a split brain situation with this 
configuration, without configuring a 3rd node. I have read about quorum 
disks, the external/sbd stonith plugin and other references, but I'm too 
confused with all this.


	For example, [1] mentions techniques to improve quorum with SCSI reservations 
or a quorum daemon, but it doesn't explain how to do this with pacemaker. Or 
[2] talks about external/sbd.


Any help?

PS: I have attached my corosync.conf and crm configure show outputs

[1] 
http://techthoughts.typepad.com/managing_computers/2007/10/split-brain-quo.html

[2] http://www.gossamer-threads.com/lists/linuxha/pacemaker/78887

--
Angel L. Mateo Martínez
Sección de Telemática
Área de Tecnologías de la Información
y las Comunicaciones Aplicadas (ATICA)
http://www.um.es/atica
Tfo: 868889150
Fax: 86337
# Please read the openais.conf.5 manual page

totem {
version: 2

# How long before declaring a token lost (ms)
token: 3000

# How many token retransmits before forming a new configuration
token_retransmits_before_loss_const: 10

# How long to wait for join messages in the membership protocol (ms)
join: 60

# How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
consensus: 3600

# Turn off the virtual synchrony filter
vsftype: none

# Number of messages that may be sent by one processor on receipt of the token
max_messages: 20

# Limit generated nodeids to 31-bits (positive signed integers)
clear_node_high_bit: yes

# Disable encryption
secauth: off

# How many threads to use for encryption/decryption
threads: 0

# Optionally assign a fixed node id (integer)
# nodeid: 1234

# This specifies the mode of redundant ring, which may be none, active, or passive.
rrp_mode: none

interface {
# The following values need to be set based on your environment 
ringnumber: 0
bindnetaddr: 155.54.211.160
mcastaddr: 226.94.1.1
mcastport: 5405
}
}

amf {
mode: disabled
}

service {
# Load the Pacemaker Cluster Resource Manager
ver:   1
name:  pacemaker
}

aisexec {
user:   root
group:  root
}

logging {
fileline: off
to_stderr: yes
to_logfile: no
to_syslog: yes
syslog_facility: daemon
debug: off
timestamp: on
logger_subsys {
subsys: AMF
debug: off
tags: enter|leave|trace1|trace2|trace3|trace4|trace6
}
}
node myotis51
node myotis52
primitive clvm ocf:lvm2:clvmd \
params daemon_timeout=30 \
meta target-role=Started
primitive dlm ocf:pacemaker:controld \
meta target-role=Started
primitive vg_users1 ocf:heartbeat:LVM \
params volgrpname=UsersDisk exclusive=yes \
op monitor interval=60 timeout=60
group dlm-clvm dlm clvm
clone dlm-clvm-clone dlm-clvm \
meta interleave=true ordered=true target-role=Started
location cli-prefer-vg_users1 vg_users1 \
rule $id=cli-prefer-rule-vg_users1 inf: #uname eq myotis52
property $id=cib-bootstrap-options \
dc-version=1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c \
cluster-infrastructure=openais \
expected-quorum-votes=2 \
stonith-enabled=false \
no-quorum-policy=ignore \
last-lrm-refresh=1364212376
rsc_defaults $id=rsc-options \
resource-stickiness=100
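
I am also thinking about tying the volume group to the clvmd clone, so that 
vg_users1 only starts where dlm-clvm-clone is already running. Something 
like the following (not tested yet):

colocation vg_users1-with-clvm inf: vg_users1 dlm-clvm-clone
order clvm-before-vg_users1 inf: dlm-clvm-clone vg_users1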

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] stonith and avoiding split brain in two nodes cluster

2013-03-25 Thread emmanuel segura
I have a production cluster using two VMs on an ESX cluster; for stonith I'm
using sbd, and everything works fine.


2013/3/25 emmanuel segura emi2f...@gmail.com

 I have a production cluster using two VMs on an ESX cluster; for stonith I'm
 using sbd, and everything works fine.

 2013/3/25 Angel L. Mateo ama...@um.es

 Hello,

  I am a newbie with pacemaker (and, generally, with HA clusters). I
 have configured a two-node cluster. Both nodes are virtual machines
 (VMware ESX) and use shared storage (provided by a SAN, although access
 to the SAN is through the ESX infrastructure and the VMs see it as a SCSI disk). I
 have configured clvm so logical volumes are only active on one of the nodes.

  Now I need some help with the stonith configuration to avoid data
 corruption. Since I'm using ESX virtual machines, I think I won't have any
 problem using the external/vcenter stonith plugin to shut down virtual machines.

 My problem is how to avoid split brain situation with this
 configuration, without configuring a 3rd node. I have read about quorum
 disks, external/sbd stonith plugin and other references, but I'm too
 confused with all this.

  For example, [1] mentions techniques to improve quorum with SCSI
 reservations or a quorum daemon, but it doesn't explain how to do this with pacemaker.
 Or [2] talks about external/sbd.

 Any help?

 PS: I have attached my corosync.conf and crm configure show outputs

 [1] http://techthoughts.typepad.com/managing_computers/2007/10/split-brain-quo.html
 [2] http://www.gossamer-threads.com/lists/linuxha/pacemaker/78887

 --
 Angel L. Mateo Martínez
 Sección de Telemática
 Área de Tecnologías de la Información
 y las Comunicaciones Aplicadas (ATICA)
 http://www.um.es/atica
 Tfo: 868889150
 Fax: 86337

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org




 --
 esta es mi vida e me la vivo hasta que dios quiera




-- 
esta es mi vida e me la vivo hasta que dios quiera
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] stonith and avoiding split brain in two nodes cluster

2013-03-25 Thread emmanuel segura
I have a production cluster using two VMs on an ESX cluster; for stonith I'm
using sbd, and everything works fine.
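
For reference, the sbd side of it is small. Roughly like this; the device
path and timeouts are just examples, not my real values:

# initialise the sbd header on a small shared LUN, once, from one node
sbd -d /dev/disk/by-id/scsi-SHARED-LUN-part1 -1 10 -4 20 create

# stonith resource in crm syntax
primitive stonith-sbd stonith:external/sbd \
        params sbd_device="/dev/disk/by-id/scsi-SHARED-LUN-part1"

The sbd daemon also has to be running on both nodes and watching that
device before the stonith resource is of any use.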

2013/3/25 Angel L. Mateo ama...@um.es

 Hello,

  I am a newbie with pacemaker (and, generally, with HA clusters). I
 have configured a two-node cluster. Both nodes are virtual machines
 (VMware ESX) and use shared storage (provided by a SAN, although access
 to the SAN is through the ESX infrastructure and the VMs see it as a SCSI disk). I
 have configured clvm so logical volumes are only active on one of the nodes.

  Now I need some help with the stonith configuration to avoid data
 corruption. Since I'm using ESX virtual machines, I think I won't have any
 problem using the external/vcenter stonith plugin to shut down virtual machines.

 My problem is how to avoid split brain situation with this
 configuration, without configuring a 3rd node. I have read about quorum
 disks, external/sbd stonith plugin and other references, but I'm too
 confused with all this.

  For example, [1] mentions techniques to improve quorum with SCSI
 reservations or a quorum daemon, but it doesn't explain how to do this with pacemaker.
 Or [2] talks about external/sbd.

 Any help?

 PS: I have attached my corosync.conf and crm configure show outputs

 [1] http://techthoughts.typepad.com/managing_computers/2007/10/split-brain-quo.html
 [2] http://www.gossamer-threads.com/lists/linuxha/pacemaker/78887

 --
 Angel L. Mateo Martínez
 Sección de Telemática
 Área de Tecnologías de la Información
 y las Comunicaciones Aplicadas (ATICA)
 http://www.um.es/atica
 Tfo: 868889150
 Fax: 86337

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org




-- 
esta es mi vida e me la vivo hasta que dios quiera
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] stonith and avoiding split brain in two nodes cluster

2013-03-25 Thread Jacek Konieczny
On Mon, 25 Mar 2013 13:54:22 +0100
   My problem is how to avoid split brain situation with this 
 configuration, without configuring a 3rd node. I have read about
 quorum disks, external/sbd stonith plugin and other references, but
 I'm too confused with all this.
 
   For example, [1] mentions techniques to improve quorum with
 SCSI reservations or a quorum daemon, but it doesn't explain how to do this
 with pacemaker. Or [2] talks about external/sbd.
 
   Any help?


With corosync 2.2 (2.1 too, I guess) you can use, in corosync.conf:

quorum {
provider: corosync_votequorum
expected_votes: 2
two_node: 1
}

Corosync will then manage quorum for the two-node cluster and Pacemaker
can use that. You still need proper fencing to enforce the quorum (both
for pacemaker and the storage layer – dlm in case you use clvmd), but no
extra quorum node is needed.

There is one more thing, though: you need two nodes active to boot the
cluster, but then when one fails (and is fenced) the other may continue,
keeping quorum.
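
That startup behaviour comes from votequorum's wait_for_all, which
two_node: 1 turns on implicitly (see votequorum(5)); it can also be set
explicitly, and "corosync-quorumtool -s" shows the resulting votes and
flags. For example:

quorum {
provider: corosync_votequorum
expected_votes: 2
two_node: 1
wait_for_all: 1
}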

Greets,
Jacek

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] stonith and avoiding split brain in two nodes cluster

2013-03-25 Thread Angel L. Mateo


Jacek Konieczny jaj...@jajcus.net wrote:

On Mon, 25 Mar 2013 13:54:22 +0100
  My problem is how to avoid split brain situation with this 
 configuration, without configuring a 3rd node. I have read about
 quorum disks, external/sbd stonith plugin and other references, but
 I'm too confused with all this.
 
  For example, [1] mentions techniques to improve quorum with
 SCSI reservations or a quorum daemon, but it doesn't explain how to do this
 with pacemaker. Or [2] talks about external/sbd.
 
  Any help?


With corosync 2.2 (2.1 too, I guess) you can use, in corosync.conf:

quorum {
   provider: corosync_votequorum
   expected_votes: 2
   two_node: 1
}

Corosync will then manage quorum for the two-node cluster and Pacemaker

  I'm using corosync 1.1 which is the one  provided with my distribution 
(ubuntu 12.04). I could also use cman.

can use that. You still need proper fencing to enforce the quorum (both
for pacemaker and the storage layer – dlm in case you use clvmd), but
no
extra quorum node is needed.

  I have configured a dlm resource used with clvm.

  One doubt... With this configuration, how is the split brain problem handled?

There is one more thing, though: you need two nodes active to boot the
cluster, but then when one fails (and is fenced) the other may
continue,
keeping quorum.

Greets,
   Jacek

-- 
Sent from my Android phone with K-9 Mail.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] stonith and avoiding split brain in two nodes cluster

2013-03-25 Thread Jacek Konieczny
On Mon, 25 Mar 2013 20:01:28 +0100
Angel L. Mateo ama...@um.es wrote:
 quorum {
  provider: corosync_votequorum
  expected_votes: 2
  two_node: 1
 }
 
 Corosync will then manage quorum for the two-node cluster and
 Pacemaker
 
   I'm using corosync 1.1 which is the one  provided with my
 distribution (ubuntu 12.04). I could also use cman.

I don't think corosync 1.1 can do that, but I guess in this case cman
should be able to provide this functionality.
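
With cman the equivalent is the two_node setting in
/etc/cluster/cluster.conf. A minimal sketch; the cluster name is a
placeholder and the fence device sections are left out:

<?xml version="1.0"?>
<cluster config_version="1" name="mycluster">
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="myotis51" nodeid="1"/>
    <clusternode name="myotis52" nodeid="2"/>
  </clusternodes>
</cluster>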
 
 can use that. You still need proper fencing to enforce the quorum
 (both for pacemaker and the storage layer – dlm in case you use
 clvmd), but no
 extra quorum node is needed.
 
   I have configured a dlm resource used with clvm.
 
   One doubt... With this configuration, how is the split brain problem
 handled?

The first node to notice that the other is unreachable will fence (kill)
the other, making sure it is the only one operating on the shared data.
Even though it is only half of the nodes, the cluster is considered
quorate as the other node is known not to be running any cluster
resources.

When the fenced node reboots its cluster stack starts, but with no
quorum until it communicates with the surviving node again. So no
cluster services are started there until both nodes communicate
properly and the proper quorum is recovered.

Greets,
Jacek

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org