Re: [Pacemaker] Split-site cluster in two locations

2011-01-11 Thread Andrew Beekhof
On Tue, Dec 28, 2010 at 10:21 PM, Anton Altaparmakov ai...@cam.ac.uk wrote:
 Hi,

 On 28 Dec 2010, at 20:32, Michael Schwartzkopff wrote:
 Hi,

 I have four nodes in a split site scenario located in two computing centers.
 STONITH is enabled.

 Is there any best practice for dealing with this setup? Does it make sense to
 set expected-quorum-votes to 3 to keep the whole setup running with only
 one data center online? Is this possible at all?

 Is quorum needed with STONITH enabled?

 Is there a quorum server available already?

 I couldn't see a quorum server in Pacemaker so I have installed a third dummy 
 node which is not allowed to run any resources (using location constraints 
 and setting the cluster to not be symmetric) which just acts as a third vote. 
  I am hoping this effectively acts as a quorum server: a node that loses 
 connectivity will lose quorum and shut down its services whilst the other 
 real node will retain connectivity and thus quorum due to the dummy node 
 still being present.

 Obviously this is quite wasteful of servers as you can only run a single 
 Pacemaker instance on a server (as far as I know) so that is a lot of dummy 
 servers when you run multiple Pacemaker clusters...  The solution for us is to 
 use virtualization - one physical server with VMs and each VM is a dummy node 
 for a cluster...

With recent 1.1.x builds it should be possible to run just the
corosync piece (no pacemaker).
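
Roughly: give that box the same corosync.conf as the cluster nodes, minus the
pacemaker service stanza, and never install or start pacemaker on it. A sketch
(addresses and options are placeholders, adjust to your own totem settings):

    # /etc/corosync/corosync.conf on the quorum-only node
    totem {
        version: 2
        secauth: off
        interface {
            ringnumber: 0
            bindnetaddr: 192.168.1.0
            mcastaddr: 226.94.1.1
            mcastport: 5405
        }
    }
    logging {
        to_syslog: yes
    }
    # note: no "service { name: pacemaker }" section here - the real nodes
    # keep theirs, this node only joins the membership and adds a vote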


 Best regards,

        Anton


 Thanks for any hints,

 Greetings!

 --
 Dr. Michael Schwartzkopff
 Guardinistr. 63
 81375 München

 Tel: (0163) 172 50 98

 Best regards,

        Anton
 --
 Anton Altaparmakov aia21 at cam.ac.uk (replace at with @)
 Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
 Linux NTFS maintainer, http://www.linux-ntfs.org/




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Split-site cluster in two locations

2011-01-11 Thread Robert van Leeuwen
-Original message-
To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org; 
From:   Christoph Herrmann c.herrm...@science-computing.de
Sent:   Tue 11-01-2011 10:24
Subject:Re: [Pacemaker] Split-site cluster in two locations
 
 As long as you have only two computing centers it doesn't matter whether you
 run a corosync-only piece or anything else on a physical or a virtual
 machine. The question is: how do you configure a four-node (or six-node, any
 even number bigger than two) corosync/pacemaker cluster to continue services
 if you have a blackout in one computing center (you will always lose at
 least one half of your nodes), but to shut down everything if you have less
 than half of the nodes available? Are there any best practices on how to
 deal with clusters in two computing centers? Anything like an external
 quorum node or a quorum partition? I'd like to set expected-quorum-votes to
 3 but this is not possible (with corosync-1.2.6 and pacemaker-1.1.2 on
 SLES11 SP1). Does anybody know why? Currently, the only way I can figure out
 is to run the cluster with no-quorum-policy=ignore. But I don't like that.
 Any suggestions?


Apart from the number of nodes in the datacenter: with 2 datacenters you have
another issue:
How do you know which DC is reachable (from your clients' point of view) when
the communication between the DCs fails?
The best fix for this would be a node at a third DC, but you still run into
problems with the fencing devices.
I doubt you can remotely power off the non-responding DC :-)
So a split-brain situation is likely to happen sometime.

So for 100% data integrity I think it is best to let the cluster freeze
itself...
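
(In Pacemaker terms that would presumably be something like

    crm configure property no-quorum-policy=freeze

so a partition that loses quorum keeps the resources it already runs but does
not try to recover anything from the unreachable nodes.)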

Best Regards,
Robert van Leeuwen

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Split-site cluster in two locations

2011-01-11 Thread Holger Teutsch
On Tue, 2011-01-11 at 10:21 +0100, Christoph Herrmann wrote:
 -Original Message-
 From: Andrew Beekhof and...@beekhof.net
 Sent: Tue 11.01.2011 09:01
 To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org; 
 CC: Michael Schwartzkopff mi...@clusterbau.com; 
 Subject: Re: [Pacemaker] Split-site cluster in two locations
 
  On Tue, Dec 28, 2010 at 10:21 PM, Anton Altaparmakov ai...@cam.ac.uk wrote:
   Hi,
  
   On 28 Dec 2010, at 20:32, Michael Schwartzkopff wrote:
    Hi,
  
    I have four nodes in a split site scenario located in two computing
    centers. STONITH is enabled.
  
    Is there any best practice for dealing with this setup? Does it make
    sense to set expected-quorum-votes to 3 to keep the whole setup running
    with only one data center online? Is this possible at all?
  
    Is quorum needed with STONITH enabled?
  
    Is there a quorum server available already?
  
   I couldn't see a quorum server in Pacemaker so I have installed a third
   dummy node which is not allowed to run any resources (using location
   constraints and setting the cluster to not be symmetric) which just acts
   as a third vote.  I am hoping this effectively acts as a quorum server: a
   node that loses connectivity will lose quorum and shut down its services
   whilst the other real node will retain connectivity and thus quorum due
   to the dummy node still being present.
  
   Obviously this is quite wasteful of servers as you can only run a single
   Pacemaker instance on a server (as far as I know), so that is a lot of
   dummy servers when you run multiple Pacemaker clusters...  The solution
   for us is to use virtualization - one physical server with VMs, and each
   VM is a dummy node for a cluster...
  
  With recent 1.1.x builds it should be possible to run just the
  corosync piece (no pacemaker).
  
 
 As long as you have only two computing centers it doesn't matter whether you
 run a corosync-only piece or anything else on a physical or a virtual
 machine. The question is: how do you configure a four-node (or six-node, any
 even number bigger than two) corosync/pacemaker cluster to continue services
 if you have a blackout in one computing center (you will always lose at
 least one half of your nodes), but to shut down everything if you have less
 than half of the nodes available? Are there any best practices on how to
 deal with clusters in two computing centers? Anything like an external
 quorum node or a quorum partition? I'd like to set expected-quorum-votes to
 3 but this is not possible (with corosync-1.2.6 and pacemaker-1.1.2 on
 SLES11 SP1). Does anybody know why? Currently, the only way I can figure out
 is to run the cluster with no-quorum-policy=ignore. But I don't like that.
 Any suggestions?
 
 
 Best regards
 
   Christoph

Hi,
I assume the only solution is to work with manual intervention, i.e. the
stonith meatware module.
Whenever a site goes down, a human being has to confirm that it is lost and
pull the power cords or the inter-site links so it will not come back
unintentionally.

Then confirm with meatclient on the healthy site that the no longer
reachable site can be considered gone.

In theory this can be configured with an additional meatware stonith
resource with lower priority. The intention is to let your regular
stonith resources do the work, with meatware as a last resort.
However, I was not able to get this running with the versions packaged with
SLES11 SP1: the priority was not honored and a lot of zombie meatware
processes were left over.
I found some patches in the upstream repositories that seem to address
these problems but I didn't follow up.
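
For reference, what I tried looks roughly like this in the crm shell (host
names and the IPMI device are just placeholders for whatever you really use;
whether stonithd honours the priority meta attribute this way is exactly the
part that did not work for me):

    # regular fencing device, preferred (higher priority value)
    primitive st-ipmi stonith:external/ipmi \
        params hostname=node1 ipaddr=10.0.0.1 userid=admin passwd=secret \
        meta priority=10
    # meatware as last resort: waits for an operator to confirm the node is dead
    primitive st-meat stonith:meatware \
        params hostlist="node1 node2 node3 node4" \
        meta priority=1

    # after a real site loss, confirm on a surviving node:
    meatclient -c node1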

Regards
Holger


 


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] Split-site cluster in two locations

2010-12-28 Thread Michael Schwartzkopff
Hi,

I have four nodes in a split site scenario located in two computing centers. 
STONITH is enabled.

Is there any best practice for dealing with this setup? Does it make sense to
set expected-quorum-votes to 3 to keep the whole setup running with only
one data center online? Is this possible at all?

Is quorum needed with STONITH enabled?

Is there a quorum server available already?

Thanks for any hints,

Greetings!

-- 
Dr. Michael Schwartzkopff
Guardinistr. 63
81375 München

Tel: (0163) 172 50 98


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Split-site cluster in two locations

2010-12-28 Thread Anton Altaparmakov
Hi,

On 28 Dec 2010, at 20:32, Michael Schwartzkopff wrote:
 Hi,
 
 I have four nodes in a split site scenario located in two computing centers. 
 STONITH is enabled.
 
 Is there any best practice for dealing with this setup? Does it make sense to
 set expected-quorum-votes to 3 to keep the whole setup running with only
 one data center online? Is this possible at all?
 
 Is quorum needed with STONITH enabled?
 
 Is there a quorum server available already?

I couldn't see a quorum server in Pacemaker so I have installed a third dummy 
node which is not allowed to run any resources (using location constraints and 
setting the cluster to not be symmetric) which just acts as a third vote.  I am 
hoping this effectively acts as a quorum server: a node that loses 
connectivity will lose quorum and shut down its services whilst the other real 
node will retain connectivity and thus quorum due to the dummy node still being 
present.

Obviously this is quite wasteful of servers as you can only run a single 
Pacemaker instance on a server (as far as I know) so that is a lot of dummy 
servers when you run multiple Pacemaker clusters...  The solution for us is to use 
virtualization - one physical server with VMs and each VM is a dummy node for a 
cluster...
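
For illustration, the constraints boil down to something like this in the crm
shell (resource and node names here are made up):

    # resources may only run where a location constraint explicitly allows them
    crm configure property symmetric-cluster=false
    # allow each real resource on the two real nodes, never on the dummy node
    crm configure location loc-rsc1-real1 rsc1 100: real1
    crm configure location loc-rsc1-real2 rsc1 100: real2

With symmetric-cluster=false nothing is ever placed on the dummy node, so it
only contributes its membership vote.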

Best regards,

Anton

 
 Thanks for any hints,
 
 Greetings!
 
 -- 
 Dr. Michael Schwartzkopff
 Guardinistr. 63
 81375 München
 
 Tel: (0163) 172 50 98

Best regards,

Anton
-- 
Anton Altaparmakov aia21 at cam.ac.uk (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer, http://www.linux-ntfs.org/


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker