Re: [Pacemaker] DRBD with Pacemaker on CentOS 6.5

2014-10-23 Thread David Pendell
Try this. digimer is an expert at what you are trying to do.

https://alteeve.ca/w/AN!Cluster_Tutorial_2

On Thu, Oct 23, 2014 at 1:05 PM, David Pendell losto...@gmail.com wrote:

 Try this.

 https://alteeve.ca/w/AN!Cluster_Tutorial_2

 On Wed, Oct 22, 2014 at 8:08 PM, Sihan Goi gois...@gmail.com wrote:

 Hi, can anyone help? Really stuck here...

 On Mon, Oct 20, 2014 at 9:46 AM, Sihan Goi gois...@gmail.com wrote:

 Hi,

 I'm following the Clusters from Scratch guide for Fedora 13, and I've
 managed to get a two-node cluster working with Apache. However, once I
 tried to add DRBD 8.4 to the mix, it stopped working.

 I've followed the DRBD steps in the guide all the way to "cib commit fs"
 in Section 7.4, right before "Testing Migration". However, when I run
 crm_mon, I get the following failed actions.

 Last updated: Thu Oct 16 17:28:34 2014
 Last change: Thu Oct 16 17:26:04 2014 via crm_shadow on node01
 Stack: cman
 Current DC: node02 - partition with quorum
 Version: 1.1.10-14.el6_5.3-368c726
 2 Nodes configured
 5 Resources configured


 Online: [ node01 node02 ]

 ClusterIP       (ocf::heartbeat:IPaddr2):       Started node02
  Master/Slave Set: WebDataClone [WebData]
      Masters: [ node02 ]
      Slaves:  [ node01 ]
 WebFS           (ocf::heartbeat:Filesystem):    Started node02

 Failed actions:
     WebSite_start_0 on node02 'unknown error' (1): call=278, status=Timed Out,
         last-rc-change='Thu Oct 16 17:26:28 2014', queued=2ms, exec=0ms
     WebSite_start_0 on node01 'unknown error' (1): call=203, status=Timed Out,
         last-rc-change='Thu Oct 16 17:26:09 2014', queued=2ms, exec=0ms

 It seems the Apache WebSite resource isn't starting up. Apache was
 working just fine before I configured DRBD. What did I do wrong?
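 For reference, the constraints that section of the guide has you commit tie
 WebSite to WebFS and WebFS to the promoted DRBD clone, roughly like the
 following crm shell sketch (resource names follow the guide; treat it as a
 sketch of the intended configuration, not a verified copy):

     # Filesystem only where the DRBD clone is Master, and only after promotion
     colocation fs_on_drbd inf: WebFS WebDataClone:Master
     order WebFS-after-WebData inf: WebDataClone:promote WebFS:start
     # Apache only where, and only after, the filesystem is mounted
     colocation WebSite-with-WebFS inf: WebSite WebFS
     order WebSite-after-WebFS inf: WebFS WebSite

 If any of these are missing or mistyped, WebSite can be started on a node
 where its DocumentRoot is not mounted yet, and the start then fails or
 times out.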

 --
 - Goi Sihan
 gois...@gmail.com




 --
 - Goi Sihan
 gois...@gmail.com



Re: [Pacemaker] DRBD with Pacemaker on CentOS 6.5

2014-10-23 Thread David Pendell
Try this.

https://alteeve.ca/w/AN!Cluster_Tutorial_2

On Wed, Oct 22, 2014 at 8:08 PM, Sihan Goi gois...@gmail.com wrote:

 Hi, can anyone help? Really stuck here...



Re: [Pacemaker] DRBD with Pacemaker on CentOS 6.5

2014-10-23 Thread David Pendell
By the way, you want to configure DRBD before you configure Apache. You
start from the bottom up. Get a fully working platform upon which to build.
Make sure that DRBD is working and that fencing is *in place and working*;
DON'T SKIP THIS! Then build Apache on top of that.
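
As a rough crm shell sketch of that order of operations (resource and
parameter names below follow the Clusters from Scratch naming and are
placeholders, not a drop-in configuration):

    # 1. Prove DRBD itself is healthy first: on both nodes, /proc/drbd
    #    (DRBD 8.4) should show Connected and UpToDate/UpToDate.
    # 2. Enable fencing and define real stonith resources before anything
    #    stateful sits on top of DRBD:
    crm configure property stonith-enabled=true
    # 3. Only then hand the DRBD resource to Pacemaker:
    crm configure primitive WebData ocf:linbit:drbd \
            params drbd_resource=wwwdata op monitor interval=60s
    crm configure ms WebDataClone WebData \
            meta master-max=1 master-node-max=1 \
                 clone-max=2 clone-node-max=1 notify=true
    # 4. The Filesystem and Apache resources go on top, constrained to the
    #    node where WebDataClone is Master.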

d.p.



[Pacemaker] Self-Fence???

2013-03-28 Thread David Pendell
I have a two-node CentOS 6.4 based cluster, using Pacemaker 1.1.8 with a
cman backend, running primarily libvirt-controlled KVM VMs. For the VMs, I
am using CLVM volumes for the virtual hard drives and a single GFS2 volume
for shared storage of the VMs' config files and other shared data.
For fencing, I use IPMI and an APC master switch to provide redundant
fencing. There are location constraints that keep each fencing resource
from running on its own node. I am *not* using sbd or any other
software-based fencing device.
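
In crm shell terms, that layout looks roughly like the sketch below. The
agent parameters are illustrative placeholders (option names also vary
between fence-agents versions), and only one node's resources are shown:

    # Primary fencing device: IPMI on the node's BMC
    primitive st-ipmi-vh1 stonith:fence_ipmilan \
            params pcmk_host_list="virtualhost1.delta-co.gov" \
                   ipaddr="192.0.2.11" login="admin" passwd="secret" lanplus="1"
    # Backup fencing device: the node's outlet on the switched APC PDU
    primitive st-apc-vh1 stonith:fence_apc \
            params pcmk_host_list="virtualhost1.delta-co.gov" \
                   ipaddr="192.0.2.20" login="apc" passwd="secret" port="1"
    # A node must never run the devices that are meant to kill it
    location st-ipmi-vh1-avoid st-ipmi-vh1 -inf: virtualhost1.delta-co.gov
    location st-apc-vh1-avoid  st-apc-vh1  -inf: virtualhost1.delta-co.gov
    # ...plus the mirror-image resources and constraints for virtualhost2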

I had a very bizarre situation this morning -- I had one of the nodes
powered off. Then the other self-fenced. I thought that was impossible.

Excerpts from the logs:

Mar 28 13:10:01 virtualhost2 stonith-ng[4223]:   notice: remote_op_done:
Operation reboot of virtualhost2.delta-co.gov by
virtualhost1.delta-co.gov for crmd.4...@virtualhost1.delta-co.gov.fc5638ad:
Timer expired

[...]
Virtualhost1 was offline, so I expect that line.
[...]

Mar 28 13:13:30 virtualhost2 pengine[4226]:   notice: unpack_rsc_op:
Preventing p_ns2 from re-starting on virtualhost2.delta-co.gov: operation
monitor failed 'not installed' (rc=5)

[...]
If I had a brief interruption of my gfs2 volume, would that show up? And
would it be the cause of a fencing operation?
[...]

Mar 28 13:13:30 virtualhost2 pengine[4226]:  warning: pe_fence_node: Node
virtualhost2.delta-co.gov will be fenced to recover from resource failure(s)
Mar 28 13:13:30 virtualhost2 pengine[4226]:  warning: stage6: Scheduling
Node virtualhost2.delta-co.gov for STONITH

[...]
Why is it still trying to fence, if all of the fencing resources are
offline?
[...]

Mar 28 13:13:30 virtualhost2 crmd[4227]:   notice: te_fence_node: Executing
reboot fencing operation (43) on virtualhost2.delta-co.gov (timeout=6)

Mar 28 13:13:30 virtualhost2 stonith-ng[4223]:   notice: handle_request:
Client crmd.4227.9fdec3bd wants to fence (reboot) 'virtualhost2.delta-co.gov'
with device '(any)'

[...]
What does that mean? crmd.4227.9fdec3bd -- I figure 4227 is a process number,
but I don't know what the next number is.
[...]

Mar 28 13:13:30 virtualhost2 stonith-ng[4223]:    error:
check_alternate_host: No alternate host available to handle complex self
fencing request

[...]
Where did that come from?
[...]

Mar 28 13:13:30 virtualhost2 stonith-ng[4223]:   notice:
check_alternate_host: Peer[1] virtualhost1.delta-co.gov
Mar 28 13:13:30 virtualhost2 stonith-ng[4223]:   notice:
check_alternate_host: Peer[2] virtualhost2.delta-co.gov
Mar 28 13:13:30 virtualhost2 stonith-ng[4223]:   notice:
initiate_remote_stonith_op: Initiating remote operation reboot for
virtualhost2.delta-co.gov: 648ca743-6cda-4c81-9250-21c9109a51b9 (0)

[...]
The next logs are the reboot logs.


Re: [Pacemaker] Self-Fence???

2013-03-28 Thread David Pendell
Ok, then. I learned something new. Thanks.

d.p.


On Thu, Mar 28, 2013 at 6:28 PM, Andrew Beekhof and...@beekhof.net wrote:

 On Fri, Mar 29, 2013 at 7:42 AM, David Pendell losto...@gmail.com wrote:
  [...]
 
  I had a very bizarre situation this morning -- I had one of the nodes
  powered off. Then the other self-fenced. I thought that was impossible.

 No. Not when a node is by itself.

 
  Excerpts from the logs:
 
  [...]
 
  Mar 28 13:13:30 virtualhost2 stonith-ng[4223]:    error:
  check_alternate_host: No alternate host available to handle complex self
  fencing request
 
  [...]
  Where did that come from?

 It was scheduled by the policy engine (because a resource failed to
 stop by the looks of it) and, as per the logs above, initiated by the
 crmd.

 


Re: [Pacemaker] Live migration question.

2012-07-31 Thread David Pendell
I'll try those; thanks.

d.p.

On Sun, Jul 29, 2012 at 9:02 PM, Andrew Beekhof and...@beekhof.net wrote:

 On Thu, Jul 12, 2012 at 2:26 PM, David Pendell losto...@gmail.com wrote:
  [...]
 
  I would like to make the migrations sequential so that the Windows VM can
  migrate first, and then the next most important Linux VMs, etc. Is there
  any way to do this?

 You could try creating an ordering constraint between the two VMs, or
 setting batch-limit really small.
 I think there were some other options too.
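
 In crm shell terms that might look roughly like this (vm_windows,
 vm_linux1 and vm_linux2 are placeholder VirtualDomain resource names, not
 ones taken from this cluster):

     # Advisory (score 0) ordering: prefer handling the Windows VM first
     # without creating a hard dependency between the guests.
     crm configure order win-before-linux1 0: vm_windows vm_linux1
     crm configure order linux1-before-linux2 0: vm_linux1 vm_linux2
     # Or throttle how many actions the cluster runs in parallel:
     crm configure property batch-limit=1

 A mandatory (inf:) ordering would also tie normal stops and starts of the
 guests together, which is usually not wanted here.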



[Pacemaker] Live migration question.

2012-07-11 Thread David Pendell
I have two cluster nodes with a gigabit network between them for doing
live migrations of running KVM VMs. If one of the two hosts goes offline,
naturally all of the guests get restarted on the other host. But when the
offline host comes back online, all of the guests that were restarted on
the surviving host try to live-migrate back. Given that I only have one
gigabit link for the transfers, this creates a log jam. The result is that
the VMs whose migrations time out fall back to a "move" -- that is, they
are shut down and restarted on the newly online node. For my Linux guests
this is annoying, but with a Windows VM it is a disaster, locking a very
impatient department out of their server for 3-4 minutes. (One might think
that this is a miracle, given that before I set up the cluster a server
reboot would take ten minutes. Sigh.)

I would like to make the migrations sequential so that the Windows VM can
migrate first, and then the next most important Linux VMs, etc. Is there
any way to do this?

Beekhof suggested that there were several alternatives and that the mailing
list would be the best place to ask.

lostogre
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org