Re: [ClusterLabs] Pacemaker failover failure

2015-07-14 Thread Digimer
As said before, fencing.

On 01/07/15 06:54 AM, alex austin wrote:
 so I did another test:
 
 two nodes: node1 and node2
 
 Case: node1 is the active node,
 node2 is passive.
 
 If I run killall -9 pacemakerd corosync on node1, the services do not fail
 over to node2, but if I then start corosync and pacemaker on node1 again,
 they do fail over to node2.
 
 Where am I going wrong?
 
 Alex
 
 On Wed, Jul 1, 2015 at 12:42 PM, alex austin <alexixa...@gmail.com> wrote:
 
 Hi all,
 
 I have configured a virtual IP and Redis in master/slave mode with
 Corosync/Pacemaker. If Redis fails, the failover is successful and Redis
 gets promoted on the other node. However, if Pacemaker itself fails on
 the active node, the failover is not performed. Is there anything I
 missed in the configuration?
 
 Here's my configuration (I have masked the IP address out):
 
 node host1.com
 
 node host2.com
 
 primitive ClusterIP IPaddr2 \
 
 params ip=xxx.xxx.xxx.xxx cidr_netmask=23 \
 
 op monitor interval=1s timeout=20s \
 
 op start interval=0 timeout=20s \
 
 op stop interval=0 timeout=20s \
 
 meta is-managed=true target-role=Started resource-stickiness=500
 
 primitive redis redis \
 
 meta target-role=Master is-managed=true \
 
 op monitor interval=1s role=Master timeout=5s on-fail=restart
 
 ms redis_clone redis \
 
 meta notify=true is-managed=true ordered=false interleave=false
 globally-unique=false target-role=Master migration-threshold=1
 
 colocation ClusterIP-on-redis inf: ClusterIP redis_clone:Master
 
 colocation ip-on-redis inf: ClusterIP redis_clone:Master
 
 property cib-bootstrap-options: \
 
 dc-version=1.1.11-97629de \
 
 cluster-infrastructure=classic openais (with plugin) \
 
 expected-quorum-votes=2 \
 
 stonith-enabled=false
 
 property redis_replication: \
 
 redis_REPL_INFO=host.com
 
 
 thank you in advance
 
 
 Kind regards,
 
 
 Alex 
 
 
 
 
 


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker failover failure

2015-07-02 Thread alex austin
Thank you!

However, what is proper fencing in this situation?

Kind Regards,

Alex

On Wed, Jul 1, 2015 at 11:30 PM, Ken Gaillot kgail...@redhat.com wrote:

 On 07/01/2015 09:39 AM, alex austin wrote:
  This is what crm_mon shows
 
 
  Last updated: Wed Jul  1 10:35:40 2015
 
  Last change: Wed Jul  1 09:52:46 2015
 
  Stack: classic openais (with plugin)
 
  Current DC: host2 - partition with quorum
 
  Version: 1.1.11-97629de
 
  2 Nodes configured, 2 expected votes
 
  4 Resources configured
 
 
 
  Online: [ host1 host2 ]
 
 
  ClusterIP (ocf::heartbeat:IPaddr2): Started host2
 
   Master/Slave Set: redis_clone [redis]
 
   Masters: [ host2 ]
 
   Slaves: [ host1 ]
 
  pcmk-fencing(stonith:fence_pcmk):   Started host2
 
  On Wed, Jul 1, 2015 at 3:37 PM, alex austin alexixa...@gmail.com
 wrote:
 
  I am running version 1.4.7 of corosync

 If you can't upgrade to corosync 2 (which has many improvements), you'll
 need to set the no-quorum-policy=ignore cluster option.

 Proper fencing is necessary to avoid a split-brain situation, which can
 corrupt your data.

  On Wed, Jul 1, 2015 at 3:25 PM, Ken Gaillot kgail...@redhat.com
 wrote:
 
  On 07/01/2015 08:57 AM, alex austin wrote:
  I have now configured stonith-enabled=true. What device should I use for
  fencing, given that it's a virtual machine and I don't have access to its
  configuration? Would fence_pcmk do? If so, what parameters should I
  configure for it to work properly?
 
  No, fence_pcmk is not for using in pacemaker, but for using in RHEL6's
  CMAN to redirect its fencing requests to pacemaker.
 
  For a virtual machine, ideally you'd use fence_virtd running on the
  physical host, but I'm guessing from your comment that you can't do
  that. Does whoever provides your VM also provide an API for controlling
  it (starting/stopping/rebooting)?
 
  Regarding your original problem, it sounds like the surviving node
  doesn't have quorum. What version of corosync are you using? If you're
  using corosync 2, you need two_node: 1 in corosync.conf, in addition
  to configuring fencing in pacemaker.
 
  This is my new config:
 
 
  node dcwbpvmuas004.edc.nam.gm.com \
 
  attributes standby=off
 
  node dcwbpvmuas005.edc.nam.gm.com \
 
  attributes standby=off
 
  primitive ClusterIP IPaddr2 \
 
  params ip=198.208.86.242 cidr_netmask=23 \
 
  op monitor interval=1s timeout=20s \
 
  op start interval=0 timeout=20s \
 
  op stop interval=0 timeout=20s \
 
  meta is-managed=true target-role=Started
 resource-stickiness=500
 
  primitive pcmk-fencing stonith:fence_pcmk \
 
  params pcmk_host_list=dcwbpvmuas004.edc.nam.gm.com
  dcwbpvmuas005.edc.nam.gm.com \
 
  op monitor interval=10s \
 
  meta target-role=Started
 
  primitive redis redis \
 
  meta target-role=Master is-managed=true \
 
  op monitor interval=1s role=Master timeout=5s on-fail=restart
 
  ms redis_clone redis \
 
  meta notify=true is-managed=true ordered=false
 interleave=false
  globally-unique=false target-role=Master migration-threshold=1
 
  colocation ClusterIP-on-redis inf: ClusterIP redis_clone:Master
 
  colocation ip-on-redis inf: ClusterIP redis_clone:Master
 
  colocation pcmk-fencing-on-redis inf: pcmk-fencing redis_clone:Master
 
  property cib-bootstrap-options: \
 
  dc-version=1.1.11-97629de \
 
  cluster-infrastructure=classic openais (with plugin) \
 
  expected-quorum-votes=2 \
 
  stonith-enabled=true
 
  property redis_replication: \
 
  redis_REPL_INFO=dcwbpvmuas005.edc.nam.gm.com
 
  On Wed, Jul 1, 2015 at 2:53 PM, Nekrasov, Alexander 
  alexander.nekra...@emc.com wrote:
 
  stonith-enabled=false
 
  This might be the issue: peer node death is resolved by having the
  surviving node call STONITH on the peer. If STONITH is disabled, the
  cluster might not be able to resolve the event.
 
 
 
  Alex
 
 
 
  *From:* alex austin [mailto:alexixa...@gmail.com]
  *Sent:* Wednesday, July 01, 2015 9:51 AM
  *To:* Users@clusterlabs.org
  *Subject:* Re: [ClusterLabs] Pacemaker failover failure
 
 
 
  So I noticed that if I kill Redis on one node, it starts on the other with
  no problem, but if I actually kill Pacemaker itself on one node, the other
  node doesn't sense it, so it doesn't fail over.
 
 
 
 
 
 
 
  On Wed, Jul 1, 2015 at 12:42 PM, alex austin alexixa...@gmail.com
  wrote:
 
  Hi all,
 
 
 
  I have configured a virtual IP and Redis in master/slave mode with
  Corosync/Pacemaker. If Redis fails, the failover is successful and Redis
  gets promoted on the other node. However, if Pacemaker itself fails on the
  active node, the failover is not performed. Is there anything I missed in
  the configuration?
 
 
 
  Here's my configuration (I have masked the IP address out):
 
 
 
  node host1.com
 
  node host2.com
 
  primitive ClusterIP IPaddr2 \
 
  params ip=xxx.xxx.xxx.xxx cidr_netmask=23 \
 
  op

Re: [ClusterLabs] Pacemaker failover failure

2015-07-01 Thread alex austin
so I did another test:

two nodes: node1 and node2

Case: node1 is the active node,
node2 is passive.

If I run killall -9 pacemakerd corosync on node1, the services do not fail over
to node2, but if I then start corosync and pacemaker on node1 again, they do
fail over to node2.

Where am I going wrong?

Alex

On Wed, Jul 1, 2015 at 12:42 PM, alex austin alexixa...@gmail.com wrote:

 Hi all,

 I have configured a virtual IP and Redis in master/slave mode with
 Corosync/Pacemaker. If Redis fails, the failover is successful and Redis
 gets promoted on the other node. However, if Pacemaker itself fails on the
 active node, the failover is not performed. Is there anything I missed in
 the configuration?

 Here's my configuration (I have masked the IP address out):

 node host1.com

 node host2.com

 primitive ClusterIP IPaddr2 \

 params ip=xxx.xxx.xxx.xxx cidr_netmask=23 \

 op monitor interval=1s timeout=20s \

 op start interval=0 timeout=20s \

 op stop interval=0 timeout=20s \

 meta is-managed=true target-role=Started resource-stickiness=500

 primitive redis redis \

 meta target-role=Master is-managed=true \

 op monitor interval=1s role=Master timeout=5s on-fail=restart

 ms redis_clone redis \

 meta notify=true is-managed=true ordered=false interleave=false
 globally-unique=false target-role=Master migration-threshold=1

 colocation ClusterIP-on-redis inf: ClusterIP redis_clone:Master

 colocation ip-on-redis inf: ClusterIP redis_clone:Master

 property cib-bootstrap-options: \

 dc-version=1.1.11-97629de \

 cluster-infrastructure=classic openais (with plugin) \

 expected-quorum-votes=2 \

 stonith-enabled=false

 property redis_replication: \

 redis_REPL_INFO=host.com


 thank you in advance


 Kind regards,


 Alex



Re: [ClusterLabs] Pacemaker failover failure

2015-07-01 Thread Ken Gaillot
On 07/01/2015 08:57 AM, alex austin wrote:
 I have now configured stonith-enabled=true. What device should I use for
 fencing, given that it's a virtual machine and I don't have access to its
 configuration? Would fence_pcmk do? If so, what parameters should I
 configure for it to work properly?

No, fence_pcmk is not for using in pacemaker, but for using in RHEL6's
CMAN to redirect its fencing requests to pacemaker.

For a virtual machine, ideally you'd use fence_virtd running on the
physical host, but I'm guessing from your comment that you can't do
that. Does whoever provides your VM also provide an API for controlling
it (starting/stopping/rebooting)?
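
For illustration only: if fence_virtd on the physical host and fence_xvm in
the guests were an option, the fencing primitive might look roughly like this
in crm shell syntax (a sketch; the VM names in pcmk_host_map and the key file
path are placeholders, not values from this thread):

# fence_xvm asks fence_virtd on the physical host to power-cycle the VM
primitive fence-xvm stonith:fence_xvm \
        params pcmk_host_map="host1.com:vm-host1;host2.com:vm-host2" \
        key_file="/etc/cluster/fence_xvm.key" \
        op monitor interval=60s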

Regarding your original problem, it sounds like the surviving node
doesn't have quorum. What version of corosync are you using? If you're
using corosync 2, you need two_node: 1 in corosync.conf, in addition
to configuring fencing in pacemaker.
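
With corosync 2 the relevant corosync.conf fragment would be along these lines
(a sketch; merge it with whatever quorum settings already exist):

quorum {
        provider: corosync_votequorum
        # lets a two-node cluster retain quorum when one node is lost;
        # implies wait_for_all, so both nodes must be seen once at startup
        two_node: 1
}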

 This is my new config:
 
 
 node dcwbpvmuas004.edc.nam.gm.com \
 
 attributes standby=off
 
 node dcwbpvmuas005.edc.nam.gm.com \
 
 attributes standby=off
 
 primitive ClusterIP IPaddr2 \
 
 params ip=198.208.86.242 cidr_netmask=23 \
 
 op monitor interval=1s timeout=20s \
 
 op start interval=0 timeout=20s \
 
 op stop interval=0 timeout=20s \
 
 meta is-managed=true target-role=Started resource-stickiness=500
 
 primitive pcmk-fencing stonith:fence_pcmk \
 
 params pcmk_host_list=dcwbpvmuas004.edc.nam.gm.com
 dcwbpvmuas005.edc.nam.gm.com \
 
 op monitor interval=10s \
 
 meta target-role=Started
 
 primitive redis redis \
 
 meta target-role=Master is-managed=true \
 
 op monitor interval=1s role=Master timeout=5s on-fail=restart
 
 ms redis_clone redis \
 
 meta notify=true is-managed=true ordered=false interleave=false
 globally-unique=false target-role=Master migration-threshold=1
 
 colocation ClusterIP-on-redis inf: ClusterIP redis_clone:Master
 
 colocation ip-on-redis inf: ClusterIP redis_clone:Master
 
 colocation pcmk-fencing-on-redis inf: pcmk-fencing redis_clone:Master
 
 property cib-bootstrap-options: \
 
 dc-version=1.1.11-97629de \
 
 cluster-infrastructure=classic openais (with plugin) \
 
 expected-quorum-votes=2 \
 
 stonith-enabled=true
 
 property redis_replication: \
 
 redis_REPL_INFO=dcwbpvmuas005.edc.nam.gm.com
 
 On Wed, Jul 1, 2015 at 2:53 PM, Nekrasov, Alexander 
 alexander.nekra...@emc.com wrote:
 
 stonith-enabled=false

 This might be the issue: peer node death is resolved by having the surviving
 node call STONITH on the peer. If STONITH is disabled, the cluster might not
 be able to resolve the event.
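
In the crm shell that property is a one-liner to change; a working fence
device must still be configured for STONITH to actually do anything:

crm configure property stonith-enabled=true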



 Alex



 *From:* alex austin [mailto:alexixa...@gmail.com]
 *Sent:* Wednesday, July 01, 2015 9:51 AM
 *To:* Users@clusterlabs.org
 *Subject:* Re: [ClusterLabs] Pacemaker failover failure



 So I noticed that if I kill Redis on one node, it starts on the other with no
 problem, but if I actually kill Pacemaker itself on one node, the other node
 doesn't sense it, so it doesn't fail over.







 On Wed, Jul 1, 2015 at 12:42 PM, alex austin alexixa...@gmail.com wrote:

 Hi all,



 I have configured a virtual IP and Redis in master/slave mode with
 Corosync/Pacemaker. If Redis fails, the failover is successful and Redis
 gets promoted on the other node. However, if Pacemaker itself fails on the
 active node, the failover is not performed. Is there anything I missed in
 the configuration?



 Here's my configuration (I have masked the IP address out):



 node host1.com

 node host2.com

 primitive ClusterIP IPaddr2 \

 params ip=xxx.xxx.xxx.xxx cidr_netmask=23 \

 op monitor interval=1s timeout=20s \

 op start interval=0 timeout=20s \

 op stop interval=0 timeout=20s \

 meta is-managed=true target-role=Started resource-stickiness=500

 primitive redis redis \

 meta target-role=Master is-managed=true \

 op monitor interval=1s role=Master timeout=5s on-fail=restart

 ms redis_clone redis \

 meta notify=true is-managed=true ordered=false interleave=false
 globally-unique=false target-role=Master migration-threshold=1

 colocation ClusterIP-on-redis inf: ClusterIP redis_clone:Master

 colocation ip-on-redis inf: ClusterIP redis_clone:Master

 property cib-bootstrap-options: \

 dc-version=1.1.11-97629de \

 cluster-infrastructure=classic openais (with plugin) \

 expected-quorum-votes=2 \

 stonith-enabled=false

 property redis_replication: \

 redis_REPL_INFO=host.com





Re: [ClusterLabs] Pacemaker failover failure

2015-07-01 Thread alex austin
So I noticed that if I kill Redis on one node, it starts on the other with no
problem, but if I actually kill Pacemaker itself on one node, the other node
doesn't sense it, so it doesn't fail over.



On Wed, Jul 1, 2015 at 12:42 PM, alex austin alexixa...@gmail.com wrote:

 Hi all,

 I have configured a virtual IP and Redis in master/slave mode with
 Corosync/Pacemaker. If Redis fails, the failover is successful and Redis
 gets promoted on the other node. However, if Pacemaker itself fails on the
 active node, the failover is not performed. Is there anything I missed in
 the configuration?

 Here's my configuration (I have masked the IP address out):

 node host1.com

 node host2.com

 primitive ClusterIP IPaddr2 \

 params ip=xxx.xxx.xxx.xxx cidr_netmask=23 \

 op monitor interval=1s timeout=20s \

 op start interval=0 timeout=20s \

 op stop interval=0 timeout=20s \

 meta is-managed=true target-role=Started resource-stickiness=500

 primitive redis redis \

 meta target-role=Master is-managed=true \

 op monitor interval=1s role=Master timeout=5s on-fail=restart

 ms redis_clone redis \

 meta notify=true is-managed=true ordered=false interleave=false
 globally-unique=false target-role=Master migration-threshold=1

 colocation ClusterIP-on-redis inf: ClusterIP redis_clone:Master

 colocation ip-on-redis inf: ClusterIP redis_clone:Master

 property cib-bootstrap-options: \

 dc-version=1.1.11-97629de \

 cluster-infrastructure=classic openais (with plugin) \

 expected-quorum-votes=2 \

 stonith-enabled=false

 property redis_replication: \

 redis_REPL_INFO=host.com


 thank you in advance


 Kind regards,


 Alex



Re: [ClusterLabs] Pacemaker failover failure

2015-07-01 Thread alex austin
I am running version 1.4.7 of corosync



On Wed, Jul 1, 2015 at 3:25 PM, Ken Gaillot kgail...@redhat.com wrote:

 On 07/01/2015 08:57 AM, alex austin wrote:
  I have now configured stonith-enabled=true. What device should I use for
  fencing, given that it's a virtual machine and I don't have access to its
  configuration? Would fence_pcmk do? If so, what parameters should I
  configure for it to work properly?

 No, fence_pcmk is not for using in pacemaker, but for using in RHEL6's
 CMAN to redirect its fencing requests to pacemaker.

 For a virtual machine, ideally you'd use fence_virtd running on the
 physical host, but I'm guessing from your comment that you can't do
 that. Does whoever provides your VM also provide an API for controlling
 it (starting/stopping/rebooting)?

 Regarding your original problem, it sounds like the surviving node
 doesn't have quorum. What version of corosync are you using? If you're
 using corosync 2, you need two_node: 1 in corosync.conf, in addition
 to configuring fencing in pacemaker.

  This is my new config:
 
 
  node dcwbpvmuas004.edc.nam.gm.com \
 
  attributes standby=off
 
  node dcwbpvmuas005.edc.nam.gm.com \
 
  attributes standby=off
 
  primitive ClusterIP IPaddr2 \
 
  params ip=198.208.86.242 cidr_netmask=23 \
 
  op monitor interval=1s timeout=20s \
 
  op start interval=0 timeout=20s \
 
  op stop interval=0 timeout=20s \
 
  meta is-managed=true target-role=Started resource-stickiness=500
 
  primitive pcmk-fencing stonith:fence_pcmk \
 
  params pcmk_host_list=dcwbpvmuas004.edc.nam.gm.com
  dcwbpvmuas005.edc.nam.gm.com \
 
  op monitor interval=10s \
 
  meta target-role=Started
 
  primitive redis redis \
 
  meta target-role=Master is-managed=true \
 
  op monitor interval=1s role=Master timeout=5s on-fail=restart
 
  ms redis_clone redis \
 
  meta notify=true is-managed=true ordered=false interleave=false
  globally-unique=false target-role=Master migration-threshold=1
 
  colocation ClusterIP-on-redis inf: ClusterIP redis_clone:Master
 
  colocation ip-on-redis inf: ClusterIP redis_clone:Master
 
  colocation pcmk-fencing-on-redis inf: pcmk-fencing redis_clone:Master
 
  property cib-bootstrap-options: \
 
  dc-version=1.1.11-97629de \
 
  cluster-infrastructure=classic openais (with plugin) \
 
  expected-quorum-votes=2 \
 
  stonith-enabled=true
 
  property redis_replication: \
 
  redis_REPL_INFO=dcwbpvmuas005.edc.nam.gm.com
 
  On Wed, Jul 1, 2015 at 2:53 PM, Nekrasov, Alexander 
  alexander.nekra...@emc.com wrote:
 
  stonith-enabled=false
 
  This might be the issue: peer node death is resolved by having the
  surviving node call STONITH on the peer. If STONITH is disabled, the
  cluster might not be able to resolve the event.
 
 
 
  Alex
 
 
 
  *From:* alex austin [mailto:alexixa...@gmail.com]
  *Sent:* Wednesday, July 01, 2015 9:51 AM
  *To:* Users@clusterlabs.org
  *Subject:* Re: [ClusterLabs] Pacemaker failover failure
 
 
 
  So I noticed that if I kill Redis on one node, it starts on the other with
  no problem, but if I actually kill Pacemaker itself on one node, the other
  node doesn't sense it, so it doesn't fail over.
 
 
 
 
 
 
 
  On Wed, Jul 1, 2015 at 12:42 PM, alex austin alexixa...@gmail.com
 wrote:
 
  Hi all,
 
 
 
  I have configured a virtual IP and Redis in master/slave mode with
  Corosync/Pacemaker. If Redis fails, the failover is successful and Redis
  gets promoted on the other node. However, if Pacemaker itself fails on the
  active node, the failover is not performed. Is there anything I missed in
  the configuration?



  Here's my configuration (I have masked the IP address out):
 
 
 
  node host1.com
 
  node host2.com
 
  primitive ClusterIP IPaddr2 \
 
  params ip=xxx.xxx.xxx.xxx cidr_netmask=23 \
 
  op monitor interval=1s timeout=20s \
 
  op start interval=0 timeout=20s \
 
  op stop interval=0 timeout=20s \
 
  meta is-managed=true target-role=Started resource-stickiness=500
 
  primitive redis redis \
 
  meta target-role=Master is-managed=true \
 
  op monitor interval=1s role=Master timeout=5s on-fail=restart
 
  ms redis_clone redis \
 
  meta notify=true is-managed=true ordered=false interleave=false
  globally-unique=false target-role=Master migration-threshold=1
 
  colocation ClusterIP-on-redis inf: ClusterIP redis_clone:Master
 
  colocation ip-on-redis inf: ClusterIP redis_clone:Master
 
  property cib-bootstrap-options: \
 
  dc-version=1.1.11-97629de \
 
  cluster-infrastructure=classic openais (with plugin) \
 
  expected-quorum-votes=2 \
 
  stonith-enabled=false
 
  property redis_replication: \
 
  redis_REPL_INFO=host.com




Re: [ClusterLabs] Pacemaker failover failure

2015-07-01 Thread Ken Gaillot
On 07/01/2015 09:39 AM, alex austin wrote:
 This is what crm_mon shows
 
 
 Last updated: Wed Jul  1 10:35:40 2015
 
 Last change: Wed Jul  1 09:52:46 2015
 
 Stack: classic openais (with plugin)
 
 Current DC: host2 - partition with quorum
 
 Version: 1.1.11-97629de
 
 2 Nodes configured, 2 expected votes
 
 4 Resources configured
 
 
 
 Online: [ host1 host2 ]
 
 
 ClusterIP (ocf::heartbeat:IPaddr2): Started host2
 
  Master/Slave Set: redis_clone [redis]
 
  Masters: [ host2 ]
 
  Slaves: [ host1 ]
 
 pcmk-fencing(stonith:fence_pcmk):   Started host2
 
 On Wed, Jul 1, 2015 at 3:37 PM, alex austin alexixa...@gmail.com wrote:
 
 I am running version 1.4.7 of corosync

If you can't upgrade to corosync 2 (which has many improvements), you'll
need to set the no-quorum-policy=ignore cluster option.

Proper fencing is necessary to avoid a split-brain situation, which can
corrupt your data.
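
On this corosync 1 (plugin) stack that option can be set from the crm shell,
for example (with the caveat above that fencing then becomes essential, since
the cluster will keep running resources without quorum):

crm configure property no-quorum-policy=ignore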

 On Wed, Jul 1, 2015 at 3:25 PM, Ken Gaillot kgail...@redhat.com wrote:

 On 07/01/2015 08:57 AM, alex austin wrote:
 I have now configured stonith-enabled=true. What device should I use for
 fencing, given that it's a virtual machine and I don't have access to its
 configuration? Would fence_pcmk do? If so, what parameters should I
 configure for it to work properly?

 No, fence_pcmk is not for using in pacemaker, but for using in RHEL6's
 CMAN to redirect its fencing requests to pacemaker.

 For a virtual machine, ideally you'd use fence_virtd running on the
 physical host, but I'm guessing from your comment that you can't do
 that. Does whoever provides your VM also provide an API for controlling
 it (starting/stopping/rebooting)?

 Regarding your original problem, it sounds like the surviving node
 doesn't have quorum. What version of corosync are you using? If you're
 using corosync 2, you need two_node: 1 in corosync.conf, in addition
 to configuring fencing in pacemaker.

 This is my new config:


 node dcwbpvmuas004.edc.nam.gm.com \

 attributes standby=off

 node dcwbpvmuas005.edc.nam.gm.com \

 attributes standby=off

 primitive ClusterIP IPaddr2 \

 params ip=198.208.86.242 cidr_netmask=23 \

 op monitor interval=1s timeout=20s \

 op start interval=0 timeout=20s \

 op stop interval=0 timeout=20s \

 meta is-managed=true target-role=Started resource-stickiness=500

 primitive pcmk-fencing stonith:fence_pcmk \

 params pcmk_host_list=dcwbpvmuas004.edc.nam.gm.com
 dcwbpvmuas005.edc.nam.gm.com \

 op monitor interval=10s \

 meta target-role=Started

 primitive redis redis \

 meta target-role=Master is-managed=true \

 op monitor interval=1s role=Master timeout=5s on-fail=restart

 ms redis_clone redis \

 meta notify=true is-managed=true ordered=false interleave=false
 globally-unique=false target-role=Master migration-threshold=1

 colocation ClusterIP-on-redis inf: ClusterIP redis_clone:Master

 colocation ip-on-redis inf: ClusterIP redis_clone:Master

 colocation pcmk-fencing-on-redis inf: pcmk-fencing redis_clone:Master

 property cib-bootstrap-options: \

 dc-version=1.1.11-97629de \

 cluster-infrastructure=classic openais (with plugin) \

 expected-quorum-votes=2 \

 stonith-enabled=true

 property redis_replication: \

 redis_REPL_INFO=dcwbpvmuas005.edc.nam.gm.com

 On Wed, Jul 1, 2015 at 2:53 PM, Nekrasov, Alexander 
 alexander.nekra...@emc.com wrote:

 stonith-enabled=false

 This might be the issue: peer node death is resolved by having the surviving
 node call STONITH on the peer. If STONITH is disabled, the cluster might not
 be able to resolve the event.



 Alex



 *From:* alex austin [mailto:alexixa...@gmail.com]
 *Sent:* Wednesday, July 01, 2015 9:51 AM
 *To:* Users@clusterlabs.org
 *Subject:* Re: [ClusterLabs] Pacemaker failover failure



 So I noticed that if I kill Redis on one node, it starts on the other with no
 problem, but if I actually kill Pacemaker itself on one node, the other node
 doesn't sense it, so it doesn't fail over.







 On Wed, Jul 1, 2015 at 12:42 PM, alex austin alexixa...@gmail.com
 wrote:

 Hi all,



 I have configured a virtual IP and Redis in master/slave mode with
 Corosync/Pacemaker. If Redis fails, the failover is successful and Redis
 gets promoted on the other node. However, if Pacemaker itself fails on the
 active node, the failover is not performed. Is there anything I missed in
 the configuration?



 Here's my configuration (I have masked the IP address out):



 node host1.com

 node host2.com

 primitive ClusterIP IPaddr2 \

 params ip=xxx.xxx.xxx.xxx cidr_netmask=23 \

 op monitor interval=1s timeout=20s \

 op start interval=0 timeout=20s \

 op stop interval=0 timeout=20s \

 meta is-managed=true target-role=Started resource-stickiness=500

 primitive redis redis \

 meta target-role=Master is-managed=true \

 op monitor interval=1s role=Master timeout=5s on-fail=restart

 ms redis_clone redis \

 meta