Re: [Pacemaker] no-quorum-policy = demote?

2014-05-28 Thread Christian Ciach
Done:

http://bugs.clusterlabs.org/show_bug.cgi?id=5216

Best regards,
Christian


2014-05-27 22:51 GMT+02:00 Andrew Beekhof and...@beekhof.net:


 On 27 May 2014, at 7:20 pm, Christian Ciach derein...@gmail.com wrote:

 
 
 
  2014-05-27 7:34 GMT+02:00 Andrew Beekhof and...@beekhof.net:
 
  On 27 May 2014, at 3:12 pm, Gao,Yan y...@suse.com wrote:
 
   On 05/27/14 08:07, Andrew Beekhof wrote:
  
   On 26 May 2014, at 10:47 pm, Christian Ciach derein...@gmail.com
 wrote:
  
   I am sorry to get back to this topic, but I'm genuinely curious:
  
   Why is demote an option for the ticket loss-policy for
 multi-site-clusters but not for the normal no-quorum-policy of local
 clusters? This seems like a missing feature to me.
  
   Or one feature too many.
   Perhaps Yan can explain why he wanted demote as an option for the
 loss-policy.
   Loss-policy=demote is a kind of natural default if the Master mode
   of a resource requires a ticket like:
   <rsc_ticket rsc="ms1" rsc-role="Master" ticket="ticketA"/>
  
   The idea is for running stateful resource instances across clusters.
 And
   loss-policy=demote provides the possibility if there's the need to
   still run the resource in slave mode for any reason when losing the
   ticket, rather than stopping it or fencing the node hosting it.
 
  I guess the same logic applies to the single cluster use-case too and we
 should allow no-quorum-policy=demote.
 
 
  Thank you for mentioning this. This was my thought as well.
 
  At the moment we simulate this behaviour by using a primitive resource
 where started means master and stopped means slave. This way we can
 use no-quorum-policy=stop to actually switch the resource to slave on
 quorum loss. This seems hacky, so I would appreciate if this could be done
 in a proper way some time in the future.

 Could you file a bug for that in bugs.clusterlabs.org so we don't lose
 track of it?

 
  One question though... do we still stop non-master/slave resources for
 loss-policy=demote?
 
  
   Regards,
Yan
  
  
  
   Best regards
   Christian
  
  
   2014-04-07 9:54 GMT+02:00 Christian Ciach derein...@gmail.com:
   Hello,
  
   I am using Corosync 2.0 with Pacemaker 1.1 on Ubuntu Server 14.04
 (daily builds until final release).
  
   My problem is as follows: I have a 2-node (plus a quorum-node)
 cluster to manage a multistate-resource. One node should be the master and
 the other one the slave. It is absolutely not allowed to have two masters
 at the same time. To prevent a split-brain situation, I am also using a
 third node as a quorum-only node (set to standby). There is no redundant
 connection because the nodes are connected over the internet.
  
   If one of the two nodes managing the resource becomes disconnected,
 it loses quorum. In this case, I want this resource to become a slave, but
 the resource should never be stopped completely! This leaves me with a
 problem: no-quorum-policy=stop will stop the resource, while
 no-quorum-policy=ignore will keep this resource in a master-state. I
 already tried to demote the resource manually inside the monitor-action of
 the OCF-agent, but pacemaker will promote the resource immediately again.
  
   I am aware that I am trying to manage a multi-site-cluster and
 there is something like the booth-daemon, which sounds like the solution to
 my problem. But unfortunately I need the location-constraints of pacemaker
 based on the score of the OCF-agent. As far as I know location-constraints
 are not possible when using booth, because the 2-node-cluster is
 essentially split into two 1-node-clusters. Is this correct?
  
   To conclude: Is it possible to demote a resource on quorum loss
 instead of stopping it? Is booth an option if I need to manage the location
 of the master based on the score returned by the OCF-agent?
  
  
  
  
   --
   Gao,Yan y...@suse.com
   Software Engineer
   China Server Team, SUSE.
  
 
 
 
 

Re: [Pacemaker] no-quorum-policy = demote?

2014-05-27 Thread Christian Ciach
2014-05-27 7:34 GMT+02:00 Andrew Beekhof and...@beekhof.net:


 On 27 May 2014, at 3:12 pm, Gao,Yan y...@suse.com wrote:

  On 05/27/14 08:07, Andrew Beekhof wrote:
 
  On 26 May 2014, at 10:47 pm, Christian Ciach derein...@gmail.com
 wrote:
 
  I am sorry to get back to this topic, but I'm genuinely curious:
 
  Why is demote an option for the ticket loss-policy for
 multi-site-clusters but not for the normal no-quorum-policy of local
 clusters? This seems like a missing feature to me.
 
  Or one feature too many.
  Perhaps Yan can explain why he wanted demote as an option for the
 loss-policy.
  Loss-policy=demote is a kind of natural default if the Master mode
  of a resource requires a ticket like:
  <rsc_ticket rsc="ms1" rsc-role="Master" ticket="ticketA"/>
 
  The idea is for running stateful resource instances across clusters. And
  loss-policy=demote provides the possibility if there's the need to
  still run the resource in slave mode for any reason when losing the
  ticket, rather than stopping it or fencing the node hosting it.

 I guess the same logic applies to the single cluster use-case too and we
 should allow no-quorum-policy=demote.


Thank you for mentioning this. This was my thought as well.

At the moment we simulate this behaviour by using a primitive resource
where started means master and stopped means slave. This way we can
use no-quorum-policy=stop to actually switch the resource to slave on
quorum loss. This seems hacky, so I would appreciate if this could be done
in a proper way some time in the future.
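For illustration only, a minimal sketch of such a wrapper agent, assuming a
hypothetical "myservicectl" helper that drives the real service (our actual
agent differs, and the meta-data action is omitted here):

  #!/bin/sh
  # Wrapper OCF agent: "started" means promoted, "stopped" means demoted.
  : ${OCF_ROOT:=/usr/lib/ocf}
  . ${OCF_ROOT}/lib/heartbeat/ocf-shellfuncs

  case "$1" in
      start)   myservicectl promote  && exit $OCF_SUCCESS ;;  # hypothetical helper
      stop)    myservicectl demote   && exit $OCF_SUCCESS ;;
      monitor) # report "not running" while demoted, so Pacemaker's stop on
               # quorum loss translates into a demotion of the real service
               myservicectl is-master && exit $OCF_SUCCESS
               exit $OCF_NOT_RUNNING ;;
      *)       exit $OCF_ERR_UNIMPLEMENTED ;;
  esac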


 One question though... do we still stop non-master/slave resources for
 loss-policy=demote?

 
  Regards,
   Yan
 
 
 
  Best regards
  Christian
 
 
  2014-04-07 9:54 GMT+02:00 Christian Ciach derein...@gmail.com:
  Hello,
 
  I am using Corosync 2.0 with Pacemaker 1.1 on Ubuntu Server 14.04
 (daily builds until final release).
 
  My problem is as follows: I have a 2-node (plus a quorum-node) cluster
 to manage a multistate-resource. One node should be the master and the
 other one the slave. It is absolutely not allowed to have two masters at
 the same time. To prevent a split-brain situation, I am also using a third
 node as a quorum-only node (set to standby). There is no redundant
 connection because the nodes are connected over the internet.
 
  If one of the two nodes managing the resource becomes disconnected, it
 loses quorum. In this case, I want this resource to become a slave, but the
 resource should never be stopped completely! This leaves me with a problem:
 no-quorum-policy=stop will stop the resource, while
 no-quorum-policy=ignore will keep this resource in a master-state. I
 already tried to demote the resource manually inside the monitor-action of
 the OCF-agent, but pacemaker will promote the resource immediately again.
 
  I am aware that I am trying to manage a multi-site-cluster and there
 is something like the booth-daemon, which sounds like the solution to my
 problem. But unfortunately I need the location-constraints of pacemaker
 based on the score of the OCF-agent. As far as I know location-constraints
 are not possible when using booth, because the 2-node-cluster is
 essentially split into two 1-node-clusters. Is this correct?
 
  To conclude: Is it possible to demote a resource on quorum loss
 instead of stopping it? Is booth an option if I need to manage the location
 of the master based on the score returned by the OCF-agent?
 
 
 
 
  --
  Gao,Yan y...@suse.com
  Software Engineer
  China Server Team, SUSE.
 




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] no-quorum-policy = demote?

2014-05-27 Thread Andrew Beekhof

On 27 May 2014, at 7:20 pm, Christian Ciach derein...@gmail.com wrote:

 
 
 
 2014-05-27 7:34 GMT+02:00 Andrew Beekhof and...@beekhof.net:
 
 On 27 May 2014, at 3:12 pm, Gao,Yan y...@suse.com wrote:
 
  On 05/27/14 08:07, Andrew Beekhof wrote:
 
  On 26 May 2014, at 10:47 pm, Christian Ciach derein...@gmail.com wrote:
 
  I am sorry to get back to this topic, but I'm genuinely curious:
 
  Why is demote an option for the ticket loss-policy for 
  multi-site-clusters but not for the normal no-quorum-policy of local 
  clusters? This seems like a missing feature to me.
 
  Or one feature too many.
  Perhaps Yan can explain why he wanted demote as an option for the 
  loss-policy.
  Loss-policy=demote is a kind of natural default if the Master mode
  of a resource requires a ticket like:
  <rsc_ticket rsc="ms1" rsc-role="Master" ticket="ticketA"/>
 
  The idea is for running stateful resource instances across clusters. And
  loss-policy=demote provides the possibility if there's the need to
  still run the resource in slave mode for any reason when losing the
  ticket, rather than stopping it or fencing the node hosting it.
 
 I guess the same logic applies to the single cluster use-case too and we 
 should allow no-quorum-policy=demote.
 
 
 Thank you for mentioning this. This was my thought as well.
 
 At the moment we simulate this behaviour by using a primitive resource 
 where started means master and stopped means slave. This way we can 
 use no-quorum-policy=stop to actually switch the resource to slave on 
 quorum loss. This seems hacky, so I would appreciate if this could be done in 
 a proper way some time in the future.

Could you file a bug for that in bugs.clusterlabs.org so we don't lose track 
of it?

  
 One question though... do we still stop non-master/slave resources for 
 loss-policy=demote?
 
 
  Regards,
   Yan
 
 
 
  Best regards
  Christian
 
 
  2014-04-07 9:54 GMT+02:00 Christian Ciach derein...@gmail.com:
  Hello,
 
  I am using Corosync 2.0 with Pacemaker 1.1 on Ubuntu Server 14.04 (daily 
  builds until final release).
 
  My problem is as follows: I have a 2-node (plus a quorum-node) cluster to 
  manage a multistate-resource. One node should be the master and the other 
  one the slave. It is absolutely not allowed to have two masters at the 
  same time. To prevent a split-brain situation, I am also using a third 
  node as a quorum-only node (set to standby). There is no redundant 
  connection because the nodes are connected over the internet.
 
  If one of the two nodes managing the resource becomes disconnected, it 
  loses quorum. In this case, I want this resource to become a slave, but 
  the resource should never be stopped completely! This leaves me with a 
  problem: no-quorum-policy=stop will stop the resource, while 
  no-quorum-policy=ignore will keep this resource in a master-state. I 
  already tried to demote the resource manually inside the monitor-action 
  of the OCF-agent, but pacemaker will promote the resource immediately 
  again.
 
  I am aware that I am trying to manage a multi-site-cluster and there is 
  something like the booth-daemon, which sounds like the solution to my 
  problem. But unfortunately I need the location-constraints of pacemaker 
  based on the score of the OCF-agent. As far as I know 
  location-constraints are not possible when using booth, because the 
  2-node-cluster is essentially split into two 1-node-clusters. Is this 
  correct?
 
  To conclude: Is it possible to demote a resource on quorum loss instead 
  of stopping it? Is booth an option if I need to manage the location of 
  the master based on the score returned by the OCF-agent?
 
 
 
 
  --
  Gao,Yan y...@suse.com
  Software Engineer
  China Server Team, SUSE.
 
 
 
 
 

Re: [Pacemaker] no-quorum-policy = demote?

2014-05-26 Thread Christian Ciach
I am sorry to get back to this topic, but I'm genuinely curious:

Why is demote an option for the ticket loss-policy for
multi-site-clusters but not for the normal no-quorum-policy of local
clusters? This seems like a missing feature to me.

Best regards
Christian


2014-04-07 9:54 GMT+02:00 Christian Ciach derein...@gmail.com:

 Hello,

 I am using Corosync 2.0 with Pacemaker 1.1 on Ubuntu Server 14.04 (daily
 builds until final release).

 My problem is as follows: I have a 2-node (plus a quorum-node) cluster to
 manage a multistate-resource. One node should be the master and the other
 one the slave. It is absolutely not allowed to have two masters at the same
 time. To prevent a split-brain situation, I am also using a third node as a
 quorum-only node (set to standby). There is no redundant connection because
 the nodes are connected over the internet.

 If one of the two nodes managing the resource becomes disconnected, it
 loses quorum. In this case, I want this resource to become a slave, but the
 resource should never be stopped completely! This leaves me with a problem:
 no-quorum-policy=stop will stop the resource, while
 no-quorum-policy=ignore will keep this resource in a master-state. I
 already tried to demote the resource manually inside the monitor-action of
 the OCF-agent, but pacemaker will promote the resource immediately again.

 I am aware that I am trying to manage a multi-site-cluster and there is
 something like the booth-daemon, which sounds like the solution to my
 problem. But unfortunately I need the location-constraints of pacemaker
 based on the score of the OCF-agent. As far as I know location-constraints
 are not possible when using booth, because the 2-node-cluster is
 essentially split into two 1-node-clusters. Is this correct?

 To conclude: Is it possible to demote a resource on quorum loss instead of
 stopping it? Is booth an option if I need to manage the location of the
 master based on the score returned by the OCF-agent?
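
For reference, the setup described above looks roughly like this in crm
configure syntax (a sketch; resource and agent names are placeholders, not
the actual configuration):

  # hypothetical stateful agent, one master and one slave across the two nodes
  primitive p_myapp ocf:myvendor:myapp \
      op monitor interval=10s role=Master \
      op monitor interval=20s role=Slave
  ms ms_myapp p_myapp \
      meta master-max=1 clone-max=2 notify=true
  # third, quorum-only node kept in standby so it never runs resources
  node quorum-node attributes standby=on
  # the dilemma: "stop" stops the resource on quorum loss, "ignore" keeps it master
  property no-quorum-policy=stop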


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] no-quorum-policy = demote?

2014-05-26 Thread Andrew Beekhof

On 26 May 2014, at 10:47 pm, Christian Ciach derein...@gmail.com wrote:

 I am sorry to get back to this topic, but I'm genuinely curious:
 
 Why is demote an option for the ticket loss-policy for 
 multi-site-clusters but not for the normal no-quorum-policy of local 
 clusters? This seems like a missing feature to me.

Or one feature too many.
Perhaps Yan can explain why he wanted demote as an option for the loss-policy.

 
 Best regards
 Christian
 
 
 2014-04-07 9:54 GMT+02:00 Christian Ciach derein...@gmail.com:
 Hello,
 
 I am using Corosync 2.0 with Pacemaker 1.1 on Ubuntu Server 14.04 (daily 
 builds until final release).
 
 My problem is as follows: I have a 2-node (plus a quorum-node) cluster to 
 manage a multistate-resource. One node should be the master and the other one 
 the slave. It is absolutely not allowed to have two masters at the same time. 
 To prevent a split-brain situation, I am also using a third node as a 
 quorum-only node (set to standby). There is no redundant connection because 
 the nodes are connected over the internet.
 
 If one of the two nodes managing the resource becomes disconnected, it loses 
 quorum. In this case, I want this resource to become a slave, but the 
 resource should never be stopped completely! This leaves me with a problem: 
 no-quorum-policy=stop will stop the resource, while 
 no-quorum-policy=ignore will keep this resource in a master-state. I 
 already tried to demote the resource manually inside the monitor-action of 
 the OCF-agent, but pacemaker will promote the resource immediately again.
 
 I am aware that I am trying to manage a multi-site-cluster and there is 
 something like the booth-daemon, which sounds like the solution to my 
 problem. But unfortunately I need the location-constraints of pacemaker based 
 on the score of the OCF-agent. As far as I know location-constraints are not 
 possible when using booth, because the 2-node-cluster is essentially split 
 into two 1-node-clusters. Is this correct?
 
 To conclude: Is it possible to demote a resource on quorum loss instead of 
 stopping it? Is booth an option if I need to manage the location of the 
 master based on the score returned by the OCF-agent?
 
 



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] no-quorum-policy = demote?

2014-05-26 Thread Gao,Yan
On 05/27/14 08:07, Andrew Beekhof wrote:
 
 On 26 May 2014, at 10:47 pm, Christian Ciach derein...@gmail.com wrote:
 
 I am sorry to get back to this topic, but I'm genuinely curious:

 Why is demote an option for the ticket loss-policy for 
 multi-site-clusters but not for the normal no-quorum-policy of local 
 clusters? This seems like a missing feature to me.
 
 Or one feature too many.
 Perhaps Yan can explain why he wanted demote as an option for the loss-policy.
Loss-policy=demote is a kind of natural default if the Master mode
of a resource requires a ticket like:
<rsc_ticket rsc="ms1" rsc-role="Master" ticket="ticketA"/>

The idea is for running stateful resource instances across clusters. And
loss-policy=demote provides the possibility if there's the need to
still run the resource in slave mode for any reason when losing the
ticket, rather than stopping it or fencing the node hosting it.
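
In crm shell syntax that corresponds roughly to a ticket constraint with an
explicit loss-policy (sketch; the constraint id is arbitrary):

  rsc_ticket ms1-req-ticketA ticketA: ms1:Master loss-policy=demote

which maps to loss-policy="demote" on the <rsc_ticket/> element shown above.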

Regards,
  Yan

 

 Best regards
 Christian


 2014-04-07 9:54 GMT+02:00 Christian Ciach derein...@gmail.com:
 Hello,

 I am using Corosync 2.0 with Pacemaker 1.1 on Ubuntu Server 14.04 (daily 
 builds until final release).

 My problem is as follows: I have a 2-node (plus a quorum-node) cluster to 
 manage a multistate-resource. One node should be the master and the other 
 one the slave. It is absolutely not allowed to have two masters at the same 
 time. To prevent a split-brain situation, I am also using a third node as a 
 quorum-only node (set to standby). There is no redundant connection because 
 the nodes are connected over the internet.

 If one of the two nodes managing the resource becomes disconnected, it loses 
 quorum. In this case, I want this resource to become a slave, but the 
 resource should never be stopped completely! This leaves me with a problem: 
 no-quorum-policy=stop will stop the resource, while 
 no-quorum-policy=ignore will keep this resource in a master-state. I 
 already tried to demote the resource manually inside the monitor-action of 
 the OCF-agent, but pacemaker will promote the resource immediately again.

 I am aware that I am trying to manage a multi-site-cluster and there is 
 something like the booth-daemon, which sounds like the solution to my 
 problem. But unfortunately I need the location-constraints of pacemaker 
 based on the score of the OCF-agent. As far as I know location-constraints 
 are not possible when using booth, because the 2-node-cluster is essentially 
 split into two 1-node-clusters. Is this correct?

 To conclude: Is it possible to demote a resource on quorum loss instead of 
 stopping it? Is booth an option if I need to manage the location of the 
 master based on the score returned by the OCF-agent?


 

-- 
Gao,Yan y...@suse.com
Software Engineer
China Server Team, SUSE.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] no-quorum-policy = demote?

2014-05-26 Thread Andrew Beekhof

On 27 May 2014, at 3:12 pm, Gao,Yan y...@suse.com wrote:

 On 05/27/14 08:07, Andrew Beekhof wrote:
 
 On 26 May 2014, at 10:47 pm, Christian Ciach derein...@gmail.com wrote:
 
 I am sorry to get back to this topic, but I'm genuinely curious:
 
 Why is demote an option for the ticket loss-policy for 
 multi-site-clusters but not for the normal no-quorum-policy of local 
 clusters? This seems like a missing feature to me.
 
 Or one feature too many.
 Perhaps Yan can explain why he wanted demote as an option for the 
 loss-policy.
 Loss-policy=demote is a kind of natural default if the Master mode
 of a resource requires a ticket like:
 <rsc_ticket rsc="ms1" rsc-role="Master" ticket="ticketA"/>
 
 The idea is for running stateful resource instances across clusters. And
 loss-policy=demote provides the possibility if there's the need to
 still run the resource in slave mode for any reason when losing the
 ticket, rather than stopping it or fencing the node hosting it.

I guess the same logic applies to the single cluster use-case too and we should 
allow no-quorum-policy=demote.

One question though... do we still stop non-master/slave resources for 
loss-policy=demote?  

 
 Regards,
  Yan
 
 
 
 Best regards
 Christian
 
 
 2014-04-07 9:54 GMT+02:00 Christian Ciach derein...@gmail.com:
 Hello,
 
 I am using Corosync 2.0 with Pacemaker 1.1 on Ubuntu Server 14.04 (daily 
 builds until final release).
 
 My problem is as follows: I have a 2-node (plus a quorum-node) cluster to 
 manage a multistate-resource. One node should be the master and the other 
 one the slave. It is absolutely not allowed to have two masters at the same 
 time. To prevent a split-brain situation, I am also using a third node as a 
 quorum-only node (set to standby). There is no redundant connection because 
 the nodes are connected over the internet.
 
 If one of the two nodes managing the resource becomes disconnected, it 
 loses quorum. In this case, I want this resource to become a slave, but the 
 resource should never be stopped completely! This leaves me with a problem: 
 no-quorum-policy=stop will stop the resource, while 
 no-quorum-policy=ignore will keep this resource in a master-state. I 
 already tried to demote the resource manually inside the monitor-action of 
 the OCF-agent, but pacemaker will promote the resource immediately again.
 
 I am aware that I am trying to manage a multi-site-cluster and there is 
 something like the booth-daemon, which sounds like the solution to my 
 problem. But unfortunately I need the location-constraints of pacemaker 
 based on the score of the OCF-agent. As far as I know location-constraints 
 are not possible when using booth, because the 2-node-cluster is 
 essentially split into two 1-node-clusters. Is this correct?
 
 To conclude: Is it possible to demote a resource on quorum loss instead of 
 stopping it? Is booth an option if I need to manage the location of the 
 master based on the score returned by the OCF-agent?
 
 
 
 
 -- 
 Gao,Yan y...@suse.com
 Software Engineer
 China Server Team, SUSE.
 



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] no-quorum-policy = demote?

2014-04-11 Thread Alexandre
On 10 Apr 2014 15:44, Christian Ciach derein...@gmail.com wrote:

 I don't really like the idea of periodically polling crm_node -q for the
current quorum state. No matter how frequently the monitor-function gets
called, there will always be a small time frame where both nodes will be in
the master state at the same time.

 Is there a way to get a notification to the OCF-agent whenever the quorum
state changes?

You should probably look for something like this in the ocf-shellfuncs
file.

But also take a look at the page below, it has a lot of multi state
dedicated variables that are most definitely useful in your case.

http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_multi_state_proper_interpretation_of_notification_environment_variables.html
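
As a rough sketch, a notify action can inspect those variables like this
(assuming ocf-shellfuncs has been sourced; the function name and logging are
illustrative only):

  myapp_notify() {
      # notify_type is "pre" or "post"; notify_operation is e.g. promote/demote
      case "${OCF_RESKEY_CRM_meta_notify_type}-${OCF_RESKEY_CRM_meta_notify_operation}" in
          post-promote)
              ocf_log info "new master(s): ${OCF_RESKEY_CRM_meta_notify_promote_uname}" ;;
          post-demote)
              ocf_log info "demoted on: ${OCF_RESKEY_CRM_meta_notify_demote_uname}" ;;
      esac
      return $OCF_SUCCESS
  }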



 2014-04-08 10:14 GMT+02:00 Christian Ciach derein...@gmail.com:

 Interesting idea! I can confirm that this works. So, I need to monitor
the output of crm_node -q to check if the current partition has quorum.
If the partition doesn't have quorum, I need to set the location constraint
according to your example. If the partition gets quorum again, I need to
remove the constraint.

  This seems a bit hacky, but it should work okay. Thank you! It's
 almost a shame that pacemaker doesn't have demote as a
no-quorum-policy, but supports demote as a loss-policy for tickets.

 Yesterday I had another idea: Maybe I won't use a multistate resource
agent but a primitive instead. This way, I will start the resource outside
of pacemaker and let the start-action of the OCF-agent set the resource to
master and the stop-action sets it to slave. Then I will just use
no-quorum-policy=stop. The downside of this is that I cannot distinguish
between a stopped resource and a resource in a slave state using crm_mon.




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] no-quorum-policy = demote?

2014-04-11 Thread Christian Ciach
Thank you for pointing me to the environment variables. Unfortunately, none
of these work in this case. For example: Assume one node is currently the
master. Then, because of a network failure, this node loses quorum. Because
no-quorum-policy is set to ignore, this node will keep being a master.
In this case there is no change of state, thus the notify-function of the
OCF-agent does not get called by pacemaker. I've already tried this, so I
am quite sure about that.




2014-04-11 8:16 GMT+02:00 Alexandre alxg...@gmail.com:


 On 10 Apr 2014 15:44, Christian Ciach derein...@gmail.com wrote:

 
   I don't really like the idea of periodically polling crm_node -q for the
 current quorum state. No matter how frequently the monitor-function gets
 called, there will always be a small time frame where both nodes will be in
 the master state at the same time.
 
  Is there a way to get a notification to the OCF-agent whenever the
 quorum state changes?

 You should probably look for something like this in the
  ocf-shellfuncs file.

 But also take a look at the page below, it has a lot of multi state
 dedicated variables that are most definitely useful in your case.


 http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_multi_state_proper_interpretation_of_notification_environment_variables.html

 
 
  2014-04-08 10:14 GMT+02:00 Christian Ciach derein...@gmail.com:
 
  Interesting idea! I can confirm that this works. So, I need to monitor
 the output of crm_node -q to check if the current partition has quorum.
 If the partition doesn't have quorum, I need to set the location constraint
 according to your example. If the partition gets quorum again, I need to
 remove the constraint.
 
   This seems a bit hacky, but it should work okay. Thank you! It's
  almost a shame that pacemaker doesn't have demote as a
 no-quorum-policy, but supports demote as a loss-policy for tickets.
 
  Yesterday I had another idea: Maybe I won't use a multistate resource
 agent but a primitive instead. This way, I will start the resource outside
 of pacemaker and let the start-action of the OCF-agent set the resource to
 master and the stop-action sets it to slave. Then I will just use
 no-quorum-policy=stop. The downside of this is that I cannot distinguish
 between a stopped resource and a resource in a slave state using crm_mon.
 
 
 
 




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] no-quorum-policy = demote?

2014-04-11 Thread Lars Ellenberg
On Fri, Apr 11, 2014 at 10:02:59AM +0200, Christian Ciach wrote:
 Thank you for pointing me to the environment variables. Unfortunately, none
 of these work in this case. For example: Assume one node is currently the
 master. Then, because of a network failure, this node loses quorum. Because
 no-quorum-policy is set to ignore, this node will keep being a master.
 In this case there is no change of state, thus the notify-function of the
 OCF-agent does not get called by pacemaker. I've already tried this, so I
 am quite sure about that.


Very very hackish idea:

  set monitor interval of the Master role to T seconds
  and fail (+demote) if no quorum.

  (or use a dummy resource agent similar to the ping RA,
  and update some node attribute from there...
  then have a constraint for the Master role on that node attribute)

  in your promote action,
refuse to promote if no quorum
sleep 3*T (+ time to demote)
only then actually promote.

That way, you are reasonably sure that,
before you actually promote,
the former master had a chance to notice quorum loss and demote.

But you really should look into booth, or proper fencing.
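
A rough sketch of that promote-action guard, where T must match the Master
role monitor interval, crm_node -q prints 1 when the local partition has
quorum, and "myservicectl" is a made-up helper for the real service:

  myapp_promote() {
      T=10
      if [ "$(crm_node -q)" != "1" ]; then
          ocf_log err "no quorum, refusing to promote"
          return $OCF_ERR_GENERIC
      fi
      # give a cut-off former master time to notice quorum loss and demote
      sleep $((3 * T + 10))
      myservicectl promote || return $OCF_ERR_GENERIC
      return $OCF_SUCCESS
  }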

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] no-quorum-policy = demote?

2014-04-11 Thread Christian Ciach
Thank you for another idea, but I think I will pass ;)

I would like to use booth, but as I said, I also need location constraints
based on an attribute-score (like ping). I don't think this is currently
possible when using a multi-site-cluster.
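
The kind of score-based placement meant here is roughly the following, using
ocf:pacemaker:ping as the attribute source (crm configure syntax; the names
and ping target are placeholders):

  primitive p_ping ocf:pacemaker:ping \
      params host_list="192.0.2.1" multiplier=1000 \
      op monitor interval=15s
  clone cl_ping p_ping
  # place the Master role according to the pingd attribute score
  location loc_master_on_connectivity ms_myapp \
      rule $role=Master pingd: defined pingd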


2014-04-11 15:05 GMT+02:00 Lars Ellenberg lars.ellenb...@linbit.com:

 On Fri, Apr 11, 2014 at 10:02:59AM +0200, Christian Ciach wrote:
  Thank you for pointing me to the environment variables. Unfortunately,
 none
  of these work in this case. For example: Assume one node is currently the
  master. Then, because of a network failure, this node loses quorum.
 Because
  no-quorum-policy is set to ignore, this node will keep being a
 master.
  In this case there is no change of state, thus the notify-function of the
  OCF-agent does not get called by pacemaker. I've already tried this, so I
  am quite sure about that.


 Very very hackish idea:

   set monitor interval of the Master role to T seconds
   and fail (+demote) if no quorum.

   (or use a dummy resource agent similar to the ping RA,
   and update some node attribute from there...
    then have a constraint for the Master role on that node attribute)

   in your promote action,
 refuse to promote if no quorum
 sleep 3*T (+ time to demote)
 only then actually promote.

 That way, you are reasonably sure that,
 before you actually promote,
 the former master had a chance to notice quorum loss and demote.

 But you really should look into booth, or proper fencing.

 --
 : Lars Ellenberg
 : LINBIT | Your Way to High Availability
 : DRBD/HA support and consulting http://www.linbit.com

 DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] no-quorum-policy = demote?

2014-04-10 Thread Christian Ciach
I don't really like the idea of periodically polling crm_node -q for the
current quorum state. No matter how frequently the monitor-function gets
called, there will always be a small time frame where both nodes will be in
the master state at the same time.

Is there a way to get a notification to the OCF-agent whenever the quorum
state changes?


2014-04-08 10:14 GMT+02:00 Christian Ciach derein...@gmail.com:

 Interesting idea! I can confirm that this works. So, I need to monitor the
 output of crm_node -q to check if the current partition has quorum. If
 the partition doesn't have quorum, I need to set the location constraint
 according to your example. If the partition gets quorum again, I need to
 remove the constraint.

  This seems a bit hacky, but it should work okay. Thank you! It's
  almost a shame that pacemaker doesn't have demote as a
 no-quorum-policy, but supports demote as a loss-policy for tickets.

 Yesterday I had another idea: Maybe I won't use a multistate resource
 agent but a primitive instead. This way, I will start the resource outside
 of pacemaker and let the start-action of the OCF-agent set the resource to
 master and the stop-action sets it to slave. Then I will just use
 no-quorum-policy=stop. The downside of this is that I cannot distinguish
 between a stopped resource and a resource in a slave state using crm_mon.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] no-quorum-policy = demote?

2014-04-08 Thread Alexandre
Have you tried to patch the monitor action of your RA, so that it sets a
temporary location constraint on the node to avoid it becoming master?
Something like
Location loc_splited_cluster -inf: MsRsc:Master  $node

Not sure about the above crm syntax, but that's the idea.
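In crm configure syntax it would look roughly like this (untested sketch;
MsRsc and the node name are placeholders), added when the local partition
loses quorum and deleted again once it is back:

  # ban the Master role on nodeA while it has no quorum
  location loc_no_master_nodeA MsRsc \
      rule $role=Master -inf: #uname eq nodeA
  # later, when quorum returns:
  # delete loc_no_master_nodeA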
 On 8 Apr 2014 02:52, Andrew Beekhof and...@beekhof.net wrote:


 On 7 Apr 2014, at 5:54 pm, Christian Ciach derein...@gmail.com wrote:

  Hello,
 
  I am using Corosync 2.0 with Pacemaker 1.1 on Ubuntu Server 14.04 (daily
 builds until final release).
 
  My problem is as follows: I have a 2-node (plus a quorum-node) cluster
 to manage a multistate-resource. One node should be the master and the
 other one the slave. It is absolutely not allowed to have two masters at
 the same time. To prevent a split-brain situation, I am also using a third
 node as a quorum-only node (set to standby). There is no redundant
 connection because the nodes are connected over the internet.
 
  If one of the two nodes managing the resource becomes disconnected, it
 loses quorum. In this case, I want this resource to become a slave, but the
 resource should never be stopped completely!

 Ever? Including when you stop pacemaker?  If so, maybe the path of least
 resistance is to delete the contents of the stop action in that OCF agent...

  This leaves me with a problem: no-quorum-policy=stop will stop the
 resource, while no-quorum-policy=ignore will keep this resource in a
 master-state. I already tried to demote the resource manually inside the
 monitor-action of the OCF-agent, but pacemaker will promote the resource
 immediately again.
 
  I am aware that I am trying to manage a multi-site-cluster and there is
 something like the booth-daemon, which sounds like the solution to my
 problem. But unfortunately I need the location-constraints of pacemaker
 based on the score of the OCF-agent. As far as I know location-constraints
 are not possible when using booth, because the 2-node-cluster is
 essentially split into two 1-node-clusters. Is this correct?
 
  To conclude: Is it possible to demote a resource on quorum loss instead
 of stopping it? Is booth an option if I need to manage the location of the
 master based on the score returned by the OCF-agent?
 




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] no-quorum-policy = demote?

2014-04-08 Thread Christian Ciach
Well, I guess it would be okay to stop the resource when pacemaker stops,
but the resource should never stop on quorum loss. This is what I wanted to
say.


2014-04-08 2:51 GMT+02:00 Andrew Beekhof and...@beekhof.net:


 On 7 Apr 2014, at 5:54 pm, Christian Ciach derein...@gmail.com wrote:

  Hello,
 
  I am using Corosync 2.0 with Pacemaker 1.1 on Ubuntu Server 14.04 (daily
 builds until final release).
 
  My problem is as follows: I have a 2-node (plus a quorum-node) cluster
 to manage a multistate-resource. One node should be the master and the
 other one the slave. It is absolutely not allowed to have two masters at
 the same time. To prevent a split-brain situation, I am also using a third
 node as a quorum-only node (set to standby). There is no redundant
 connection because the nodes are connected over the internet.
 
  If one of the two nodes managing the resource becomes disconnected, it
 loses quorum. In this case, I want this resource to become a slave, but the
 resource should never be stopped completely!

 Ever? Including when you stop pacemaker?  If so, maybe the path of least
 resistance is to delete the contents of the stop action in that OCF agent...

  This leaves me with a problem: no-quorum-policy=stop will stop the
 resource, while no-quorum-policy=ignore will keep this resource in a
 master-state. I already tried to demote the resource manually inside the
 monitor-action of the OCF-agent, but pacemaker will promote the resource
 immediately again.
 
  I am aware that I am trying to manage a multi-site-cluster and there is 
 something like the booth-daemon, which sounds like the solution to my
 problem. But unfortunately I need the location-constraints of pacemaker
 based on the score of the OCF-agent. As far as I know location-constraints
 are not possible when using booth, because the 2-node-cluster is
 essentially split into two 1-node-clusters. Is this correct?
 
  To conclude: Is it possible to demote a resource on quorum loss instead
 of stopping it? Is booth an option if I need to manage the location of the
 master based on the score returned by the OCF-agent?
 




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] no-quorum-policy = demote?

2014-04-08 Thread Christian Ciach
Interesting idea! I can confirm that this works. So, I need to monitor the
output of crm_node -q to check if the current partition has quorum. If
the partition doesn't have quorum, I need to set the location constraint
according to your example. If the partition gets quorum again, I need to
remove the constraint.

This seems a bit hacky, but it should work okay. Thank you! It's
almost a shame that pacemaker doesn't have demote as a
no-quorum-policy, but supports demote as a loss-policy for tickets.

Yesterday I had another idea: Maybe I won't use a multistate resource agent
but a primitive instead. This way, I will start the resource outside of
pacemaker and let the start-action of the OCF-agent set the resource to
master and the stop-action sets it to slave. Then I will just use
no-quorum-policy=stop. The downside of this is that I cannot distinguish
between a stopped resource and a resource in a slave state using crm_mon.
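
A sketch of how the monitor action could toggle such a constraint from the
shell (names are placeholders; shelling out to crm inside a monitor is heavy,
so treat this as illustration rather than a recommendation):

  myapp_monitor() {
      me="$(crm_node -n)"                       # local node name
      if [ "$(crm_node -q)" != "1" ]; then
          # no quorum: ban the Master role on this node
          crm configure location "ban-master-$me" ms_myapp \
              rule '$role'=Master -inf: '#uname' eq "$me"
      else
          crm configure delete "ban-master-$me" 2>/dev/null
      fi
      # ... usual health check of the service itself follows ...
      return $OCF_SUCCESS
  }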
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] no-quorum-policy = demote?

2014-04-08 Thread Campbell, Gene
Hello fine folks in Pacemaker land.   Hopefully you could share your insight 
into this little problem for us.

We have an intermittent problem with failover.

two node cluster
first node power is cut
failover begins to second node
first node reboots
crm_mon -1 on the rebooted node is  PENDING (never goes to ONLINE)

Example output from vm5
Node lotus-4vm5: pending
Online: [ lotus-4vm6 ]

Example output from vm6
Online: [ lotus-4vm5  lotus-4vm6 ]

Environment
Centos 6.5 on KVM vms
Pacemaker 1.1.10
Corosync 1.4.1

vm5 /var/log/messages
Apr  8 09:54:07 lotus-4vm5 pacemaker: Starting Pacemaker Cluster Manager
Apr  8 09:54:07 lotus-4vm5 pacemakerd[1783]:   notice: main: Starting Pacemaker 
1.1.10-14.el6_5.2 (Build: 368c726):  generated-manpages agent-manpages 
ascii-docs publican-docs ncurses libqb-logging libqb-ipc nagios  
corosync-plugin cman
Apr  8 09:54:07 lotus-4vm5 pacemakerd[1783]:   notice: get_node_name: 
Defaulting to uname -n for the local classic openais (with plugin) node name
Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN: route_ais_message: 
Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN: route_ais_message: 
Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN: route_ais_message: 
Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN: route_ais_message: 
Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN: route_ais_message: 
Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN: route_ais_message: 
Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Apr  8 09:54:07 lotus-4vm5 attrd[1792]:   notice: crm_cluster_connect: 
Connecting to cluster infrastructure: classic openais (with plugin)
Apr  8 09:54:07 lotus-4vm5 crmd[1794]:   notice: main: CRM Git Version: 368c726
Apr  8 09:54:07 lotus-4vm5 attrd[1792]:   notice: get_node_name: Defaulting to 
uname -n for the local classic openais (with plugin) node name
Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] info: pcmk_ipc: Recorded 
connection 0x20b6280 for attrd/0
Apr  8 09:54:07 lotus-4vm5 attrd[1792]:   notice: get_node_name: Defaulting to 
uname -n for the local classic openais (with plugin) node name
Apr  8 09:54:07 lotus-4vm5 stonith-ng[1790]:   notice: crm_cluster_connect: 
Connecting to cluster infrastructure: classic openais (with plugin)
Apr  8 09:54:08 lotus-4vm5 cib[1789]:   notice: crm_cluster_connect: Connecting 
to cluster infrastructure: classic openais (with plugin)
Apr  8 09:54:08 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN: route_ais_message: 
Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Apr  8 09:54:08 lotus-4vm5 attrd[1792]:   notice: main: Starting mainloop...
Apr  8 09:54:08 lotus-4vm5 stonith-ng[1790]:   notice: get_node_name: 
Defaulting to uname -n for the local classic openais (with plugin) node name
Apr  8 09:54:08 lotus-4vm5 corosync[1364]:   [pcmk  ] info: pcmk_ipc: Recorded 
connection 0x20ba600 for stonith-ng/0
Apr  8 09:54:08 lotus-4vm5 cib[1789]:   notice: get_node_name: Defaulting to 
uname -n for the local classic openais (with plugin) node name
Apr  8 09:54:08 lotus-4vm5 corosync[1364]:   [pcmk  ] info: pcmk_ipc: Recorded 
connection 0x20be980 for cib/0
Apr  8 09:54:08 lotus-4vm5 corosync[1364]:   [pcmk  ] info: pcmk_ipc: Sending 
membership update 24 to cib
Apr  8 09:54:08 lotus-4vm5 stonith-ng[1790]:   notice: get_node_name: 
Defaulting to uname -n for the local classic openais (with plugin) node name
Apr  8 09:54:08 lotus-4vm5 cib[1789]:   notice: get_node_name: Defaulting to 
uname -n for the local classic openais (with plugin) node name
Apr  8 09:54:08 lotus-4vm5 cib[1789]:   notice: plugin_handle_membership: 
Membership 24: quorum acquired
Apr  8 09:54:08 lotus-4vm5 cib[1789]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now member 
(was (null))
Apr  8 09:54:08 lotus-4vm5 cib[1789]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node lotus-4vm6[3192917514] - state is now member 
(was (null))
Apr  8 09:54:08 lotus-4vm5 crmd[1794]:   notice: crm_cluster_connect: 
Connecting to cluster infrastructure: classic openais (with plugin)
Apr  8 09:54:08 lotus-4vm5 crmd[1794]:   notice: get_node_name: Defaulting to 
uname -n for the local classic openais (with plugin) node name
Apr  8 09:54:08 lotus-4vm5 corosync[1364]:   [pcmk  ] info: pcmk_ipc: Recorded 
connection 0x20c2d00 for crmd/0
Apr  8 09:54:08 lotus-4vm5 corosync[1364]:   [pcmk  ] info: pcmk_ipc: Sending 
membership update 24 to crmd
Apr  8 09:54:08 lotus-4vm5 crmd[1794]:   notice: get_node_name: Defaulting to 
uname -n for the local classic 

Re: [Pacemaker] no-quorum-policy = demote?

2014-04-08 Thread Digimer

Why did you re-ask the same question as a reply to the first question?

Your stonith is still failing.

On 08/04/14 05:24 PM, Campbell, Gene wrote:

Hello fine folks in Pacemaker land.   Hopefully you could share your insight 
into this little problem for us.

We have an intermittent problem with failover.

two node cluster
first node power is cut
failover begins to second node
first node reboots
crm_mon -1 on the rebooted node is  PENDING (never goes to ONLINE)

Example output from vm5
Node lotus-4vm5: pending
Online: [ lotus-4vm6 ]

Example output from vm6
Online: [ lotus-4vm5  lotus-4vm6 ]

Environment
Centos 6.5 on KVM vms
Pacemaker 1.1.10
Corosync 1.4.1

vm5 /var/log/messages
Apr  8 09:54:07 lotus-4vm5 pacemaker: Starting Pacemaker Cluster Manager
Apr  8 09:54:07 lotus-4vm5 pacemakerd[1783]:   notice: main: Starting Pacemaker 
1.1.10-14.el6_5.2 (Build: 368c726):  generated-manpages agent-manpages 
ascii-docs publican-docs ncurses libqb-logging libqb-ipc nagios  
corosync-plugin cman
Apr  8 09:54:07 lotus-4vm5 pacemakerd[1783]:   notice: get_node_name: 
Defaulting to uname -n for the local classic openais (with plugin) node name
Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN: route_ais_message: 
Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN: route_ais_message: 
Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN: route_ais_message: 
Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN: route_ais_message: 
Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN: route_ais_message: 
Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN: route_ais_message: 
Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Apr  8 09:54:07 lotus-4vm5 attrd[1792]:   notice: crm_cluster_connect: 
Connecting to cluster infrastructure: classic openais (with plugin)
Apr  8 09:54:07 lotus-4vm5 crmd[1794]:   notice: main: CRM Git Version: 368c726
Apr  8 09:54:07 lotus-4vm5 attrd[1792]:   notice: get_node_name: Defaulting to 
uname -n for the local classic openais (with plugin) node name
Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] info: pcmk_ipc: Recorded 
connection 0x20b6280 for attrd/0
Apr  8 09:54:07 lotus-4vm5 attrd[1792]:   notice: get_node_name: Defaulting to 
uname -n for the local classic openais (with plugin) node name
Apr  8 09:54:07 lotus-4vm5 stonith-ng[1790]:   notice: crm_cluster_connect: 
Connecting to cluster infrastructure: classic openais (with plugin)
Apr  8 09:54:08 lotus-4vm5 cib[1789]:   notice: crm_cluster_connect: Connecting 
to cluster infrastructure: classic openais (with plugin)
Apr  8 09:54:08 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN: route_ais_message: 
Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Apr  8 09:54:08 lotus-4vm5 attrd[1792]:   notice: main: Starting mainloop...
Apr  8 09:54:08 lotus-4vm5 stonith-ng[1790]:   notice: get_node_name: 
Defaulting to uname -n for the local classic openais (with plugin) node name
Apr  8 09:54:08 lotus-4vm5 corosync[1364]:   [pcmk  ] info: pcmk_ipc: Recorded 
connection 0x20ba600 for stonith-ng/0
Apr  8 09:54:08 lotus-4vm5 cib[1789]:   notice: get_node_name: Defaulting to 
uname -n for the local classic openais (with plugin) node name
Apr  8 09:54:08 lotus-4vm5 corosync[1364]:   [pcmk  ] info: pcmk_ipc: Recorded 
connection 0x20be980 for cib/0
Apr  8 09:54:08 lotus-4vm5 corosync[1364]:   [pcmk  ] info: pcmk_ipc: Sending 
membership update 24 to cib
Apr  8 09:54:08 lotus-4vm5 stonith-ng[1790]:   notice: get_node_name: 
Defaulting to uname -n for the local classic openais (with plugin) node name
Apr  8 09:54:08 lotus-4vm5 cib[1789]:   notice: get_node_name: Defaulting to 
uname -n for the local classic openais (with plugin) node name
Apr  8 09:54:08 lotus-4vm5 cib[1789]:   notice: plugin_handle_membership: 
Membership 24: quorum acquired
Apr  8 09:54:08 lotus-4vm5 cib[1789]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now member 
(was (null))
Apr  8 09:54:08 lotus-4vm5 cib[1789]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node lotus-4vm6[3192917514] - state is now member 
(was (null))
Apr  8 09:54:08 lotus-4vm5 crmd[1794]:   notice: crm_cluster_connect: 
Connecting to cluster infrastructure: classic openais (with plugin)
Apr  8 09:54:08 lotus-4vm5 crmd[1794]:   notice: get_node_name: Defaulting to 
uname -n for the local classic openais (with plugin) node name
Apr  8 09:54:08 lotus-4vm5 corosync[1364]:   [pcmk  ] info: pcmk_ipc: Recorded 
connection 0x20c2d00 for crmd/0
Apr  8 09:54:08 lotus-4vm5 corosync[1364]:   [pcmk  ] info: 

Re: [Pacemaker] no-quorum-policy = demote?

2014-04-08 Thread Campbell, Gene
Yeah, sorry, been a long day.  Basically, I replied to this question, so I
could reuse the ML address, but then intended to change the subject.  I
figured it made no sense as a post to this thread, so I resent the way I
had intended.

Sorry for the confusion.  Hopefully these messages will settle.  Please
forgive the slip up, I was in no way trying to double post, or be pushy.

Thanks
Gene


On 4/8/14, 6:10 PM, Digimer li...@alteeve.ca wrote:

Why did you re-ask the same question as a reply to the first question?

Your stonith is still failing.

On 08/04/14 05:24 PM, Campbell, Gene wrote:
 Hello fine folks in Pacemaker land.   Hopefully you could share your
insight into this little problem for us.

 We have an intermittent problem with failover.

 two node cluster
 first node power is cut
 failover begins to second node
 first node reboots
 crm_mon -1 on the rebooted node is  PENDING (never goes to ONLINE)

 Example output from vm5
 Node lotus-4vm5: pending
 Online: [ lotus-4vm6 ]

 Example output from vm6
 Online: [ lotus-4vm5  lotus-4vm6 ]

 Environment
 Centos 6.5 on KVM vms
 Pacemaker 1.1.10
 Corosync 1.4.1
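
(To reproduce the power-cut step listed above on KVM guests, something along
these lines from the hypervisor is usually enough; this is only a sketch, and
it assumes the libvirt domain names match the hostnames.)

   # hard power-off of the first node, no clean shutdown
   virsh destroy lotus-4vm5
   # on the surviving node, watch the failover
   crm_mon -1
   # power the node back on and check whether it rejoins or stays pending
   virsh start lotus-4vm5
   crm_mon -1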

 vm5 /var/log/messages
 Apr  8 09:54:07 lotus-4vm5 pacemaker: Starting Pacemaker Cluster Manager
 Apr  8 09:54:07 lotus-4vm5 pacemakerd[1783]:   notice: main: Starting
Pacemaker 1.1.10-14.el6_5.2 (Build: 368c726):  generated-manpages
agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc
nagios  corosync-plugin cman
 Apr  8 09:54:07 lotus-4vm5 pacemakerd[1783]:   notice: get_node_name:
Defaulting to uname -n for the local classic openais (with plugin) node
name
 Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN:
route_ais_message: Sending message to local.stonith-ng failed: ipc
delivery failed (rc=-2)
 Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN:
route_ais_message: Sending message to local.stonith-ng failed: ipc
delivery failed (rc=-2)
 Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN:
route_ais_message: Sending message to local.stonith-ng failed: ipc
delivery failed (rc=-2)
 Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN:
route_ais_message: Sending message to local.stonith-ng failed: ipc
delivery failed (rc=-2)
 Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN:
route_ais_message: Sending message to local.stonith-ng failed: ipc
delivery failed (rc=-2)
 Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN:
route_ais_message: Sending message to local.stonith-ng failed: ipc
delivery failed (rc=-2)
 Apr  8 09:54:07 lotus-4vm5 attrd[1792]:   notice: crm_cluster_connect:
Connecting to cluster infrastructure: classic openais (with plugin)
 Apr  8 09:54:07 lotus-4vm5 crmd[1794]:   notice: main: CRM Git Version:
368c726
 Apr  8 09:54:07 lotus-4vm5 attrd[1792]:   notice: get_node_name:
Defaulting to uname -n for the local classic openais (with plugin) node
name
 Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] info: pcmk_ipc:
Recorded connection 0x20b6280 for attrd/0
 Apr  8 09:54:07 lotus-4vm5 attrd[1792]:   notice: get_node_name:
Defaulting to uname -n for the local classic openais (with plugin) node
name
 Apr  8 09:54:07 lotus-4vm5 stonith-ng[1790]:   notice:
crm_cluster_connect: Connecting to cluster infrastructure: classic
openais (with plugin)
 Apr  8 09:54:08 lotus-4vm5 cib[1789]:   notice: crm_cluster_connect:
Connecting to cluster infrastructure: classic openais (with plugin)
 Apr  8 09:54:08 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN:
route_ais_message: Sending message to local.stonith-ng failed: ipc
delivery failed (rc=-2)
 Apr  8 09:54:08 lotus-4vm5 attrd[1792]:   notice: main: Starting
mainloop...
 Apr  8 09:54:08 lotus-4vm5 stonith-ng[1790]:   notice: get_node_name:
Defaulting to uname -n for the local classic openais (with plugin) node
name
 Apr  8 09:54:08 lotus-4vm5 corosync[1364]:   [pcmk  ] info: pcmk_ipc:
Recorded connection 0x20ba600 for stonith-ng/0
 Apr  8 09:54:08 lotus-4vm5 cib[1789]:   notice: get_node_name:
Defaulting to uname -n for the local classic openais (with plugin) node
name
 Apr  8 09:54:08 lotus-4vm5 corosync[1364]:   [pcmk  ] info: pcmk_ipc:
Recorded connection 0x20be980 for cib/0
 Apr  8 09:54:08 lotus-4vm5 corosync[1364]:   [pcmk  ] info: pcmk_ipc:
Sending membership update 24 to cib
 Apr  8 09:54:08 lotus-4vm5 stonith-ng[1790]:   notice: get_node_name:
Defaulting to uname -n for the local classic openais (with plugin) node
name
 Apr  8 09:54:08 lotus-4vm5 cib[1789]:   notice: get_node_name:
Defaulting to uname -n for the local classic openais (with plugin) node
name
 Apr  8 09:54:08 lotus-4vm5 cib[1789]:   notice:
plugin_handle_membership: Membership 24: quorum acquired
 Apr  8 09:54:08 lotus-4vm5 cib[1789]:   notice: crm_update_peer_state:
plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now
member (was (null))
 Apr  8 09:54:08 lotus-4vm5 cib[1789]:   notice: crm_update_peer_state:
plugin_handle_membership: Node lotus-4vm6[3192917514] - state is now
member (was (null))
 

Re: [Pacemaker] no-quorum-policy = demote?

2014-04-08 Thread Digimer

On 08/04/14 10:31 PM, Campbell, Gene wrote:

Yeah, sorry, been a long day.  Basically, I replied to this question, so I
could reuse the ML address, but then intended to change the subject.  I
figured it made no sense as a post to this thread, so I resent the way I
had intended.

Sorry for the confusion.  Hopefully these messages will settle.  Please
forgive the slip up, I was in no way trying to double post, or be pushy.

Thanks
Gene


That's fine, but generally speaking, replying to an existing message to 
start a new thread is frowned upon. Many mail programs use the header 
data to tell when a message is in reply to another, and rely on it to 
build message thread trees. So if you reply and change the subject, it 
screws up people's mail threading.


What I do is right-click to copy the email address, start a fresh email, 
and paste it in.


Anyway, your stonith is broken.
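
If it helps, a quick way to confirm that from either node (a sketch only; 
device and host names will differ per cluster):

   # list the fence devices stonith-ng has registered (no output = no usable fencing)
   stonith_admin --list-registered
   # ask which devices claim to be able to fence the peer
   stonith_admin --list lotus-4vm5
   # check the live CIB for configuration problems, fencing included
   crm_verify --live-check -V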

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] no-quorum-policy = demote?

2014-04-07 Thread Andrew Beekhof

On 7 Apr 2014, at 5:54 pm, Christian Ciach derein...@gmail.com wrote:

 Hello,
 
 I am using Corosync 2.0 with Pacemaker 1.1 on Ubuntu Server 14.04 (daily 
 builds until final release).
 
 My problem is as follows: I have a 2-node (plus a quorum-node) cluster to 
 manage a multistate-resource. One node should be the master and the other one 
 the slave. It is absolutely not allowed to have two masters at the same time. 
 To prevent a split-brain situation, I am also using a third node as a 
 quorum-only node (set to standby). There is no redundant connection because 
 the nodes are connected over the internet.
 
 If one of the two nodes managing the resource becomes disconnected, it loses 
 quorum. In this case, I want this resource to become a slave, but the 
 resource should never be stopped completely!

Ever? Including when you stop pacemaker?  If so, maybe the path of least 
resistance is to delete the contents of the stop action in that OCF agent...
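
Roughly, that would mean turning the agent's stop handler into something like 
this (a sketch following the usual shell OCF conventions; the function name is 
whatever your agent dispatches to, and note this leaves Pacemaker unable to 
ever really stop the service, so it is a workaround rather than a fix):

   my_resource_stop() {
       # deliberately a no-op: report success without touching the service,
       # so neither quorum loss nor a manual stop takes the instance down.
       # OCF_SUCCESS (0) comes from ocf-shellfuncs, which the agent sources.
       return $OCF_SUCCESS
   }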

 This leaves me with a problem: no-quorum-policy=stop will stop the 
 resource, while no-quorum-policy=ignore will keep this resource in a 
 master-state. I already tried to demote the resource manually inside the 
 monitor-action of the OCF-agent, but pacemaker will promote the resource 
 immediately again.
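
(For reference, those two policies are cluster-wide properties; with crmsh 
they would be set roughly as below, pcs having an equivalent "pcs property 
set". Neither gives the demote-on-quorum-loss behaviour being asked for.)

   # keep resources running on quorum loss -- risks two masters after a split brain
   crm configure property no-quorum-policy=ignore
   # or stop affected resources on quorum loss -- the master is stopped, not demoted
   crm configure property no-quorum-policy=stop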
 
 I am aware that I am trying to manage a multi-site-cluster and there is 
 something like the booth-daemon, which sounds like the solution to my 
 problem. But unfortunately I need the location-constraints of pacemaker based 
 on the score of the OCF-agent. As far as I know location-constraints are not 
 possible when using booth, because the 2-node-cluster is essentially split 
 into two 1-node-clusters. Is this correct?
 
 To conclude: Is it possible to demote a resource on quorum loss instead of 
 stopping it? Is booth an option if I need to manage the location of the 
 master based on the score returned by the OCF-agent?
 



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org