Re: [Pacemaker] no-quorum-policy = demote?
Done: http://bugs.clusterlabs.org/show_bug.cgi?id=5216

Best regards,
Christian

2014-05-27 22:51 GMT+02:00 Andrew Beekhof and...@beekhof.net:
> Could you file a bug for that in bugs.clusterlabs.org so we don't lose
> track of it?
Re: [Pacemaker] no-quorum-policy = demote?
2014-05-27 7:34 GMT+02:00 Andrew Beekhof and...@beekhof.net:
> I guess the same logic applies to the single cluster use-case too and we
> should allow no-quorum-policy=demote.

Thank you for mentioning this. This was my thought as well.

At the moment we simulate this behaviour by using a primitive resource where
"started" means master and "stopped" means slave. This way we can use
no-quorum-policy=stop to actually switch the resource to slave on quorum
loss. This seems hacky, so I would appreciate it if this could be done in a
proper way some time in the future.
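A minimal sketch of such a primitive agent might look like the following; the
myapp_* commands are placeholders for whatever actually switches the real
service between master and slave, and the mandatory meta-data/validate-all
actions are omitted for brevity:

    #!/bin/sh
    # Sketch of a primitive OCF agent where "started" means the service runs
    # as master and "stopped" means it keeps running, but only as slave.
    : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
    . ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs

    case "$1" in
        start)   myapp_promote_to_master || exit $OCF_ERR_GENERIC
                 exit $OCF_SUCCESS ;;
        stop)    myapp_demote_to_slave || exit $OCF_ERR_GENERIC
                 exit $OCF_SUCCESS ;;
        monitor) if myapp_is_master; then exit $OCF_SUCCESS
                 else exit $OCF_NOT_RUNNING; fi ;;
        *)       exit $OCF_ERR_UNIMPLEMENTED ;;
    esac

With no-quorum-policy=stop, quorum loss then triggers the stop action, i.e. a
demotion of the real service, at the price (noted elsewhere in the thread)
that crm_mon can no longer distinguish "running as slave" from "really
stopped".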
Re: [Pacemaker] no-quorum-policy = demote?
On 27 May 2014, at 7:20 pm, Christian Ciach derein...@gmail.com wrote:
> At the moment we simulate this behaviour by using a primitive resource
> where "started" means master and "stopped" means slave. This way we can use
> no-quorum-policy=stop to actually switch the resource to slave on quorum
> loss. This seems hacky, so I would appreciate it if this could be done in a
> proper way some time in the future.

Could you file a bug for that in bugs.clusterlabs.org so we don't lose track
of it?
Re: [Pacemaker] no-quorum-policy = demote?
I am sorry to get back to this topic, but I'm genuinely curious: why is
demote an option for the ticket loss-policy of multi-site clusters, but not
for the normal no-quorum-policy of local clusters? This seems like a missing
feature to me.

Best regards
Christian

2014-04-07 9:54 GMT+02:00 Christian Ciach derein...@gmail.com:
> Hello,
>
> I am using Corosync 2.0 with Pacemaker 1.1 on Ubuntu Server 14.04 (daily
> builds until final release). My problem is as follows:
>
> I have a 2-node (plus a quorum-node) cluster to manage a multistate
> resource. One node should be the master and the other one the slave. It is
> absolutely not allowed to have two masters at the same time. To prevent a
> split-brain situation, I am also using a third node as a quorum-only node
> (set to standby). There is no redundant connection, because the nodes are
> connected over the internet.
>
> If one of the two nodes managing the resource becomes disconnected, it
> loses quorum. In this case, I want this resource to become a slave, but the
> resource should never be stopped completely! This leaves me with a problem:
> no-quorum-policy=stop will stop the resource, while no-quorum-policy=ignore
> will keep this resource in a master state. I already tried to demote the
> resource manually inside the monitor-action of the OCF agent, but pacemaker
> will promote the resource again immediately.
>
> I am aware that I am trying to manage a multi-site cluster and that there
> is something like the booth daemon, which sounds like the solution to my
> problem. But unfortunately I need the location constraints of pacemaker
> based on the score of the OCF agent. As far as I know, location constraints
> are not possible when using booth, because the 2-node cluster is
> essentially split into two 1-node clusters. Is this correct?
>
> To conclude: Is it possible to demote a resource on quorum loss instead of
> stopping it? Is booth an option if I need to manage the location of the
> master based on the score returned by the OCF agent?
Re: [Pacemaker] no-quorum-policy = demote?
On 26 May 2014, at 10:47 pm, Christian Ciach derein...@gmail.com wrote:
> I am sorry to get back to this topic, but I'm genuinely curious: why is
> demote an option for the ticket loss-policy of multi-site clusters, but not
> for the normal no-quorum-policy of local clusters? This seems like a
> missing feature to me.

Or one feature too many.

Perhaps Yan can explain why he wanted demote as an option for the
loss-policy.
Re: [Pacemaker] no-quorum-policy = demote?
On 05/27/14 08:07, Andrew Beekhof wrote:
> On 26 May 2014, at 10:47 pm, Christian Ciach derein...@gmail.com wrote:
>> Why is demote an option for the ticket loss-policy of multi-site clusters,
>> but not for the normal no-quorum-policy of local clusters? This seems like
>> a missing feature to me.
>
> Or one feature too many. Perhaps Yan can explain why he wanted demote as an
> option for the loss-policy.

loss-policy=demote is a kind of natural default if the Master mode of a
resource requires a ticket, like:

    <rsc_ticket rsc="ms1" rsc-role="Master" ticket="ticketA"/>

The idea is for running stateful resource instances across clusters. And
loss-policy=demote provides the possibility to still run the resource in
slave mode, for any reason, when losing the ticket, rather than stopping it
or fencing the node hosting it.

Regards,
Yan

--
Gao,Yan y...@suse.com
Software Engineer China Server Team, SUSE.
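For comparison, the same constraint with an explicit loss-policy could be
written in crmsh roughly as follows (resource and ticket names are taken from
the snippet above; treat the exact syntax as a sketch, since it varies a
little between crmsh versions):

    # Demote ms1 on this cluster (rather than stop it or fence its node)
    # if ticketA is revoked or lost
    crm configure rsc_ticket ms1-needs-ticketA ticketA: ms1:Master loss-policy=demote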
Re: [Pacemaker] no-quorum-policy = demote?
On 27 May 2014, at 3:12 pm, Gao,Yan y...@suse.com wrote:
> loss-policy=demote is a kind of natural default if the Master mode of a
> resource requires a ticket. The idea is for running stateful resource
> instances across clusters, and loss-policy=demote provides the possibility
> to still run the resource in slave mode when losing the ticket, rather than
> stopping it or fencing the node hosting it.

I guess the same logic applies to the single cluster use-case too and we
should allow no-quorum-policy=demote.

One question though... do we still stop non-master/slave resources for
loss-policy=demote?
Re: [Pacemaker] no-quorum-policy = demote?
On 10 Apr 2014 at 15:44, Christian Ciach derein...@gmail.com wrote:
> I don't really like the idea of periodically polling "crm_node -q" for the
> current quorum state. No matter how frequently the monitor function gets
> called, there will always be a small time frame in which both nodes are in
> the master state at the same time. Is there a way to get a notification to
> the OCF agent whenever the quorum state changes?

You should probably look for something like this in the ocf-shellfuncs file.
But also take a look at the page below; it has a lot of multi-state dedicated
variables that are most definitely useful in your case.

http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_multi_state_proper_interpretation_of_notification_environment_variables.html
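Those notification variables are only populated for clone/master-slave
resources configured with notify=true. A notify handler that reacts to
demotions might look roughly like this sketch (only the post-demote branch is
shown; it assumes ocf-shellfuncs has been sourced):

    notify() {
        # "pre" or "post", and the operation being notified about
        local type="$OCF_RESKEY_CRM_meta_notify_type"
        local op="$OCF_RESKEY_CRM_meta_notify_operation"
        if [ "$type" = "post" ] && [ "$op" = "demote" ]; then
            # space-separated list of nodes that were just demoted
            ocf_log info "post-demote on: $OCF_RESKEY_CRM_meta_notify_demote_uname"
        fi
        return $OCF_SUCCESS
    }

As Christian explains in his reply, though, these notifications only fire
when the cluster actually schedules a state change, so they do not help while
no-quorum-policy=ignore keeps the isolated master where it is.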
Re: [Pacemaker] no-quorum-policy = demote?
Thank you for pointing me to the environment variables. Unfortunately, none
of these work in this case. For example: assume one node is currently the
master. Then, because of a network failure, this node loses quorum. Because
no-quorum-policy is set to ignore, this node will keep being a master. In
this case there is no change of state, thus the notify-function of the OCF
agent does not get called by pacemaker. I've already tried this, so I am
quite sure about that.

2014-04-11 8:16 GMT+02:00 Alexandre alxg...@gmail.com:
> You should probably look for something like this in the ocf-shellfuncs
> file. But also take a look at the page below; it has a lot of multi-state
> dedicated variables that are most definitely useful in your case.
Re: [Pacemaker] no-quorum-policy = demote?
On Fri, Apr 11, 2014 at 10:02:59AM +0200, Christian Ciach wrote:
> Assume one node is currently the master. Then, because of a network
> failure, this node loses quorum. Because no-quorum-policy is set to ignore,
> this node will keep being a master. In this case there is no change of
> state, thus the notify-function of the OCF agent does not get called by
> pacemaker.

Very very hackish idea:

In your monitor action, with the Master role monitored every T seconds, fail
(+demote) if there is no quorum. (Or use a dummy resource agent similar to
the ping RA, update some node attribute from there, and have a constraint for
the Master role on that node attribute.)

In your promote action, refuse to promote if there is no quorum, sleep 3*T
(+ time to demote), and only then actually promote.

That way, you are reasonably sure that, before you actually promote, the
former master had a chance to notice quorum loss and demote.

But you really should look into booth, or proper fencing.

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
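A rough sketch of that idea inside a master/slave agent might look as
follows; T, the myapp_* helpers and the chosen return codes are illustrative
rather than a tested recipe, and the snippet assumes ocf-shellfuncs has been
sourced:

    T=10   # Master-role monitor interval in seconds (must match the op definition)

    monitor() {
        myapp_is_running || return $OCF_NOT_RUNNING
        if myapp_is_master; then
            # master without quorum: report a failure so the cluster demotes us
            [ "$(crm_node -q)" = "1" ] || return $OCF_ERR_GENERIC
            return $OCF_RUNNING_MASTER
        fi
        return $OCF_SUCCESS
    }

    promote() {
        # refuse to promote without quorum...
        [ "$(crm_node -q)" = "1" ] || return $OCF_ERR_GENERIC
        # ...and give a possibly isolated former master 3*T (plus demote time)
        # to notice its quorum loss and step down before this node takes over
        sleep $((3 * T))
        myapp_promote_to_master || return $OCF_ERR_GENERIC
        return $OCF_SUCCESS
    }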
Re: [Pacemaker] no-quorum-policy = demote?
Thank you for another idea, but I think I will pass ;)

I would like to use booth, but as I said, I also need location constraints
based on an attribute score (like ping). I don't think this is currently
possible when using a multi-site cluster.

2014-04-11 15:05 GMT+02:00 Lars Ellenberg lars.ellenb...@linbit.com:
> But you really should look into booth, or proper fencing.
Re: [Pacemaker] no-quorum-policy = demote?
I don't really like the idea of periodically polling "crm_node -q" for the
current quorum state. No matter how frequently the monitor function gets
called, there will always be a small time frame in which both nodes are in
the master state at the same time.

Is there a way to get a notification to the OCF agent whenever the quorum
state changes?

2014-04-08 10:14 GMT+02:00 Christian Ciach derein...@gmail.com:
> Interesting idea! I can confirm that this works. [...] This seems almost a
> bit hacky, but it should work okay. Thank you!
Re: [Pacemaker] no-quorum-policy = demote?
Have you tried to patch the monitor action of your RA so that it sets a
temporary location constraint on the node, to keep that node from becoming
master? Something like:

    location loc_splited_cluster -inf: MsRsc:Master $node

Not sure about the above crm syntax, but that's the idea.

On 8 Apr 2014 at 02:52, Andrew Beekhof and...@beekhof.net wrote:
> On 7 Apr 2014, at 5:54 pm, Christian Ciach derein...@gmail.com wrote:
>> If one of the two nodes managing the resource becomes disconnected, it
>> loses quorum. In this case, I want this resource to become a slave, but
>> the resource should never be stopped completely!
>
> Ever? Including when you stop pacemaker? If so, maybe the path of least
> resistance is to delete the contents of the stop action in that OCF
> agent...
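In crmsh, that suggestion would probably be expressed along the following
lines; ms_app stands for the master/slave resource and node1 for the
quorum-less node, and the rule syntax should be double-checked against the
crmsh version in use:

    # Forbid the Master role of ms_app on node1 (added while node1 has no quorum)
    crm configure location no-master-on-node1 ms_app \
        rule '$role'=Master -inf: '#uname' eq node1

    # ...and remove the constraint again once node1 regains quorum
    crm configure delete no-master-on-node1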
Re: [Pacemaker] no-quorum-policy = demote?
Well, I guess it would be okay to stop the resource when pacemaker stops, but
the resource should never stop on quorum loss. This is what I wanted to say.

2014-04-08 2:51 GMT+02:00 Andrew Beekhof and...@beekhof.net:
> Ever? Including when you stop pacemaker? If so, maybe the path of least
> resistance is to delete the contents of the stop action in that OCF
> agent...
Re: [Pacemaker] no-quorum-policy = demote?
Interesting idea! I can confirm that this works. So, I need to monitor the
output of "crm_node -q" to check whether the current partition has quorum. If
the partition doesn't have quorum, I need to set the location constraint
according to your example. If the partition gets quorum again, I need to
remove the constraint. This seems almost a bit hacky, but it should work
okay. Thank you!

It's almost a shame that pacemaker doesn't have demote as a no-quorum-policy,
but supports demote as a loss-policy for tickets.

Yesterday I had another idea: maybe I won't use a multistate resource agent
but a primitive instead. This way, I will start the resource outside of
pacemaker and let the start-action of the OCF agent set the resource to
master and the stop-action set it to slave. Then I will just use
no-quorum-policy=stop. The downside of this is that I cannot distinguish
between a stopped resource and a resource in a slave state using crm_mon.
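Put together, that polling-and-toggling logic could be sketched like this
(ms_app and the constraint name are placeholders, and the exit-status
behaviour of "crm configure show" for missing objects should be verified for
the crmsh version in use); it does nothing about the race window Christian
describes in his follow-up mails:

    # Toggle a -INFINITY Master constraint for this node, based on quorum.
    # Meant to be called from the agent's monitor action (or a timer job).
    NODE=$(crm_node -n)
    CONSTRAINT="no-master-on-$NODE"

    if [ "$(crm_node -q)" = "1" ]; then
        # quorum is back: allow promotion again
        crm configure show "$CONSTRAINT" >/dev/null 2>&1 && \
            crm configure delete "$CONSTRAINT"
    else
        # no quorum: make sure this node cannot become (or stay) master
        crm configure show "$CONSTRAINT" >/dev/null 2>&1 || \
            crm configure location "$CONSTRAINT" ms_app \
                rule '$role'=Master -inf: '#uname' eq "$NODE"
    fi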
Re: [Pacemaker] no-quorum-policy = demote?
Hello fine folks in Pacemaker land. Hopefully you could share your insight into this little problem for us. We have a intermittent problem with failover. two node cluster first node power is cut failover begins to second node first node reboots crm_mon -1 on the rebooted node is PENDING (never goes to ONLINE) Example output from vm5 Node lotus-4vm5: pending Online: [ lotus-4vm6 ] Example output from vm6 Online: [ lotus-4vm5 lotus-4vm6 ] Environment Centos 6.5 on KVM vms Pacemaker 1.1.10 Corosync 1.4.1 vm5 /var/log/messages Apr 8 09:54:07 lotus-4vm5 pacemaker: Starting Pacemaker Cluster Manager Apr 8 09:54:07 lotus-4vm5 pacemakerd[1783]: notice: main: Starting Pacemaker 1.1.10-14.el6_5.2 (Build: 368c726): generated-manpages agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc nagios corosync-plugin cman Apr 8 09:54:07 lotus-4vm5 pacemakerd[1783]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name Apr 8 09:54:07 lotus-4vm5 corosync[1364]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2) Apr 8 09:54:07 lotus-4vm5 corosync[1364]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2) Apr 8 09:54:07 lotus-4vm5 corosync[1364]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2) Apr 8 09:54:07 lotus-4vm5 corosync[1364]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2) Apr 8 09:54:07 lotus-4vm5 corosync[1364]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2) Apr 8 09:54:07 lotus-4vm5 corosync[1364]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2) Apr 8 09:54:07 lotus-4vm5 attrd[1792]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin) Apr 8 09:54:07 lotus-4vm5 crmd[1794]: notice: main: CRM Git Version: 368c726 Apr 8 09:54:07 lotus-4vm5 attrd[1792]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name Apr 8 09:54:07 lotus-4vm5 corosync[1364]: [pcmk ] info: pcmk_ipc: Recorded connection 0x20b6280 for attrd/0 Apr 8 09:54:07 lotus-4vm5 attrd[1792]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name Apr 8 09:54:07 lotus-4vm5 stonith-ng[1790]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin) Apr 8 09:54:08 lotus-4vm5 cib[1789]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin) Apr 8 09:54:08 lotus-4vm5 corosync[1364]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2) Apr 8 09:54:08 lotus-4vm5 attrd[1792]: notice: main: Starting mainloop... 
Apr 8 09:54:08 lotus-4vm5 stonith-ng[1790]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name Apr 8 09:54:08 lotus-4vm5 corosync[1364]: [pcmk ] info: pcmk_ipc: Recorded connection 0x20ba600 for stonith-ng/0 Apr 8 09:54:08 lotus-4vm5 cib[1789]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name Apr 8 09:54:08 lotus-4vm5 corosync[1364]: [pcmk ] info: pcmk_ipc: Recorded connection 0x20be980 for cib/0 Apr 8 09:54:08 lotus-4vm5 corosync[1364]: [pcmk ] info: pcmk_ipc: Sending membership update 24 to cib Apr 8 09:54:08 lotus-4vm5 stonith-ng[1790]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name Apr 8 09:54:08 lotus-4vm5 cib[1789]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name Apr 8 09:54:08 lotus-4vm5 cib[1789]: notice: plugin_handle_membership: Membership 24: quorum acquired Apr 8 09:54:08 lotus-4vm5 cib[1789]: notice: crm_update_peer_state: plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now member (was (null)) Apr 8 09:54:08 lotus-4vm5 cib[1789]: notice: crm_update_peer_state: plugin_handle_membership: Node lotus-4vm6[3192917514] - state is now member (was (null)) Apr 8 09:54:08 lotus-4vm5 crmd[1794]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin) Apr 8 09:54:08 lotus-4vm5 crmd[1794]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name Apr 8 09:54:08 lotus-4vm5 corosync[1364]: [pcmk ] info: pcmk_ipc: Recorded connection 0x20c2d00 for crmd/0 Apr 8 09:54:08 lotus-4vm5 corosync[1364]: [pcmk ] info: pcmk_ipc: Sending membership update 24 to crmd Apr 8 09:54:08 lotus-4vm5 crmd[1794]: notice: get_node_name: Defaulting to uname -n for the local classic
Re: [Pacemaker] no-quorum-policy = demote?
Why did you re-ask the same question as a reply to the first question?

Your stonith is still failing.

On 08/04/14 05:24 PM, Campbell, Gene wrote:
> Hello fine folks in Pacemaker land. Hopefully you could share your insight
> into this little problem for us. We have an intermittent problem with
> failover: two node cluster, first node power is cut, failover begins to
> second node, first node reboots, crm_mon -1 on the rebooted node is PENDING
> (never goes to ONLINE). [...]
Re: [Pacemaker] no-quorum-policy = demote?
Yeah, sorry, been a long day. Basically, I replied to this question so I
could reuse the ML address, but then intended to change the subject. I
figured it made no sense as a post to this thread, so I resent it the way I
had intended. Sorry for the confusion. Hopefully these messages will settle.
Please forgive the slip-up; I was in no way trying to double post or be
pushy.

Thanks
Gene

On 4/8/14, 6:10 PM, Digimer li...@alteeve.ca wrote:
> Why did you re-ask the same question as a reply to the first question?
>
> Your stonith is still failing.
Re: [Pacemaker] no-quorum-policy = demote?
On 08/04/14 10:31 PM, Campbell, Gene wrote:
> Yeah, sorry, been a long day. Basically, I replied to this question so I
> could reuse the ML address, but then intended to change the subject. [...]
> Please forgive the slip-up; I was in no way trying to double post or be
> pushy.

That's fine, but generally speaking, replying to an existing message to start
a new thread is frowned upon. Many mail programs use the header data to tell
when a message is in reply to another, and use that data to build message
thread trees. So if you reply and change the subject, it breaks people's mail
threading.

What I do is right-click -> copy email address, start a fresh email -> paste.

Anyway, your stonith is broken.

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access
to education?
Re: [Pacemaker] no-quorum-policy = demote?
On 7 Apr 2014, at 5:54 pm, Christian Ciach derein...@gmail.com wrote:
> I have a 2-node (plus a quorum-node) cluster to manage a multistate
> resource. One node should be the master and the other one the slave. It is
> absolutely not allowed to have two masters at the same time. [...]
>
> If one of the two nodes managing the resource becomes disconnected, it
> loses quorum. In this case, I want this resource to become a slave, but the
> resource should never be stopped completely!

Ever? Including when you stop pacemaker?

If so, maybe the path of least resistance is to delete the contents of the
stop action in that OCF agent...

> This leaves me with a problem: no-quorum-policy=stop will stop the
> resource, while no-quorum-policy=ignore will keep this resource in a master
> state. I already tried to demote the resource manually inside the
> monitor-action of the OCF agent, but pacemaker will promote the resource
> again immediately.
>
> [...]
>
> To conclude: Is it possible to demote a resource on quorum loss instead of
> stopping it? Is booth an option if I need to manage the location of the
> master based on the score returned by the OCF agent?
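Taken literally, that "path of least resistance" is a stop action reduced to
a bare success report, along these lines (naming is illustrative; note the
cluster will then consider the resource stopped while the service in fact
keeps running, even on a normal pacemaker shutdown):

    # Sketch: a stop action that deliberately does nothing but report success
    # (assumes ocf-shellfuncs has been sourced)
    myapp_stop() {
        ocf_log warn "stop requested; leaving the service running by design"
        return $OCF_SUCCESS
    }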