Re: [ClusterLabs] Users Digest, Vol 46, Issue 8
Yep, all my pcs commands run on a live cluster. The design needs resources to respond in specific ways before moving on to other shutdown requests. So it seems that these pcs commands running on different nodes at the same time are the root cause of this issue: anything that changes the live CIB concurrently seems to cause Pacemaker to just skip\throw away actions that have been requested. I have to admit this behaviour is very hard to work with, though in a simple system using a shadow CIB would avoid these issues; that would suggest a central point of control anyway.

Luckily I have been able to redesign my approach to bring all the commands that affect the live CIB (on cluster shutdown\startup) onto a single node within the cluster (and added --wait to commands where possible). This approach removes all these issues, and things behave as expected.

On Fri, Nov 9, 2018 at 12:00 PM wrote:

> Message: 2
> Date: Thu, 08 Nov 2018 10:58:52 -0600
> From: Ken Gaillot
> To: Cluster Labs - All topics related to open-source clustering welcomed
> Subject: Re: [ClusterLabs] Pacemaker auto restarts disabled groups
>
> On Thu, 2018-11-08 at 12:14 +0000, Ian Underhill wrote:
> > seems this issue has been raised before, but has gone quiet, with no
> > solution
> >
> > https://lists.clusterlabs.org/pipermail/users/2017-October/006544.html
>
> In that case, something appeared to be explicitly re-enabling the
> disabled resources. You can search your logs for "target-role" to see
> whether that's happening.
>
> > I know my resource agents successfully return the correct status to
> > the start\stop\monitor requests
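For anyone hitting the same problem, here is a minimal sketch of the serialized approach described above, run from one designated node only. The group names and timeouts are hypothetical; pcs resource disable does accept --wait:

```shell
# Run from a single node so the live CIB is modified from one point of
# control. --wait blocks until the cluster has actually stopped the
# resource (or the timeout expires), instead of returning as soon as
# the CIB change is recorded.
pcs resource disable app_group_1 --wait=120
pcs resource disable app_group_2 --wait=120
pcs resource disable db_group --wait=120
```

These are cluster-configuration commands and need a live cluster to run; the point is simply that serializing them through one node with --wait avoids concurrent CIB writers aborting each other's transitions.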
Re: [ClusterLabs] Pacemaker auto restarts disabled groups
Seems this issue has been raised before, but has gone quiet, with no solution:
https://lists.clusterlabs.org/pipermail/users/2017-October/006544.html

I know my resource agents successfully return the correct status to the start\stop\monitor requests.

On Thu, Nov 8, 2018 at 11:40 AM Ian Underhill wrote:
> Sometimes I'm seeing that a resource group that is in the process of being
> disabled is auto restarted by Pacemaker.
>
> When issuing the pcs disable command to disable different resource groups
> at the same time (on different nodes, at the group level), the result is
> that sometimes the resource is stopped and restarted straight away. I'm
> using a balanced placement strategy.
>
> Looking into the daemon log, Pacemaker is aborting transitions due to a
> config change of the target-role meta attribute:
>
> Transition 2838 (Complete=25, Pending=0, Fired=0, Skipped=3,
> Incomplete=10, Source=/var/lib/pacemaker/pengine/pe-input-704.bz2): Stopped
>
> Could somebody explain Complete/Pending/Fired/Skipped/Incomplete, and is
> there a way of displaying Skipped actions?
>
> I've used crm_simulate --xml-file --run to see the actions, and I see
> this extra start request.
>
> regards
>
> /Ian.

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
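On the transition-inspection question: the pe-input file named in the log line can be replayed offline. A sketch, using the path from the log above (flags as in Pacemaker 1.1's crm_simulate; this needs the Pacemaker tools installed):

```shell
# Replay the saved scheduler input to see which actions were planned,
# including placement scores, and save the transition graph to a file
# for closer inspection of individual actions.
crm_simulate --xml-file /var/lib/pacemaker/pengine/pe-input-704.bz2 \
             --show-scores --save-graph transition.graph --run
```

Skipped actions are those the controller discarded when the transition was aborted mid-flight; replaying the input shows the full set of actions the scheduler originally planned.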
[ClusterLabs] Pacemaker auto restarts disabled groups
Sometimes I'm seeing that a resource group that is in the process of being disabled is auto restarted by Pacemaker.

When issuing the pcs disable command to disable different resource groups at the same time (on different nodes, at the group level), the result is that sometimes the resource is stopped and restarted straight away. I'm using a balanced placement strategy.

Looking into the daemon log, Pacemaker is aborting transitions due to a config change of the target-role meta attribute:

Transition 2838 (Complete=25, Pending=0, Fired=0, Skipped=3, Incomplete=10, Source=/var/lib/pacemaker/pengine/pe-input-704.bz2): Stopped

Could somebody explain Complete/Pending/Fired/Skipped/Incomplete, and is there a way of displaying Skipped actions?

I've used crm_simulate --xml-file --run to see the actions, and I see this extra start request.

regards

/Ian.
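Since the aborts are attributed to target-role changes, it can help to confirm what is writing that attribute and when. A hedged sketch; the log location varies by distribution (/var/log/pacemaker.log, /var/log/cluster/corosync.log, or the journal on systemd systems):

```shell
# Show every logged CIB update that touched target-role, to correlate
# the aborted transitions with whichever node or tool set the attribute.
grep -n 'target-role' /var/log/pacemaker.log

# journald variant:
journalctl -u pacemaker | grep 'target-role'
```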
[ClusterLabs] Colocation dependencies (dislikes)
I'm trying to design a resource layout that has different "dislikes" (negative colocation scores) between the various resources within the cluster.

When I start to have multiple colocation dependencies from a single resource, strange behaviour starts to happen in scenarios where resources have to bunch together. Consider this example on a 2-node system with 3 resources, C->B->A, and these colocation constraints:

B with A: -10
C with B: -INFINITY
C with A: -10

On paper I would expect A and C to run together and B to run on its own. What you actually get is A and B running and C stopped?

crm_simulate -Ls says the score for C running on the same node as A is -10, so why doesn't it start it?

Ideas?
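For reference, a sketch of how the constraints above could be created with pcs. Resource IDs A, B, C are as in the example; the positional-score form follows pcs 0.9 syntax, so check your version's man page:

```shell
# "Dislike" colocations: negative scores mean "prefer not to run together";
# -INFINITY means "must never run together".
pcs constraint colocation add B with A -10
pcs constraint colocation add C with B -INFINITY
pcs constraint colocation add C with A -10

# Then inspect the resulting placement scores on the live cluster:
crm_simulate -Ls
```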
[ClusterLabs] Q: Resource Groups vs Resources for stickiness and colocation?
I'm guessing this is just a "feature", but something that will probably stop me using groups.

Scenario 1 (working):
1) Two nodes (1,2) within a cluster (default-stickiness = INFINITY)
2) Two resources (A,B) in the cluster running on different nodes
3) Colocation constraint between the resources, A->B, score=-1
a) pcs standby node2: resource B moves to node 1
b) pcs unstandby node2: resource B stays on node 1 - this is good and expected

Scenario 2 (working):
1) Exactly the same as above, but each resource exists within its own group (G1,G2)
2) The colocation constraint is between the groups

Scenario 3 (not working):
1) Same as above, however each group has two resources in it:

Resource Group: A_grp
    A    (ocf::test:fallover): Started mac-devl03
    A_2  (ocf::test:fallover): Started mac-devl03
Resource Group: B_grp
    B    (ocf::test:fallover): Started mac-devl11
    B_2  (ocf::test:fallover): Started mac-devl11

a) pcs standby node2: the group moves to node 1
b) pcs unstandby node2: the group moves back to node 2, but I have INFINITY stickiness (maybe I need INFINITY+1 ;) )

crm_simulate -sL doesn't really explain why there is a difference. Any ideas?

(environment: pacemaker-cluster-libs-1.1.16-12.el7.x86_64)

/Ian
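A sketch of the setup assumed in the scenarios above, for anyone wanting to reproduce it. Group and score values are taken from the scenarios; pcs 0.9 syntax:

```shell
# Cluster-wide default stickiness, as in scenario 1. Note stickiness is
# accumulated per resource, so a group's effective stickiness is the sum
# over its members.
pcs resource defaults resource-stickiness=INFINITY

# The mild "dislike" between the two groups, as in scenarios 2 and 3:
pcs constraint colocation add A_grp with B_grp -1
```

One thing worth checking in scenario 3 is how the per-member stickiness and the group colocation score combine once each group has two resources; crm_simulate -sL output for both the standby and unstandby states, compared side by side, is the usual way to see which score changed.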
[ClusterLabs] Understanding\Manually adjusting node-health-strategy
When a resource fails on a node, I would like to mark the node unhealthy so other resources don't start up on it. I believe I can achieve this (ignoring the concept of fencing for the moment). I have set my cluster's node-health-strategy to only_green.

However, manually adjusting a node's health fails. I believe I can set a #health attribute on a node (see ref docs), but trying to set any #health attribute gives an error:

# sudo crm_attribute --node myNode --name=#health --update=red
Error setting #health=red (section=nodes, set=nodes-3): Update does not conform to the configured schema
Error performing operation: Update does not conform to the configured schema

I'm slightly surprised I don't get "something for free" regarding Pacemaker auto-adjusting the health of a node when resources fail on it. Am I missing a setting, or is this done by hand?

Thanks, Ian.

*ref docs*
http://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-node-health.html
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-cluster-options.html
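The schema error above arises because the command writes to the permanent nodes section, whose schema rejects attribute names beginning with '#'. Setting the attribute as a transient (status-section) attribute should work; a sketch, reusing the node name from above (the attribute name #health-manual is hypothetical, but node health sums all attributes whose names start with #health):

```shell
# --lifetime reboot targets the transient attributes in the status
# section, which accept #health names. The value is cleared when the
# node restarts.
crm_attribute --node myNode --name '#health-manual' --update red --lifetime reboot
```

On the "something for free" question: the ocf:pacemaker:HealthCPU and HealthSMART agents adjust node health automatically based on system metrics, but to my knowledge ordinary resource failures only increment fail counts and do not change node health by themselves.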
[ClusterLabs] OCF Return codes OCF_NOT_RUNNING
I'm trying to understand the behaviour of Pacemaker when a resource monitor returns OCF_NOT_RUNNING instead of OCF_ERR_GENERIC, and whether Pacemaker really cares about the difference.

The documentation states that a return code of OCF_NOT_RUNNING from a monitor will not result in a stop being called on that resource, as the resource is believed to be cleanly stopped and the node still clean:
https://www.clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-ocf-return-codes.html

This makes sense; however, in practice that is not what happens (unless I'm doing something wrong :) ). When my resource returns OCF_NOT_RUNNING for a monitor call (after a start has been performed), a stop is called. If I have a failure threshold set >1, I get a start->monitor->stop cycle until the threshold is consumed.

/Ian.
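For concreteness, here is a minimal sketch of a monitor action that returns the two codes in question. The daemon and pidfile path are entirely hypothetical, and a real agent would source the OCF shell functions rather than defining the codes inline:

```shell
#!/bin/sh
# OCF exit codes (normally provided by .ocf-shellfuncs):
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

pidfile=/var/run/mydaemon-example.pid

monitor() {
    # No pidfile: the service is cleanly stopped.
    [ -f "$pidfile" ] || return $OCF_NOT_RUNNING
    # Pidfile present and process alive: running normally.
    if kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        return $OCF_SUCCESS
    fi
    # Pidfile present but process gone: also reported as not running
    # here; returning OCF_ERR_GENERIC instead would flag the monitor
    # as a soft failure requiring recovery.
    return $OCF_NOT_RUNNING
}
```

Per the return-codes table, OCF_NOT_RUNNING from a monitor should mean "cleanly stopped, no recovery needed", while OCF_ERR_GENERIC means "failed, restart" - which is why the observed stop call after OCF_NOT_RUNNING is surprising.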
[ClusterLabs] Pacemaker alert framework
Requirement: when a resource fails, perform an action - run a script on all nodes within the cluster, before the resource is relocated - i.e. information gathering on why the resource failed.

What I have looked into:
1) Use the monitor call within the resource agent to SSH to all nodes; again, SSH config needed.
2) Alert framework: this only seems to be triggered on nodes involved in the relocation of the resource, i.e. if the resource moves from node1 to node2, node3 doesn't know. So back to the SSH solution :(
3) Sending a custom alert to all nodes in the cluster? Is this possible? Not found a way.

Only solution I have:
1) Use SSH within an alert agent (on stop) to SSH onto all nodes to perform the action. The nodes could be configured using the alert recipients, but I would still need to configure SSH users and certs etc.
1.a) This doesn't seem to be usable if the resource is relocated back to the same node, as the start\stop alerts are run at the "same time", i.e. I need to delay the start till the SSH has completed.

What I would like:
1) Delay the start\relocation of the resource until the information gathering from all nodes is complete, using only Pacemaker behaviour\config.

Any ideas? Thanks

/Ian.
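For reference on option 2, a minimal sketch of an alert agent as the framework expects it. The logfile path and install path are hypothetical; Pacemaker passes event details to the agent via CRM_alert_* environment variables, and - as noted above - only runs it on nodes involved in the event:

```shell
#!/bin/sh
# Append resource events to a local log. Installed with something like:
#   pcs alert create path=/usr/local/bin/rsc_alert.sh
logfile="/tmp/rsc_alerts.log"

# CRM_alert_kind is "node", "fencing", or "resource".
if [ "$CRM_alert_kind" = "resource" ]; then
    echo "$(date '+%F %T') node=$CRM_alert_node rsc=$CRM_alert_rsc op=$CRM_alert_task rc=$CRM_alert_rc" >> "$logfile"
fi
```

Note that alert agents are fire-and-forget: the cluster does not wait for them before continuing the transition, so by design they cannot delay the restart - which matches the limitation described in 1.a.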