Re: [ClusterLabs] Users Digest, Vol 46, Issue 8

2018-11-09 Thread Ian Underhill
Yes, all my pcs commands run against a live cluster. The design needs
resources to respond in specific ways before moving on to other shutdown
requests.

So it seems that running these pcs commands on different nodes at the same
time is the root cause of this issue. Anything that changes the live CIB at
the same time seems to cause pacemaker to skip or throw away actions that
have been requested.

I have to admit this behaviour is very hard to work with. In a simple
system, using a shadow CIB would avoid these issues, though that would
imply a central point of control anyway.

Luckily I have been able to redesign my approach so that all the commands
that affect the live CIB (on cluster shutdown\startup) are run from a
single node within the cluster. I have also added --wait to commands where
possible.

This approach removes all these issues, and things behave as expected.
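
For illustration, the single-node approach could look roughly like the sketch
below. It is not the script actually used here: the function name, the
WAIT_SECS timeout and the group names passed in are placeholders. It only
relies on "pcs resource disable" accepting --wait.

```shell
# Disable each resource group in turn from one node, waiting for each
# request to finish before sending the next one.
# WAIT_SECS is a placeholder timeout, not a value from this thread.
WAIT_SECS="${WAIT_SECS:-60}"

disable_groups_serially() {
    for group in "$@"; do
        # --wait blocks until the resource has stopped or the timeout expires
        if ! pcs resource disable "$group" --wait="$WAIT_SECS"; then
            echo "disable of $group did not complete within ${WAIT_SECS}s" >&2
            return 1
        fi
    done
}
```

It would be called on one node only, e.g. "disable_groups_serially grpA grpB"
(hypothetical group names), so that no two CIB changes are in flight at once.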


On Fri, Nov 9, 2018 at 12:00 PM  wrote:

> Today's Topics:
>
>    1. Re: Pacemaker auto restarts disabled groups (Ian Underhill)
>    2. Re: Pacemaker auto restarts disabled groups (Ken Gaillot)
>
>
> ------
>
> Message: 1
> Date: Thu, 8 Nov 2018 12:14:33 +
> From: Ian Underhill 
> To: users@clusterlabs.org
> Subject: Re: [ClusterLabs] Pacemaker auto restarts disabled groups
> Message-ID:
> <
> cagu+cygddmthbv23+55ec40tjogeyzzukbl9o_ydjkqp+jo...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> It seems this issue has been raised before, but has gone quiet, with no
> solution:
>
> https://lists.clusterlabs.org/pipermail/users/2017-October/006544.html
>
> I know my resource agents successfully return the correct status to the
> start\stop\monitor requests
>
> On Thu, Nov 8, 2018 at 11:40 AM Ian Underhill 
> wrote:
>
> > Sometimes I'm seeing that a resource group that is in the process of
> > being disabled is automatically restarted by pacemaker.
> >
> > When issuing pcs disable commands to disable different resource groups
> > at the same time (on different nodes, at the group level), the result is
> > that sometimes the resource is stopped and restarted straight away. I'm
> > using a balanced placement strategy.
> >
> > Looking into the daemon log, pacemaker is aborting transitions due to a
> > config change of the target-role meta attribute?
> >
> > Transition 2838 (Complete=25, Pending=0, Fired=0, Skipped=3,
> > Incomplete=10, Source=/var/lib/pacemaker/pengine/pe-input-704.bz2):
> > Stopped
> >
> > Could somebody explain Complete/Pending/Fired/Skipped/Incomplete, and is
> > there a way of displaying Skipped actions?
> >
> > I've used crm_simulate --xml-file  --run to see the actions, and I see
> > this extra start request.
> >
> > regards
> >
> > /Ian.
> >
>
> --
>
> Message: 2
> Date: Thu, 08 Nov 2018 10:58:52 -0600
> From: Ken Gaillot 
> To: Cluster Labs - All topics related to open-source clustering
> welcomed 
> Subject: Re: [ClusterLabs] Pacemaker auto restarts disabled groups
> Message-ID: <1541696332.5197.3.ca...@redhat.com>
> Content-Type: text/plain; charset="UTF-8"
>
> On Thu, 2018-11-08 at 12:14 +, Ian Underhill wrote:
> > It seems this issue has been raised before, but has gone quiet, with no
> > solution:
> >
> > https://lists.clusterlabs.org/pipermail/users/2017-October/006544.html
>
> In that case, something appeared to be explicitly re-enabling the
> disabled resources. You can search your logs for "target-role" to see
> whether that's happening.
>
> > I know my resource agents successfully return the correct status to
> > the start\stop\monitor requests
> >
> > On Thu, Nov 8, 2018 at 11:40 AM Ian Underhill wrote:
> > > Sometimes I'm seeing that a resource group that is in the process of
> > > being disabled is auto restarted

Re: [ClusterLabs] Pacemaker auto restarts disabled groups

2018-11-08 Thread Ian Underhill
It seems this issue has been raised before, but has gone quiet, with no
solution:

https://lists.clusterlabs.org/pipermail/users/2017-October/006544.html

I know my resource agents return the correct status to the
start\stop\monitor requests.

On Thu, Nov 8, 2018 at 11:40 AM Ian Underhill 
wrote:

> Sometimes I'm seeing that a resource group that is in the process of being
> disabled is automatically restarted by pacemaker.
>
> When issuing pcs disable commands to disable different resource groups at
> the same time (on different nodes, at the group level), the result is that
> sometimes the resource is stopped and restarted straight away. I'm using a
> balanced placement strategy.
>
> Looking into the daemon log, pacemaker is aborting transitions due to a
> config change of the target-role meta attribute?
>
> Transition 2838 (Complete=25, Pending=0, Fired=0, Skipped=3,
> Incomplete=10, Source=/var/lib/pacemaker/pengine/pe-input-704.bz2): Stopped
>
> Could somebody explain Complete/Pending/Fired/Skipped/Incomplete, and is
> there a way of displaying Skipped actions?
>
> I've used crm_simulate --xml-file  --run to see the actions, and I see
> this extra start request.
>
> regards
>
> /Ian.
>
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Pacemaker auto restarts disabled groups

2018-11-08 Thread Ian Underhill
Sometimes I'm seeing that a resource group that is in the process of being
disabled is automatically restarted by pacemaker.

When issuing pcs disable commands to disable different resource groups at
the same time (on different nodes, at the group level), the result is that
sometimes the resource is stopped and restarted straight away. I'm using a
balanced placement strategy.

Looking into the daemon log, pacemaker is aborting transitions due to a
config change of the target-role meta attribute?

Transition 2838 (Complete=25, Pending=0, Fired=0, Skipped=3, Incomplete=10,
Source=/var/lib/pacemaker/pengine/pe-input-704.bz2): Stopped

Could somebody explain Complete/Pending/Fired/Skipped/Incomplete, and is
there a way of displaying Skipped actions?

I've used crm_simulate --xml-file  --run to see the actions, and I see
this extra start request.
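
As a rough aid for this kind of log digging, the sketch below pulls a single
counter out of a "Transition N (...)" summary line and replays a saved
policy-engine input with crm_simulate. The function names are made up for
illustration. It does not explain what each counter means, it only makes the
numbers and the scheduled actions easier to get at.

```shell
# Print one counter (e.g. Skipped) from a "Transition N (...)" summary
# line read on stdin. $1 is the counter name as it appears in the log.
transition_counter() {
    grep -o "$1=[0-9]*" | head -n 1 | cut -d= -f2
}

# Replay a saved policy-engine input file to see which actions the
# cluster would schedule for it. $1 is the path to the pe-input file.
replay_transition() {
    crm_simulate --simulate --xml-file "$1"
}
```

For example, piping the summary line above through "transition_counter
Skipped" prints the skipped count, and "replay_transition
/var/lib/pacemaker/pengine/pe-input-704.bz2" replays the input named in that
line.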

regards

/Ian.


[ClusterLabs] Colocation dependencies (dislikes)

2018-09-23 Thread Ian Underhill
I'm trying to design a resource layout that has different "dislikes"
colocation scores between the various resources within the cluster.

1) When I start to have multiple colocation dependencies from a single
resource, strange behaviour starts to happen in scenarios where resources
have to bunch together.

Consider this example (2-node system) with 3 resources:
C->B->A

Constraints:
B->A -10
C->B -INFINITY
C->A -10

So on paper I would expect A and C to run together and B to run on its own.
What you actually get is A and B running and C stopped?

crm_simulate -Ls says the score for C running on the same node as A is -10,
so why doesn't it start it?
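
For anyone wanting to reproduce the layout above, the three constraints might
be entered with pcs roughly as follows. This is only a sketch: it assumes
"X->Y score" means "X is colocated with Y at that score", that resources A, B
and C already exist, and the function names are invented for illustration.

```shell
# Create the three "dislike" colocation constraints from the example.
# Assumption: "B->A -10" is read as "B with A, score -10".
add_dislike_constraints() {
    pcs constraint colocation add B with A score=-10
    pcs constraint colocation add C with B score=-INFINITY
    pcs constraint colocation add C with A score=-10
}

# Show the allocation scores for the live cluster (same as crm_simulate -Ls).
show_scores() {
    crm_simulate --live-check --show-scores
}
```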

Ideas?


[ClusterLabs] Q: Resource Groups vs Resources for stickiness and colocation?

2018-08-29 Thread Ian Underhill
I'm guessing this is just a "feature", but it is something that will
probably stop me using groups.

Scenario 1 (working):
1) Two nodes (1,2) within a cluster (default-stickiness = INFINITY)
2) Two resources (A,B) in a cluster running on different nodes
3) Colocation constraint between resources of A->B score=-1

a) pcs standby node2: the resource B moves to node 1
b) pcs unstandby node2: the resource B stays on node 1. This is good and
expected.

Scenario 2 (working):
1) Exactly the same as above, but each resource exists within its own group
(G1,G2)
2) The colocation constraint is between the groups

Scenario 3 (not working):
1) Same as above, however each group has two resources in it

 Resource Group: A_grp
 A (ocf::test:fallover): Started mac-devl03
 A_2 (ocf::test:fallover): Started mac-devl03
 Resource Group: B_grp
 B (ocf::test:fallover): Started mac-devl11
 B_2 (ocf::test:fallover): Started mac-devl11

a) pcs standby node2: the group moves to node 1
b) pcs unstandby node2: the group moves to node 2, even though I have
INFINITY stickiness (maybe I need INFINITY+1 ;) )

crm_simulate -sL doesn't really explain why there is a difference.

Any ideas?  (Environment: pacemaker-cluster-libs-1.1.16-12.el7.x86_64)
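
To compare the three scenarios side by side, steps a) and b) could be wrapped
in a small helper that dumps the scores after each step. This is a sketch
only: it assumes the installed pcs provides "pcs node standby" and
"pcs node unstandby" with --wait, and the function name is hypothetical.

```shell
# Run the standby/unstandby cycle for one node and print the allocation
# scores after each step. $1 is the node name (node2 in the scenarios).
standby_cycle() {
    node="$1"
    pcs node standby "$node" --wait
    crm_simulate --show-scores --live-check
    pcs node unstandby "$node" --wait
    crm_simulate --show-scores --live-check
}
```

Saving the output of "standby_cycle node2" for scenario 2 and scenario 3 and
diffing the two may make the difference in scores easier to spot.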

/Ian


[ClusterLabs] Understanding\Manually adjusting node-health-strategy

2018-07-26 Thread Ian Underhill
When a resource fails on a node I would like to mark the node unhealthy, so
other resources don't start up on it.

I believe I can achieve this, ignoring the concept of fencing for the
moment.

I have tried to set my cluster's node-health-strategy to only_green.

However, when trying to manually adjust the node's health, I believe I can
set an attribute #health on a node (see ref docs), but trying to set any
attribute #health fails?

# sudo crm_attribute --node myNode --name=#health --update=red
Error setting #health=red (section=nodes, set=nodes-3): Update does not conform to the configured schema
Error performing operation: Update does not conform to the configured schema

I'm slightly surprised I don't get "something for free" regarding pacemaker
automatically adjusting the health of a node when resources fail on it. Am
I missing a setting, or is this done by hand?
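
One thing that may be worth trying, offered as an assumption rather than a
confirmed fix for the schema error above: set the health value as a transient
attribute with attrd_updater, using a name that starts with "#health-". The
suffix "manual" in the usage note and the function name are hypothetical.

```shell
# Set a transient node health attribute.
# $1 = node name, $2 = suffix for the attribute name, $3 = red|yellow|green
set_node_health() {
    node="$1"; suffix="$2"; colour="$3"
    case "$colour" in
        red|yellow|green) ;;
        *) echo "colour must be red, yellow or green" >&2; return 2 ;;
    esac
    # Assumption: a transient attribute avoids the schema error seen above
    attrd_updater --node "$node" --name "#health-$suffix" --update "$colour"
}
```

For example, "set_node_health myNode manual red" would attempt to set
#health-manual=red on myNode.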

Thanks

Ian.

*ref docs*
http://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-node-health.html
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-cluster-options.html


[ClusterLabs] OCF Return codes OCF_NOT_RUNNING

2018-07-11 Thread Ian Underhill
I'm trying to understand the behaviour of pacemaker when a resource monitor
returns OCF_NOT_RUNNING instead of OCF_ERR_GENERIC, and whether pacemaker
really cares about the difference.

The documentation states that a return code of OCF_NOT_RUNNING from a
monitor will not result in a stop being called on that resource, as it
believes the node is still clean.

https://www.clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-ocf-return-codes.html

This makes sense; however, in practice it is not what happens (unless I'm
doing something wrong :) )

When my resource returns OCF_NOT_RUNNING for a monitor call (after a start
has been performed), a stop is called.

If I have a resource threshold set >1, I get a start->monitor->stop cycle
until the threshold is consumed.
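
To make the distinction concrete, here is a minimal sketch of a monitor action
that returns the two codes in question. The numeric values (0, 1, 7) are the
standard OCF ones; the pidfile check and the function name are hypothetical
and not taken from the agent discussed here.

```shell
# Standard OCF return codes used below
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical monitor action. $1 is the path to the service's pidfile.
demo_monitor() {
    pidfile="$1"
    # No pidfile at all: report the service as cleanly stopped
    [ -f "$pidfile" ] || return "$OCF_NOT_RUNNING"
    pid=$(cat "$pidfile")
    # Pidfile present and the process is alive: running
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        return "$OCF_SUCCESS"
    fi
    # Pidfile left behind but no live process: report a failure
    return "$OCF_ERR_GENERIC"
}
```

Swapping the last return between the two codes would be one way to compare
how the cluster reacts to each after a start.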

/Ian.


[ClusterLabs] Pacemaker alert framework

2018-07-06 Thread Ian Underhill
Requirement:
When a resource fails, perform an action (run a script) on all nodes within
the cluster before the resource is relocated, i.e. gather information on
why the resource failed.

What I have looked into:
1) Use the monitor call within the resource to SSH to all nodes; again, SSH
config is needed.
2) Alert framework: this only seems to be triggered for nodes involved in
the relocation of the resource, i.e. if the resource moves from node 1 to
node 2, node 3 doesn't know. So back to the SSH solution :(
3) Sending a custom alert to all nodes in the cluster? Is this possible? I
have not found a way.

Only solution I have:
1) Use SSH within an alert monitor (stop) to SSH onto all nodes to perform
the action. The nodes could be configured using the alert monitor's
recipients, but I would still need to configure SSH users, certs etc.
 1.a) This doesn't seem to be usable if the resource is relocated back
to the same node, as the start\stop alerts are run at the "same time", i.e.
I need to delay the start until the SSH has completed.

What I would like:
1) Delay the start\relocation of the resource until the information from
all nodes is complete, using only pacemaker behaviour\config.
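
For reference, the body of an alert agent that reacts only to failed resource
operations might look like the sketch below. It uses the CRM_alert_*
environment variables that pacemaker passes to alert agents; the function name
and the output format are made up. It does not solve the "all nodes" or
"delay the start" parts, it only shows what the agent can see on the node
where it runs.

```shell
# Body of a hypothetical alert agent. Pacemaker passes event details to
# alert agents in CRM_alert_* environment variables.
handle_alert() {
    # Only resource alerts are of interest here
    [ "${CRM_alert_kind:-}" = "resource" ] || return 0
    # Skip operations that returned the expected code (not a failure)
    [ "${CRM_alert_rc:-0}" != "${CRM_alert_target_rc:-0}" ] || return 0
    # Emit one record; a local diagnostics script could be called here instead
    echo "${CRM_alert_rsc:-unknown} ${CRM_alert_task:-unknown} on ${CRM_alert_node:-unknown} rc=${CRM_alert_rc:-0}"
}
```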

any ideas?

Thanks

/Ian.