Re: [ClusterLabs] Resources won't start on new node unless it is the only active node

2016-11-08 Thread Ryan Anstey
I had a feeling it was something to do with that. It was confusing because
I could use the move command to move resources between my three original
hosts, just not to the fourth. Then there were the "device not found"
errors, which added to the confusion.

I have resource stickiness set because I have critical things running
that I don't want to move around (such as a Windows KVM guest), and I'd rather
not see any downtime from the reboots. (It does try to migrate KVM guests,
but that doesn't work; LXC containers are just stopped and started.)

I moved them to specific servers because I thought I could do a better job
of balancing the cluster on my own (based on knowing what my VMs were
capable of). I'm not sure whether my method is a good idea or not; it just
sounded right to me at the time.

On Tue, Nov 8, 2016 at 2:00 PM Ken Gaillot wrote:

> On 11/08/2016 12:54 PM, Ryan Anstey wrote:
> > I've been running a ceph cluster with pacemaker for a few months now.
> > Everything has been working normally, but when I added a fourth node it
> > won't work like the others, even though their OS is the same and the
> > configs are all synced via salt. I also don't understand pacemaker that
> > well since I followed a guide for it. If anyone could steer me in the
> > right direction I would greatly appreciate it. Thank you!
> >
> > - My resources only start if the new node is the only active node.
> > - Once started on the new node, if they are moved back to one of the
> > original nodes, they won't go back to the new one.
> > - My resources work 100% if I start them manually (without pacemaker).
> > - (In the logs/configs below, my resources are named "unifi",
> > "rbd_unifi" being the main one that's not working.)
>
> The key is all the location constraints starting with "cli-" in your
> configuration. Such constraints were added automatically by command-line
> tools, rather than added by you explicitly.
>
> For example, Pacemaker has no concept of "moving" a resource. It places
> all resources where they can best run, as specified by the
> configuration. So, to move a resource, command-line tools add a location
> constraint making the resource prefer a different node.
>
> The downside is that the preference doesn't automatically go away. The
> resource will continue to prefer the other node until you explicitly
> remove the constraint.
>
> Command-line tools that add such constraints generally provide some way
> to clear them. If you clear all those constraints, resources will again
> be placed on any node equally.
>
> Separately, you also have a default resource stickiness of 100. That
> means that even after you remove the constraints, resources that are
> already running will tend to stay where they are. But if you stop and
> start a resource, or add a new resource, it could start on a different
> node.
>
> 
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Resources won't start on new node unless it is the only active node

2016-11-08 Thread Ken Gaillot
On 11/08/2016 12:54 PM, Ryan Anstey wrote:
> I've been running a ceph cluster with pacemaker for a few months now.
> Everything has been working normally, but when I added a fourth node it
> won't work like the others, even though their OS is the same and the
> configs are all synced via salt. I also don't understand pacemaker that
> well since I followed a guide for it. If anyone could steer me in the
> right direction I would greatly appreciate it. Thank you!
> 
> - My resources only start if the new node is the only active node.
> - Once started on the new node, if they are moved back to one of the
> original nodes, they won't go back to the new one.
> - My resources work 100% if I start them manually (without pacemaker).
> - (In the logs/configs below, my resources are named "unifi",
> "rbd_unifi" being the main one that's not working.)

The key is all the location constraints starting with "cli-" in your
configuration. Such constraints were added automatically by command-line
tools, rather than added by you explicitly.

For example, Pacemaker has no concept of "moving" a resource. It places
all resources where they can best run, as specified by the
configuration. So, to move a resource, command-line tools add a location
constraint making the resource prefer a different node.
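
For example (a rough sketch, assuming pcs is your management tool; crmsh and
crm_resource have equivalents, and exact syntax varies by version), a "move"
like this:

    # tell the cluster to move the resource to node h3
    pcs resource move rbd_unifi h3

adds a location constraint to the configuration along these lines:

    <rsc_location id="cli-prefer-rbd_unifi" rsc="rbd_unifi" node="h3"
        score="INFINITY"/>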

The downside is that the preference doesn't automatically go away. The
resource will continue to prefer the other node until you explicitly
remove the constraint.

Command-line tools that add such constraints generally provide some way
to clear them. If you clear all those constraints, resources will again
be placed on any node equally.
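
For instance (assuming pcs again; exact commands vary by tool and version):

    # list all constraints with their IDs; the auto-generated ones start with "cli-"
    pcs constraint --full

    # remove an auto-generated constraint by its ID (use whatever ID the listing shows)
    pcs constraint remove cli-prefer-rbd_unifi

Newer pcs and crm_resource releases also have a more direct form, something
like "pcs resource clear rbd_unifi" or "crm_resource --clear --resource
rbd_unifi".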

Separately, you also have a default resource stickiness of 100. That
means that even after you remove the constraints, resources that are
already running will tend to stay where they are. But if you stop and
start a resource, or add a new resource, it could start on a different node.
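
To check or change that default (pcs shown as an example; in crmsh it lives
under rsc_defaults):

    # show current resource defaults, including resource-stickiness
    pcs resource defaults

    # only if you actually want resources to rebalance freely:
    pcs resource defaults resource-stickiness=0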

> 
> Log when running a cleanup of the resource on the NEW node:
> 
> Nov 08 09:25:20 h4 Filesystem(fs_unifi)[18044]: WARNING: Couldn't find
> device [/dev/rbd/rbd/unifi]. Expected /dev/??? to exist
> Nov 08 09:25:20 h4 lrmd[3564]: notice: lxc_unifi_monitor_0:18018:stderr
> [ unifi doesn't exist ]
> Nov 08 09:25:20 h4 crmd[3567]: notice: Operation lxc_unifi_monitor_0:
> not running (node=h4, call=484, rc=7, cib-update=390, confirmed=true)
> Nov 08 09:25:20 h4 crmd[3567]: notice: h4-lxc_unifi_monitor_0:484 [
> unifi doesn't exist\n ]
> Nov 08 09:25:20 h4 crmd[3567]: notice: Operation fs_unifi_monitor_0: not
> running (node=h4, call=480, rc=7, cib-update=391, confirmed=true)
> Nov 08 09:25:20 h4 crmd[3567]: notice: Operation rbd_unifi_monitor_0:
> not running (node=h4, call=476, rc=7, cib-update=392, confirmed=true)
> 
> Log when running a cleanup of the resource on the OLD node:
> 
> Nov 08 09:21:18 h3 crmd[11394]: warning: No match for shutdown action on
> 167838209
> Nov 08 09:21:18 h3 crmd[11394]: warning: No match for shutdown action on
> 167838210
> Nov 08 09:21:18 h3 crmd[11394]: warning: No match for shutdown action on
> 167838212
> Nov 08 09:21:18 h3 crmd[11394]: warning: No match for shutdown action on
> 167838209
> Nov 08 09:21:18 h3 crmd[11394]: warning: No match for shutdown action on
> 167838209
> Nov 08 09:21:18 h3 crmd[11394]: warning: No match for shutdown action on
> 167838210
> Nov 08 09:21:18 h3 crmd[11394]: warning: No match for shutdown action on
> 167838212
> Nov 08 09:21:18 h3 crmd[11394]: notice: State transition S_IDLE ->
> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
> origin=abort_transition_graph ]
> Nov 08 09:21:18 h3 crmd[11394]: warning: No match for shutdown action on
> 167838210
> Nov 08 09:21:18 h3 crmd[11394]: warning: No match for shutdown action on
> 167838212
> Nov 08 09:21:18 h3 cib[11389]: warning: A-Sync reply to crmd failed: No
> message of desired type
> Nov 08 09:21:18 h3 crmd[11394]: warning: No match for shutdown action on
> 167838211
> Nov 08 09:21:22 h3 crmd[11394]: notice: Notifications disabled
> Nov 08 09:21:24 h3 crmd[11394]: notice: Notifications disabled
> Nov 08 09:21:24 h3 crmd[11394]: warning: No match for shutdown action on
> 167838211
> Nov 08 09:21:25 h3 crmd[11394]: notice: Notifications disabled
> Nov 08 09:21:25 h3 crmd[11394]: warning: No match for shutdown action on
> 167838211
> Nov 08 09:21:26 h3 pengine[11393]: notice: Start   rbd_unifi(h3)
> Nov 08 09:21:26 h3 pengine[11393]: notice: Start   fs_unifi(h3)
> Nov 08 09:21:26 h3 pengine[11393]: notice: Start   lxc_unifi(h3)
> Nov 08 09:21:26 h3 pengine[11393]: notice: Calculated Transition 119:
> /var/lib/pacemaker/pengine/pe-input-739.bz2
> Nov 08 09:21:26 h3 crmd[11394]: notice: Processing graph 119
> (ref=pe_calc-dc-1478625686-648) derived from
> /var/lib/pacemaker/pengine/pe-input-739.bz2
> Nov 08 09:21:26 h3 crmd[11394]: notice: Initiating action 12: monitor
> rbd_unifi_monitor_0 on h4
> Nov 08 09:21:26 h3 crmd[11394]: notice: Initiating action 9: monitor
>