On 11/08/2016 12:54 PM, Ryan Anstey wrote:
> I've been running a ceph cluster with pacemaker for a few months now.
> Everything has been working normally, but when I added a fourth node it
> won't work like the others, even though their OS is the same and the
> configs are all synced via salt. I also don't understand pacemaker that
> well since I followed a guide for it. If anyone could steer me in the
> right direction I would greatly appreciate it. Thank you!
>
> - My resources only start if the new node is the only active node.
> - Once started on new node, if they are moved back to one of the
> original nodes, it won't go back to the new one.
> - My resources work 100% if I start them manually (without pacemaker).
> - (In the logs/configs below, my resources are named "unifi",
> "rbd_unifi" being the main one that's not working.)

The key is all the location constraints starting with "cli-" in your
configuration. Such constraints were added automatically by command-line
tools, rather than added by you explicitly.

For example, Pacemaker has no concept of "moving" a resource. It places
all resources where they can best run, as specified by the
configuration. So, to move a resource, command-line tools add a location
constraint making the resource prefer a different node.

The downside is that the preference doesn't automatically go away. The
resource will continue to prefer the other node until you explicitly
remove the constraint.

Command-line tools that add such constraints generally provide some way
to clear them. If you clear all those constraints, resources will again
be placed on any node equally.

Separately, you also have a default resource stickiness of 100. That
means that even after you remove the constraints, resources that are
already running will tend to stay where they are. But if you stop and
start a resource, or add a new resource, it could start on a different
node.
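
As a rough illustration of how stickiness and location scores interact,
here is a toy model. It is deliberately simplified and is not
Pacemaker's actual placement algorithm: it just adds the stickiness to
the score of the node a resource currently runs on and picks the
highest total. The pick_node helper and the tie-breaking rule are my
own assumptions.

```python
INFINITY = 1_000_000  # Pacemaker represents an "INFINITY" score as 1000000


def pick_node(location_scores, current_node=None, stickiness=0):
    """Toy placement: the node with the highest total score wins.

    The total is the node's location score, plus the stickiness if the
    resource is currently running there. Ties go to the current node,
    then to the alphabetically first node (an assumption of this sketch).
    """
    totals = {}
    for node, score in location_scores.items():
        if node == current_node:
            score += stickiness
        totals[node] = score
    # max() keeps the first maximal element, so sorting gives a stable tie-break.
    return max(sorted(totals), key=lambda n: (totals[n], n == current_node))


if __name__ == "__main__":
    no_preference = {"h1": 0, "h2": 0, "h3": 0, "h4": 0}

    # Running on h3 with stickiness 100: it stays put.
    print(pick_node(no_preference, current_node="h3", stickiness=100))

    # Stopped (no current node): nothing holds it to its previous node.
    print(pick_node(no_preference, current_node=None, stickiness=100))

    # A leftover "prefer h3" constraint outweighs stickiness on h4.
    print(pick_node({"h3": INFINITY, "h4": 0}, current_node="h4", stickiness=100))
```

The last case is the one that matters here: as long as a leftover
constraint with a large score exists, stickiness on the new node cannot
compete with it.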
>
> Log when cleaning up the resource on the NEW node:
>
> Nov 08 09:25:20 h4 Filesystem(fs_unifi)[18044]: WARNING: Couldn't find
> device [/dev/rbd/rbd/unifi]. Expected /dev/??? to exist
> Nov 08 09:25:20 h4 lrmd[3564]: notice: lxc_unifi_monitor_0:18018:stderr
> [ unifi doesn't exist ]
> Nov 08 09:25:20 h4 crmd[3567]: notice: Operation lxc_unifi_monitor_0:
> not running (node=h4, call=484, rc=7, cib-update=390, confirmed=true)
> Nov 08 09:25:20 h4 crmd[3567]: notice: h4-lxc_unifi_monitor_0:484 [
> unifi doesn't exist\n ]
> Nov 08 09:25:20 h4 crmd[3567]: notice: Operation fs_unifi_monitor_0: not
> running (node=h4, call=480, rc=7, cib-update=391, confirmed=true)
> Nov 08 09:25:20 h4 crmd[3567]: notice: Operation rbd_unifi_monitor_0:
> not running (node=h4, call=476, rc=7, cib-update=392, confirmed=true)
>
> Log when cleaning up the resource on the OLD node:
>
> Nov 08 09:21:18 h3 crmd[11394]: warning: No match for shutdown action on
> 167838209
> Nov 08 09:21:18 h3 crmd[11394]: warning: No match for shutdown action on
> 167838210
> Nov 08 09:21:18 h3 crmd[11394]: warning: No match for shutdown action on
> 167838212
> Nov 08 09:21:18 h3 crmd[11394]: warning: No match for shutdown action on
> 167838209
> Nov 08 09:21:18 h3 crmd[11394]: warning: No match for shutdown action on
> 167838209
> Nov 08 09:21:18 h3 crmd[11394]: warning: No match for shutdown action on
> 167838210
> Nov 08 09:21:18 h3 crmd[11394]: warning: No match for shutdown action on
> 167838212
> Nov 08 09:21:18 h3 crmd[11394]: notice: State transition S_IDLE ->
> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
> origin=abort_transition_graph ]
> Nov 08 09:21:18 h3 crmd[11394]: warning: No match for shutdown action on
> 167838210
> Nov 08 09:21:18 h3 crmd[11394]: warning: No match for shutdown action on
> 167838212
> Nov 08 09:21:18 h3 cib[11389]: warning: A-Sync reply to crmd failed: No
> message of desired type
> Nov 08 09:21:18 h3 crmd[11394]: warning: No match for shutdown action on
> 167838211
> Nov 08 09:21:22 h3 crmd[11394]: notice: Notifications disabled
> Nov 08 09:21:24 h3 crmd[11394]: notice: Notifications disabled
> Nov 08 09:21:24 h3 crmd[11394]: warning: No match for shutdown action on
> 167838211
> Nov 08 09:21:25 h3 crmd[11394]: notice: Notifications disabled
> Nov 08 09:21:25 h3 crmd[11394]: warning: No match for shutdown action on
> 167838211
> Nov 08 09:21:26 h3 pengine[11393]: notice: Start rbd_unifi(h3)
> Nov 08 09:21:26 h3 pengine[11393]: notice: Start fs_unifi(h3)
> Nov 08 09:21:26 h3 pengine[11393]: notice: Start lxc_unifi(h3)
> Nov 08 09:21:26 h3 pengine[11393]: notice: Calculated Transition 119:
> /var/lib/pacemaker/pengine/pe-input-739.bz2
> Nov 08 09:21:26 h3 crmd[11394]: notice: Processing graph 119
> (ref=pe_calc-dc-1478625686-648) derived from
> /var/lib/pacemaker/pengine/pe-input-739.bz2
> Nov 08 09:21:26 h3 crmd[11394]: notice: Initiating action 12: monitor
> rbd_unifi_monitor_0 on h4
> Nov 08 09:21:26 h3 crmd[11394]: notice: Initiating action 9: monitor
>