18.06.2012 16:39, Phil Frost wrote:
> I'm attempting to configure an NFS cluster, and I've observed that under
> some failure conditions, resources that depend on a failed resource
> simply stop, and no migration to another node is attempted, even though
> a manual migration demonstrates the other node can run all resources,
> and the resources will remain on the good node even after the migration
> constraint is removed.
>
> I was able to reduce the configuration to this:
>
> node storage01
> node storage02
> primitive drbd_nfsexports ocf:pacemaker:Stateful
> primitive fs_test ocf:pacemaker:Dummy
> primitive vg_nfsexports ocf:pacemaker:Dummy
> group test fs_test
> ms drbd_nfsexports_ms drbd_nfsexports \
>       meta master-max="1" master-node-max="1" \
>       clone-max="2" clone-node-max="1" \
>       notify="true" target-role="Started"
> location l fs_test -inf: storage02
> colocation colo_drbd_master inf: ( test ) ( vg_nfsexports ) ( drbd_nfsexports_ms:Master )
Sets (constraints with more than two members) are evaluated in a different order than plain two-resource constraints. Try (a rough crm shell sketch for swapping the constraint in place is at the bottom of this mail):

colocation colo_drbd_master inf: ( drbd_nfsexports_ms:Master ) ( vg_nfsexports ) ( test )

> property $id="cib-bootstrap-options" \
>       no-quorum-policy="ignore" \
>       stonith-enabled="false" \
>       dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
>       cluster-infrastructure="openais" \
>       expected-quorum-votes="2" \
>       last-lrm-refresh="1339793579"
>
> The location constraint "l" exists only to demonstrate the problem; I
> added it to simulate the NFS server being unrunnable on one node.
>
> To see the issue I'm experiencing, put storage01 in standby to force
> everything onto storage02. fs_test will not be able to run. Now bring
> storage01 back, which can satisfy all the constraints, and see that no
> migration takes place. Put storage02 in standby, and everything will
> migrate to storage01 and start successfully. Take storage02 out of
> standby, and the services remain on storage01. This demonstrates that
> even though there is a clear "best" solution where all resources can
> run, Pacemaker isn't finding it.
>
> So far, I've noticed that any of the following changes will "fix" the problem:
>
> - removing colo_drbd_master
> - removing any one resource from colo_drbd_master
> - eliminating the group "test" and referencing fs_test directly in constraints
> - using a simple clone instead of a master/slave pair for drbd_nfsexports_ms
>
> My current understanding is that if there exists a way to run all
> resources, Pacemaker should find it and prefer it. Is that not the case?
> Maybe I need to restructure my colocation constraint somehow? Obviously
> this is a much reduced version of a more complex practical configuration,
> so I'm trying to understand the underlying mechanisms more than just the
> solution to this particular scenario.
>
> In particular, I'm not really sure how to inspect what Pacemaker is
> thinking when it places resources. I've tried running crm_simulate -LRs,
> but I'm a little bit unclear on how to interpret the results. In the
> output, I do see this:
>
> drbd_nfsexports:1 promotion score on storage02: 10
> drbd_nfsexports:0 promotion score on storage01: 5
>
> Those numbers seem to account for the default stickiness of 1 for
> master/slave resources, but don't seem to incorporate the colocation
> constraints at all. Is that expected?
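A rough, untested sketch of swapping the constraint in place from the crm shell, using only the names from the reduced configuration quoted above: delete the old constraint, re-add it with the sets reversed, then verify and commit the change as one transaction.

  # crm configure
  crm(live)configure# delete colo_drbd_master
  crm(live)configure# colocation colo_drbd_master inf: ( drbd_nfsexports_ms:Master ) ( vg_nfsexports ) ( test )
  crm(live)configure# verify
  crm(live)configure# commit
  crm(live)configure# quit

After the commit, re-running crm_simulate should show whether the promotion scores now reflect the colocation.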