On Mon, 2017-11-06 at 19:55 -0800, Aaron Cody wrote:
> Hello
> I have set up an active/passive HA NFS/DRBD cluster on RHEL 7.2,
> and I keep getting this 'Failed Action' message ... not always,
> but sometimes:
>
> Stack: corosync
> Current DC: ha-nfs2.lan.aaroncody.com (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
> Last updated: Mon Nov  6 22:52:28 2017
> Last change: Mon Nov  6 22:47:20 2017 by hacluster via crmd on ha-nfs2.lan.aaroncody.com
>
> 2 nodes configured
> 8 resources configured
>
> Online: [ ha-nfs1.lan.aaroncody.com ha-nfs2.lan.aaroncody.com ]
>
> Full list of resources:
>
>  Master/Slave Set: nfs-drbd-clone [nfs-drbd]
>      Masters: [ ha-nfs2.lan.aaroncody.com ]
>      Slaves: [ ha-nfs1.lan.aaroncody.com ]
>  nfs-filesystem  (ocf::heartbeat:Filesystem):  Started ha-nfs2.lan.aaroncody.com
>  nfs-root        (ocf::heartbeat:exportfs):    Started ha-nfs2.lan.aaroncody.com
>  nfs-export1     (ocf::heartbeat:exportfs):    Started ha-nfs2.lan.aaroncody.com
>  nfs-server      (ocf::heartbeat:nfsserver):   Started ha-nfs2.lan.aaroncody.com
>  nfs-ip          (ocf::heartbeat:IPaddr2):     Started ha-nfs2.lan.aaroncody.com
>  nfs-notify      (ocf::heartbeat:nfsnotify):   Started ha-nfs2.lan.aaroncody.com
>
> Failed Actions:
> * nfs-server_start_0 on ha-nfs1.lan.aaroncody.com 'unknown error' (1): call=40, status=complete, exitreason='Failed to start NFS server locking daemons',
>     last-rc-change='Mon Nov  6 22:47:25 2017', queued=0ms, exec=202ms
>
> So, even though I have all my constraints set up to bring everything
> up on the DRBD master, it still insists on trying to start the NFS
> server on the slave...
>
> Here are my constraints:
>
> Location Constraints:
> Ordering Constraints:
>   promote nfs-drbd-clone then start nfs-filesystem (kind:Mandatory)
>   start nfs-filesystem then start nfs-ip (kind:Mandatory)
>   start nfs-ip then start nfs-server (kind:Mandatory)
>   start nfs-server then start nfs-notify (kind:Mandatory)
>   start nfs-server then start nfs-root (kind:Mandatory)
>   start nfs-server then start nfs-export1 (kind:Mandatory)
> Colocation Constraints:
>   nfs-filesystem with nfs-drbd-clone (score:INFINITY) (with-rsc-role:Master)
>   nfs-ip with nfs-filesystem (score:INFINITY)
>   nfs-server with nfs-ip (score:INFINITY)
>   nfs-root with nfs-filesystem (score:INFINITY)
>   nfs-export1 with nfs-filesystem (score:INFINITY)
>   nfs-notify with nfs-server (score:INFINITY)
>
> Any ideas what I'm doing wrong here? Did I mess up my constraints?
>
> TIA
The constraints look good to me. To debug this sort of thing, I would
grab the pe-input file from the transition that tried to start the
resource in the wrong place, and use crm_simulate to get more
information about it. crm_simulate is not very user-friendly, so if
you can attach the pe-input file, I can take a look at it.

(The pe-input file will be listed at the end of the transition in the
logs on the node that was DC at the time; you'll see a bunch of
"pengine:" messages, including one saying the resource was scheduled
for a start on that particular node.)
-- 
Ken Gaillot <kgail...@redhat.com>

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
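[Editor's note: the debugging steps above can be sketched as shell commands. This is a hypothetical sketch, not from the thread: the log path and the pe-input file number (`pe-input-42.bz2`) are assumptions based on Pacemaker 1.1 defaults on RHEL 7; check your own logs for the actual file name the pengine reports.]

```shell
# On the node that was DC at the time, find the "pengine:" messages for
# the transition that scheduled the unwanted start, and note which
# pe-input file it saved (file name below is a placeholder):
grep -e 'pengine:' /var/log/messages | grep -e 'nfs-server' -e 'pe-input'

# Replay that saved transition offline with crm_simulate:
#   -x  read cluster state from the pe-input file
#   -S  simulate the transition the policy engine would run
#   -s  show the allocation scores behind each placement decision
crm_simulate -S -s -x /var/lib/pacemaker/pengine/pe-input-42.bz2
```

The allocation scores in the output show, per resource and per node, why the policy engine chose the placement it did, which is usually enough to see which constraint (or missing constraint) drove the decision.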