----- Original Message -----
> From: "Andrew Beekhof" <and...@beekhof.net>
> To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
> Sent: Wednesday, October 30, 2013 1:08:12 AM
> Subject: Re: [Pacemaker] Asymmetric cluster, clones, and location constraints
>
> On 25 Oct 2013, at 9:40 am, David Vossel <dvos...@redhat.com> wrote:
>
> > ----- Original Message -----
> >> From: "Lindsay Todd" <rltodd....@gmail.com>
> >> To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
> >> Sent: Wednesday, October 23, 2013 2:38:17 PM
> >> Subject: Re: [Pacemaker] Asymmetric cluster, clones, and location constraints
> >>
> >> David,
> >>
> >> The Infiniband network takes a nondeterministic amount of time to
> >> actually finish initializing, so we use ethmonitor to watch it; the OS
> >> is supposed to bring it up at boot time, but it moves on through the
> >> boot sequence without actually waiting for it. So in self-defense we
> >> watch it with pacemaker. I guess I could restructure this to use a
> >> resource that brings up IB (with a really long timeout) and use
> >> ordering to wait for that to complete, but it seems that ethmonitor
> >> would be more adaptive to short-term IB network issues. Since
> >> ethmonitor works by setting an attribute (the RA running means it is
> >> watching the network, not that the network is up), I've used location
> >> constraints instead of ordering constraints.
> >>
> >> So I have completely restarted my cluster. Right now the physical
> >> nodes see each other, and the fencing agents are running. The first
> >> things that should start are the ethmonitor resource agents on the VM
> >> hosts (the c-watch-ib0 clones of the p-watch-ib0 primitive). They are
> >> not starting (like they used to).
> >
> > I see. Your cib generates an invalid transition. I'll try and look
> > into it in more detail soon to understand the cause.
>
> According to git bisect, the winner is:
I always knew I was a winner

> 15a86e501a57b50fdb3b8ce0ed432b183c343c74 is the first bad commit
> commit 15a86e501a57b50fdb3b8ce0ed432b183c343c74
> Author: David Vossel <dvos...@redhat.com>
> Date:   Mon Sep 23 18:55:21 2013 -0500
>
>     High: pengine: Probe container nodes
>
> I'll take a look in the morning unless David beats me to it :-)

This is a tough one. I enabled probing of container nodes, but didn't
anticipate the scenario where there's an ordering constraint involving a
container node's "container resource" (the VM). I have an idea of how to
fix this, but the end result might make probing containers useless. I'll
give this some thought.

Until then, there is a really easy workaround for this: set the
'enable-container-probes' global config option to "false".
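If your tooling doesn't know about that option yet, it can still be set the
same way as any other cluster property. A sketch, untested against your
build:

    # set the global option directly as a cluster property
    crm_attribute --type crm_config --name enable-container-probes --update false

    # or the same thing via the crm shell
    crm configure property enable-container-probes=false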
-- Vossel

> >
> > One completely unrelated thought I had while looking at your config
> > involves your fencing agents. You shouldn't have to use location
> > constraints at all on the fencing agents. I believe stonith is smart
> > enough now to execute the agent on a node that isn't the target,
> > regardless of where the policy engine puts it.
> >
> > -- Vossel
> >
> >> The cib snapshot can be seen in http://pastebin.com/TccTHQPS (some
> >> slight editing to hide passwords in fencing agents).
> >>
> >> /Lindsay
> >>
> >> On Wed, Oct 23, 2013 at 11:20 AM, David Vossel <dvos...@redhat.com>
> >> wrote:
> >>
> >> ----- Original Message -----
> >>> From: "Lindsay Todd" <rltodd....@gmail.com>
> >>> To: "The Pacemaker cluster resource manager" <Pacemaker@oss.clusterlabs.org>
> >>> Sent: Tuesday, October 22, 2013 4:19:11 PM
> >>> Subject: [Pacemaker] Asymmetric cluster, clones, and location constraints
> >>>
> >>> I am getting rather unexpected behavior when I combine clones,
> >>> location constraints, and remote nodes in an asymmetric cluster. My
> >>> cluster is configured to be asymmetric, distinguishing between vmhosts
> >>> and various sorts of remote nodes. Currently I am running upstream
> >>> version b6d42ed. I am simplifying my description to avoid confusion,
> >>> hoping in so doing I don't miss any salient points...
> >>>
> >>> My physical cluster nodes, also the VM hosts, have the attribute
> >>> "nodetype=vmhost". They also have Infiniband interfaces, which take
> >>> some time to come up. I don't want my shared file system (which needs
> >>> IB), or libvirtd (which needs the file system), to come up before
> >>> IB... So I have this in my configuration:
> >>>
> >>> primitive p-watch-ib0 ocf:heartbeat:ethmonitor \
> >>>   params interface="ib0" \
> >>>   op monitor timeout="100s" interval="10s"
> >>> clone c-watch-ib0 p-watch-ib0 \
> >>>   meta interleave="true"
> >>> location loc-watch-ib-only-vmhosts c-watch-ib0 \
> >>>   rule 0: nodetype eq "vmhost"
> >>>
> >>> Something broke between upstream versions 0a2570a and c68919f -- the
> >>> c-watch-ib0 clone never starts. I've found that if I run
> >>> "crm_resource --force-start -r p-watch-ib0" when IB is running, the
> >>> ethmonitor-ib0 attribute is not set like it used to be. Oh well, I can
> >>> set it manually. So let's.
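(Aside: for anyone wanting to do the same, ethmonitor's attribute is an
ordinary transient node attribute, so something along these lines should set
it by hand -- the value "1" is just an illustration:

    # on the node whose attribute you want to set
    attrd_updater -n ethmonitor-ib0 -U 1

    # or from any node, targeting e.g. cvmh01 explicitly
    crm_attribute -N cvmh01 -n ethmonitor-ib0 -v 1 -l reboot
)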
> >>
> >> A re-write of the attrd component was introduced around that time
> >> period. This should have been resolved at this point in the b6d42ed
> >> build.
> >>
> >>> We use GPFS for a shared file system, so I have an agent to start it
> >>> and wait for a file system to mount. It should only run on VM hosts,
> >>> and only when IB is running. So I have this:
> >>
> >> So the IB resource is setting some attribute that enables the fs to
> >> run? Why can't an ordering constraint be used here between IB and FS?
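(To spell out what I meant by that: an explicit ordering, something like

    order o-fs-gpfs-after-ib inf: c-watch-ib0 c-fs-gpfs

using the clone names from the config below. A sketch only, though -- as
noted earlier in the thread, the ethmonitor clone "starting" just means
monitoring has begun, not that the network is up, so the attribute-based
location rule may well be the better fit here.)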
> >>> primitive p-fs-gpfs ocf:ccni:gpfs \
> >>>   params fspath="/gpfs/lb/utility" \
> >>>   op monitor timeout="20s" interval="30s" \
> >>>   op start timeout="180s" \
> >>>   op stop timeout="120s"
> >>> clone c-fs-gpfs p-fs-gpfs \
> >>>   meta interleave="true"
> >>> location loc-fs-gpfs-needs-ib0 c-fs-gpfs \
> >>>   rule -inf: not_defined "ethmonitor-ib0" or "ethmonitor-ib0" eq 0
> >>> location loc-fs-gpfs-on-vmhosts c-fs-gpfs \
> >>>   rule 0: nodetype eq "vmhost"
> >>>
> >>> That all used to start nicely. Now even if I set the ethmonitor-ib0
> >>> attribute, it doesn't. However, I can use "crm_resource --force-start
> >>> -r p-fs-gpfs" on each of my VM hosts, then issue "crm resource cleanup
> >>> c-fs-gpfs", and all is well. I can use "crm status" to see something
> >>> like:
> >>>
> >>> Last updated: Tue Oct 22 16:35:43 2013
> >>> Last change: Tue Oct 22 15:50:52 2013 via crmd on cvmh01
> >>> Stack: cman
> >>> Current DC: cvmh04 - partition with quorum
> >>> Version: 1.1.10-19.el6.ccni-b6d42ed
> >>> 8 Nodes configured
> >>> 92 Resources configured
> >>>
> >>> Online: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
> >>>
> >>> fence-cvmh01 (stonith:fence_ipmilan): Started cvmh04
> >>> fence-cvmh02 (stonith:fence_ipmilan): Started cvmh01
> >>> fence-cvmh03 (stonith:fence_ipmilan): Started cvmh01
> >>> fence-cvmh04 (stonith:fence_ipmilan): Started cvmh01
> >>> Clone Set: c-fs-gpfs [p-fs-gpfs]
> >>>     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
> >>>
> >>> which is what I would expect (other than that I expect pacemaker to
> >>> have started these for me, like it used to).
> >>>
> >>> Now I also have clone resources to NFS-mount another file system, and
> >>> actually do a bind mount out of the GPFS file system, which behave
> >>> like the GPFS resource -- they used to just work, now I need to use
> >>> "crm_resource --force-start" and clean up. That finally lets me start
> >>> libvirtd, using this configuration:
> >>>
> >>> primitive p-libvirtd lsb:libvirtd \
> >>>   op monitor interval="30s"
> >>> clone c-p-libvirtd p-libvirtd \
> >>>   meta interleave="true"
> >>> order o-libvirtd-after-storage inf: \
> >>>   ( c-fs-libvirt-VM-xcm c-fs-bind-libvirt-VM-cvmh ) \
> >>>   c-p-libvirtd
> >>> location loc-libvirtd-on-vmhosts c-p-libvirtd \
> >>>   rule 0: nodetype eq "vmhost"
> >>>
> >>> Of course that used to just work, but now, like the other clones, I
> >>> need to force-start libvirtd on the VM hosts, and clean up. Once I do
> >>> that, all my VM resources, which are not clones, just start up like
> >>> they are supposed to! Several of these are configured as remote nodes,
> >>> and they have services configured to run in them. But now other
> >>> strange things happen:
> >>>
> >>> Last updated: Tue Oct 22 16:46:29 2013
> >>> Last change: Tue Oct 22 15:50:52 2013 via crmd on cvmh01
> >>> Stack: cman
> >>> Current DC: cvmh04 - partition with quorum
> >>> Version: 1.1.10-19.el6.ccni-b6d42ed
> >>> 8 Nodes configured
> >>> 92 Resources configured
> >>>
> >>> ContainerNode slurmdb02:vm-slurmdb02: UNCLEAN (offline)
> >>> Online: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
> >>> Containers: [ db02:vm-db02 ldap01:vm-ldap01 ldap02:vm-ldap02 ]
> >>>
> >>> fence-cvmh01 (stonith:fence_ipmilan): Started cvmh04
> >>> fence-cvmh02 (stonith:fence_ipmilan): Started cvmh01
> >>> fence-cvmh03 (stonith:fence_ipmilan): Started cvmh01
> >>> fence-cvmh04 (stonith:fence_ipmilan): Started cvmh01
> >>> Clone Set: c-p-libvirtd [p-libvirtd]
> >>>     p-libvirtd (lsb:libvirtd): FAILED slurmdb02
> >>>     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
> >>>     Stopped: [ db02 ldap01 ldap02 ]
> >>> Clone Set: c-watch-ib0 [p-watch-ib0]
> >>>     p-watch-ib0 (ocf::heartbeat:ethmonitor): FAILED slurmdb02
> >>>     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
> >>>     Stopped: [ db02 ldap01 ldap02 ]
> >>> Clone Set: c-fs-gpfs [p-fs-gpfs]
> >>>     p-fs-gpfs (ocf::ccni:gpfs): FAILED slurmdb02
> >>>     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
> >>>     Stopped: [ db02 ldap01 ldap02 ]
> >>> vm-compute-test (ocf::ccni:xcatVirtualDomain): FAILED [ cvmh04 slurmdb02 ]
> >>> vm-swbuildsl6 (ocf::ccni:xcatVirtualDomain): FAILED slurmdb02
> >>> vm-db02 (ocf::ccni:xcatVirtualDomain): Started cvmh01
> >>> vm-ldap01 (ocf::ccni:xcatVirtualDomain): Started cvmh02
> >>> vm-ldap02 (ocf::ccni:xcatVirtualDomain): Started cvmh03
> >>> p-postgres (ocf::heartbeat:pgsql): FAILED [ db02 slurmdb02 ]
> >>> p-mysql (ocf::heartbeat:mysql): FAILED [ db02 slurmdb02 ]
> >>> Clone Set: c-fs-share-config-data [fs-share-config-data]
> >>>     fs-share-config-data (ocf::heartbeat:Filesystem): FAILED slurmdb02
> >>>     Stopped: [ cvmh01 cvmh02 cvmh03 cvmh04 db02 ldap01 ldap02 ]
> >>> p-mysql-slurm (ocf::heartbeat:mysql): FAILED slurmdb02
> >>> p-slurmdbd (ocf::ccni:SlurmDBD): FAILED slurmdb02
> >>> Clone Set: c-ldapagent [s-ldapagent]
> >>>     s-ldapagent (ocf::ccni:WrapInitScript): FAILED slurmdb02
> >>>     Stopped: [ cvmh01 cvmh02 cvmh03 cvmh04 db02 ldap01 ldap02 ]
> >>> Clone Set: c-ldap [s-ldap]
> >>>     s-ldap (ocf::ccni:WrapInitScript): FAILED slurmdb02
> >>>     Started: [ ldap01 ldap02 ]
> >>>     Stopped: [ cvmh01 cvmh02 cvmh03 cvmh04 db02 ]
> >>>
> >>> Now this is unexpected for a couple of reasons. I do have constraints
> >>> like:
> >>>
> >>> location loc-vm-swbuildsl6 vm-swbuildsl6 \
> >>>   rule $id="loc-vm-swbuildsl6-rule" 0: nodetype eq vmhost
> >>> order o-vm-swbuildsl6 inf: c-p-libvirtd vm-swbuildsl6
> >>>
> >>> And it is not the case that slurmdb02 has the vmhost attribute set;
> >>> using "crm_mon -o -1 -N -A" we see:
> >>>
> >>> Node Attributes:
> >>> * Node cvmh01:
> >>>     + ethmonitor-ib0 : 1
> >>>     + nodetype : vmhost
> >>> * Node cvmh02:
> >>>     + ethmonitor-ib0 : 1
> >>>     + nodetype : vmhost
> >>> * Node cvmh03:
> >>>     + ethmonitor-ib0 : 1
> >>>     + nodetype : vmhost
> >>> * Node cvmh04:
> >>>     + ethmonitor-ib0 : 1
> >>>     + nodetype : vmhost
> >>> * Node db02:
> >>> * Node ldap01:
> >>> * Node ldap02:
> >>> * Node slurmdb02:
> >>>
> >>> The results are unexpected to me also because I (perhaps naively)
> >>> wouldn't expect it to show me the new nodes on the "stopped" lines --
> >>> I kind of expected a location rule to limit where clones would even be
> >>> attempted. For example, with the rule limiting c-p-libvirtd to the
> >>> vmhosts, I don't really expect to be told that the clones are stopped
> >>> on the remote VM nodes db02, ldap01, and ldap02 (let alone be started
> >>> on slurmdb02!).
> >>>
> >>> Until I wrote this note, even the cloned ldap resource c-ldap needed
> >>> to be started using force-start. Not sure why this time it started on
> >>> its own... Perhaps this stack trace in the core dump pacemaker left on
> >>> one of the VM hosts has a clue?
> >>>
> >>> #0  0x00007f121e9ac8e5 in raise () from /lib64/libc.so.6
> >>> #1  0x00007f121e9ae0c5 in abort () from /lib64/libc.so.6
> >>> #2  0x00007f121e9ea7f7 in __libc_message () from /lib64/libc.so.6
> >>> #3  0x00007f121e9f0126 in malloc_printerr () from /lib64/libc.so.6
> >>> #4  0x00007f121e9f05ad in malloc_consolidate () from /lib64/libc.so.6
> >>> #5  0x00007f121e9f33c5 in _int_malloc () from /lib64/libc.so.6
> >>> #6  0x00007f121e9f45e6 in calloc () from /lib64/libc.so.6
> >>> #7  0x00007f121e9e91ed in open_memstream () from /lib64/libc.so.6
> >>> #8  0x00007f121ea5ebdb in __vsyslog_chk () from /lib64/libc.so.6
> >>> #9  0x00007f121ea5f1b3 in __syslog_chk () from /lib64/libc.so.6
> >>> #10 0x00007f121e72b9fb in ?? () from /usr/lib64/libqb.so.0
> >>> #11 0x00007f121e72a6a2 in qb_log_real_va_ () from /usr/lib64/libqb.so.0
> >>> #12 0x00007f121e72a91d in qb_log_real_ () from /usr/lib64/libqb.so.0
> >>> #13 0x000000000042e994 in te_rsc_command (graph=0x20c7b40,
> >>>     action=0x23b0c90) at te_actions.c:412
> >>
> >> This is crashing at a log message. Apparently we are trying to plug a
> >> "NULL" pointer into one of the format string's "%s" entries. Looking
> >> at that log message, none of those values should be NULL; something is
> >> wrong here.
> >>
> >>> #14 0x0000003a64404019 in initiate_action (graph=0x20c7b40) at graph.c:172
> >>> #15 fire_synapse (graph=0x20c7b40) at graph.c:211
> >>> #16 run_graph (graph=0x20c7b40) at graph.c:366
> >>> #17 0x000000000042f8cd in te_graph_trigger (user_data=<value optimized out>)
> >>>     at te_utils.c:331
> >>> #18 0x0000003a6202b283 in crm_trigger_dispatch (source=<value optimized out>,
> >>>     callback=<value optimized out>, userdata=<value optimized out>)
> >>>     at mainloop.c:105
> >>> #19 0x00000038b3c38f0e in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
> >>> #20 0x00000038b3c3c938 in ?? () from /lib64/libglib-2.0.so.0
> >>> #21 0x00000038b3c3cd55 in g_main_loop_run () from /lib64/libglib-2.0.so.0
> >>> #22 0x00000000004058ee in crmd_init () at main.c:154
> >>> #23 0x0000000000405c2c in main (argc=1, argv=0x7fffdc207528) at main.c:121
> >>>
> >>> Not sure how to take this further. It has been difficult to
> >>> characterize what exactly is or isn't happening, and hopefully I've
> >>> not left out some critical detail. Thanks.
> >>
> >> There is a whole lot going on here, which is making it a bit difficult
> >> to know where to start. You are using attributes and rules to enable
> >> resources. The attrd component has recently been re-written, which
> >> could have caused some of the problems you are seeing (especially if
> >> you ever attempted to write an attribute to a remote-node using a build
> >> from sometime in September).
> >>
> >> To make this easier to understand I'd recommend this... Get to the
> >> point where you'd expect a resource to start and it isn't starting.
> >> Capture the cib with "cibadmin -q > cibsnapshot.cib".
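(Once you have that snapshot, you can also ask the policy engine what it
thinks of it offline; for instance

    crm_simulate -x cibsnapshot.cib -s

should dump the allocation scores, which often shows why a resource isn't
being placed anywhere.)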
> >> Pastebin the cib and tell us which resource you'd expect to be
> >> starting. Then we can try and determine accurately what is preventing
> >> it from starting. That will at least give us something solid to work
> >> from.
> >>
> >> -- Vossel
> >>
> >>> /Lindsay

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org