On 25 Oct 2013, at 9:40 am, David Vossel <dvos...@redhat.com> wrote:
> ----- Original Message -----
>> From: "Lindsay Todd" <rltodd....@gmail.com>
>> To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
>> Sent: Wednesday, October 23, 2013 2:38:17 PM
>> Subject: Re: [Pacemaker] Asymmetric cluster, clones, and location constraints
>>
>> David,
>>
>> The Infiniband network takes a nondeterministic amount of time to actually
>> finish initializing, so we use ethmonitor to watch it; the OS is supposed
>> to bring it up at boot time, but it moves on through the boot sequence
>> without actually waiting for it. So in self defense we watch it with
>> pacemaker. I guess I could restructure this to use a resource that brings
>> up IB (with a really long timeout) and use ordering to wait for that to
>> complete, but it seems that ethmonitor would be more adaptive to
>> short-term IB network issues. Since ethmonitor works by setting an
>> attribute (the RA running means it is watching the network, not that the
>> network is up), I've used location constraints instead of ordering
>> constraints.
>>
>> So I have completely restarted my cluster. Right now the physical nodes
>> see each other, and the fencing agents are running. The first thing that
>> should start are the ethmonitor resource agents on the VM hosts (the
>> c-watch-ib0 clones of the p-watch-ib0 primitive). They are not starting
>> (like they used to).
>
> I see. Your cib generates an invalid transition. I'll try and look into it
> in more detail soon to understand the cause.

According to git bisect, the winner is:

    15a86e501a57b50fdb3b8ce0ed432b183c343c74 is the first bad commit
    commit 15a86e501a57b50fdb3b8ce0ed432b183c343c74
    Author: David Vossel <dvos...@redhat.com>
    Date:   Mon Sep 23 18:55:21 2013 -0500

        High: pengine: Probe container nodes

I'll take a look in the morning unless David beats me to it :-)

> One completely unrelated thought I had while looking at your config
> involves your fencing agents. You shouldn't have to use location
> constraints at all on the fencing agents. I believe stonith is smart
> enough now to execute the agent on a node that isn't the target,
> regardless of where the policy engine puts it.
>
> -- Vossel
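(For what it's worth, that would reduce each fencing agent to something like
the sketch below -- no location constraints at all, with pcmk_host_list
telling stonithd which host the device can fence. The IPMI address and
credentials here are placeholders, not real values:

    # sketch only: pcmk_host_list names the host this device fences;
    # ipaddr/login/passwd are placeholders
    primitive fence-cvmh01 stonith:fence_ipmilan \
            params pcmk_host_list="cvmh01" \
                    ipaddr="..." login="..." passwd="..." \
            op monitor interval="60s"

stonithd should then pick a sane node to run it from on its own.)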
>> The cib snapshot can be seen in http://pastebin.com/TccTHQPS (some
>> slight editing to hide passwords in fencing agents).
>>
>> /Lindsay
>>
>>
>> On Wed, Oct 23, 2013 at 11:20 AM, David Vossel <dvos...@redhat.com> wrote:
>>
>> ----- Original Message -----
>>> From: "Lindsay Todd" <rltodd....@gmail.com>
>>> To: "The Pacemaker cluster resource manager" <Pacemaker@oss.clusterlabs.org>
>>> Sent: Tuesday, October 22, 2013 4:19:11 PM
>>> Subject: [Pacemaker] Asymmetric cluster, clones, and location constraints
>>>
>>> I am getting rather unexpected behavior when I combine clones, location
>>> constraints, and remote nodes in an asymmetric cluster. My cluster is
>>> configured to be asymmetric, distinguishing between vmhosts and various
>>> sorts of remote nodes. Currently I am running upstream version b6d42ed.
>>> I am simplifying my description to avoid confusion, hoping in so doing
>>> I don't miss any salient points...
>>>
>>> My physical cluster nodes, also the VM hosts, have the attribute
>>> "nodetype=vmhost". They also have Infiniband interfaces, which take
>>> some time to come up. I don't want my shared file system (which needs
>>> IB), or libvirtd (which needs the file system), to come up before IB...
>>> So I have this in my configuration:
>>>
>>> primitive p-watch-ib0 ocf:heartbeat:ethmonitor \
>>>         params \
>>>                 interface="ib0" \
>>>         op monitor timeout="100s" interval="10s"
>>> clone c-watch-ib0 p-watch-ib0 \
>>>         meta interleave="true"
>>> #
>>> location loc-watch-ib-only-vmhosts c-watch-ib0 \
>>>         rule 0: nodetype eq "vmhost"
>>>
>>> Something broke between upstream versions 0a2570a and c68919f -- the
>>> c-watch-ib0 clone never starts. I've found that if I run "crm_resource
>>> --force-start -r p-watch-ib0" when IB is running, the ethmonitor-ib0
>>> attribute is not set like it used to be. Oh well, I can set it
>>> manually. So let's.
>>
>> A re-write of the attrd component was introduced around that time period.
>> This should have been resolved at this point in the b6d42ed build.
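(Side note: "setting it manually" here should just mean writing the same
transient, status-section attribute that ethmonitor itself maintains --
something like the following on each VM host, with the node name being
illustrative:

    # illustrative; run once per VM host, substituting the node name.
    # --lifetime reboot writes a transient (status-section) attribute,
    # the same kind ethmonitor maintains via attrd.
    crm_attribute --node cvmh01 --name ethmonitor-ib0 --update 1 --lifetime reboot

The reboot lifetime keeps it out of the permanent configuration, which is
what you want for a value that should reset when a node restarts.)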
>>> We use GPFS for a shared file system, so I have an agent to start it
>>> and wait for a file system to mount. It should only run on VM hosts,
>>> and only when IB is running. So I have this:
>>
>> So the IB resource is setting some attribute that enables the fs to run?
>> Why can't an ordering constraint be used here between IB and FS?
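(I read David's suggestion as a single constraint along these lines, in
place of the attribute rule -- a sketch only, reusing the clone names from
the config quoted below:

    # sketch: start the GPFS clone only after the ethmonitor clone
    order o-fs-gpfs-after-ib0 inf: c-watch-ib0 c-fs-gpfs

Though, as Lindsay notes above, ethmonitor being started only means the
interface is being watched, so ordering alone wouldn't react to the network
going down later the way the attribute-based rule does.)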
>>> primitive p-fs-gpfs ocf:ccni:gpfs \
>>>         params \
>>>                 fspath="/gpfs/lb/utility" \
>>>         op monitor timeout="20s" interval="30s" \
>>>         op start timeout="180s" \
>>>         op stop timeout="120s"
>>> clone c-fs-gpfs p-fs-gpfs \
>>>         meta interleave="true"
>>> location loc-fs-gpfs-needs-ib0 c-fs-gpfs \
>>>         rule -inf: not_defined "ethmonitor-ib0" or "ethmonitor-ib0" eq 0
>>> location loc-fs-gpfs-on-vmhosts c-fs-gpfs \
>>>         rule 0: nodetype eq "vmhost"
>>>
>>> That all used to start nicely. Now even if I set the ethmonitor-ib0
>>> attribute, it doesn't. However, I can use "crm_resource --force-start
>>> -r p-fs-gpfs" on each of my VM hosts, then issue "crm resource cleanup
>>> c-fs-gpfs", and all is well. I can use "crm status" to see something
>>> like:
>>>
>>> Last updated: Tue Oct 22 16:35:43 2013
>>> Last change: Tue Oct 22 15:50:52 2013 via crmd on cvmh01
>>> Stack: cman
>>> Current DC: cvmh04 - partition with quorum
>>> Version: 1.1.10-19.el6.ccni-b6d42ed
>>> 8 Nodes configured
>>> 92 Resources configured
>>>
>>> Online: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
>>>
>>> fence-cvmh01 (stonith:fence_ipmilan): Started cvmh04
>>> fence-cvmh02 (stonith:fence_ipmilan): Started cvmh01
>>> fence-cvmh03 (stonith:fence_ipmilan): Started cvmh01
>>> fence-cvmh04 (stonith:fence_ipmilan): Started cvmh01
>>> Clone Set: c-fs-gpfs [p-fs-gpfs]
>>>     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
>>>
>>> which is what I would expect (other than I expect pacemaker to have
>>> started these for me, like it used to).
>>>
>>> Now I also have clone resources to NFS-mount another file system, and
>>> actually do a bind mount out of the GPFS file system, which behave like
>>> the GPFS resource -- they used to just work, now I need to use
>>> "crm_resource --force-start" and clean up. That finally lets me start
>>> libvirtd, using this configuration:
>>>
>>> primitive p-libvirtd lsb:libvirtd \
>>>         op monitor interval="30s"
>>> clone c-p-libvirtd p-libvirtd \
>>>         meta interleave="true"
>>> order o-libvirtd-after-storage inf: \
>>>         ( c-fs-libvirt-VM-xcm c-fs-bind-libvirt-VM-cvmh ) \
>>>         c-p-libvirtd
>>> location loc-libvirtd-on-vmhosts c-p-libvirtd \
>>>         rule 0: nodetype eq "vmhost"
>>>
>>> Of course that used to just work, but now, like the other clones, I
>>> need to force-start libvirtd on the VM hosts, and clean up. Once I do
>>> that, all my VM resources, which are not clones, just start up like
>>> they are supposed to! Several of these are configured as remote nodes,
>>> and they have services configured to run in them. But now other strange
>>> things happen:
>>>
>>> Last updated: Tue Oct 22 16:46:29 2013
>>> Last change: Tue Oct 22 15:50:52 2013 via crmd on cvmh01
>>> Stack: cman
>>> Current DC: cvmh04 - partition with quorum
>>> Version: 1.1.10-19.el6.ccni-b6d42ed
>>> 8 Nodes configured
>>> 92 Resources configured
>>>
>>> ContainerNode slurmdb02:vm-slurmdb02: UNCLEAN (offline)
>>> Online: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
>>> Containers: [ db02:vm-db02 ldap01:vm-ldap01 ldap02:vm-ldap02 ]
>>>
>>> fence-cvmh01 (stonith:fence_ipmilan): Started cvmh04
>>> fence-cvmh02 (stonith:fence_ipmilan): Started cvmh01
>>> fence-cvmh03 (stonith:fence_ipmilan): Started cvmh01
>>> fence-cvmh04 (stonith:fence_ipmilan): Started cvmh01
>>> Clone Set: c-p-libvirtd [p-libvirtd]
>>>     p-libvirtd (lsb:libvirtd): FAILED slurmdb02
>>>     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
>>>     Stopped: [ db02 ldap01 ldap02 ]
>>> Clone Set: c-watch-ib0 [p-watch-ib0]
>>>     p-watch-ib0 (ocf::heartbeat:ethmonitor): FAILED slurmdb02
>>>     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
>>>     Stopped: [ db02 ldap01 ldap02 ]
>>> Clone Set: c-fs-gpfs [p-fs-gpfs]
>>>     p-fs-gpfs (ocf::ccni:gpfs): FAILED slurmdb02
>>>     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
>>>     Stopped: [ db02 ldap01 ldap02 ]
>>> vm-compute-test (ocf::ccni:xcatVirtualDomain): FAILED [ cvmh04 slurmdb02 ]
>>> vm-swbuildsl6 (ocf::ccni:xcatVirtualDomain): FAILED slurmdb02
>>> vm-db02 (ocf::ccni:xcatVirtualDomain): Started cvmh01
>>> vm-ldap01 (ocf::ccni:xcatVirtualDomain): Started cvmh02
>>> vm-ldap02 (ocf::ccni:xcatVirtualDomain): Started cvmh03
>>> p-postgres (ocf::heartbeat:pgsql): FAILED [ db02 slurmdb02 ]
>>> p-mysql (ocf::heartbeat:mysql): FAILED [ db02 slurmdb02 ]
>>> Clone Set: c-fs-share-config-data [fs-share-config-data]
>>>     fs-share-config-data (ocf::heartbeat:Filesystem): FAILED slurmdb02
>>>     Stopped: [ cvmh01 cvmh02 cvmh03 cvmh04 db02 ldap01 ldap02 ]
>>> p-mysql-slurm (ocf::heartbeat:mysql): FAILED slurmdb02
>>> p-slurmdbd (ocf::ccni:SlurmDBD): FAILED slurmdb02
>>> Clone Set: c-ldapagent [s-ldapagent]
>>>     s-ldapagent (ocf::ccni:WrapInitScript): FAILED slurmdb02
>>>     Stopped: [ cvmh01 cvmh02 cvmh03 cvmh04 db02 ldap01 ldap02 ]
>>> Clone Set: c-ldap [s-ldap]
>>>     s-ldap (ocf::ccni:WrapInitScript): FAILED slurmdb02
>>>     Started: [ ldap01 ldap02 ]
>>>     Stopped: [ cvmh01 cvmh02 cvmh03 cvmh04 db02 ]
>>>
>>> Now this is unexpected for a couple of reasons. I do have constraints
>>> like:
>>>
>>> location loc-vm-swbuildsl6 vm-swbuildsl6 \
>>>         rule $id="loc-vm-swbuildsl6-rule" 0: nodetype eq vmhost
>>> order o-vm-swbuildsl6 inf: c-p-libvirtd vm-swbuildsl6
>>>
>>> And it is not the case that slurmdb02 has the vmhost attribute set;
>>> using "crm_mon -o -1 -N -A" we see:
>>>
>>> Node Attributes:
>>> * Node cvmh01:
>>>     + ethmonitor-ib0 : 1
>>>     + nodetype : vmhost
>>> * Node cvmh02:
>>>     + ethmonitor-ib0 : 1
>>>     + nodetype : vmhost
>>> * Node cvmh03:
>>>     + ethmonitor-ib0 : 1
>>>     + nodetype : vmhost
>>> * Node cvmh04:
>>>     + ethmonitor-ib0 : 1
>>>     + nodetype : vmhost
>>> * Node db02:
>>> * Node ldap01:
>>> * Node ldap02:
>>> * Node slurmdb02:
>>>
>>> The results are unexpected to me also because I (perhaps naively)
>>> wouldn't expect it to show me the new nodes on the "stopped" lines -- I
>>> kind of expected a location rule to limit where clones would even be
>>> attempted. For example, with the rule limiting c-p-libvirtd to the
>>> vmhosts, I don't really expect to be told that the clones are stopped
>>> on the remote VM nodes db02, ldap01, and ldap02 (let alone be started
>>> on slurmdb02!).
>>>
>>> Until I wrote this note, even the cloned ldap resource c-ldap needed to
>>> be started using force-start. Not sure why this time it started on its
>>> own... Perhaps this stack trace in the core dump pacemaker left on one
>>> of the VM hosts has a clue?
>>>
>>> #0  0x00007f121e9ac8e5 in raise () from /lib64/libc.so.6
>>> #1  0x00007f121e9ae0c5 in abort () from /lib64/libc.so.6
>>> #2  0x00007f121e9ea7f7 in __libc_message () from /lib64/libc.so.6
>>> #3  0x00007f121e9f0126 in malloc_printerr () from /lib64/libc.so.6
>>> #4  0x00007f121e9f05ad in malloc_consolidate () from /lib64/libc.so.6
>>> #5  0x00007f121e9f33c5 in _int_malloc () from /lib64/libc.so.6
>>> #6  0x00007f121e9f45e6 in calloc () from /lib64/libc.so.6
>>> #7  0x00007f121e9e91ed in open_memstream () from /lib64/libc.so.6
>>> #8  0x00007f121ea5ebdb in __vsyslog_chk () from /lib64/libc.so.6
>>> #9  0x00007f121ea5f1b3 in __syslog_chk () from /lib64/libc.so.6
>>> #10 0x00007f121e72b9fb in ?? () from /usr/lib64/libqb.so.0
>>> #11 0x00007f121e72a6a2 in qb_log_real_va_ () from /usr/lib64/libqb.so.0
>>> #12 0x00007f121e72a91d in qb_log_real_ () from /usr/lib64/libqb.so.0
>>> #13 0x000000000042e994 in te_rsc_command (graph=0x20c7b40, action=0x23b0c90)
>>>     at te_actions.c:412
>>
>> This is crashing at a log message. Apparently we are trying to plug a
>> "NULL" pointer into one of the format string's "%s" entries. Looking at
>> that log message, none of those values should be NULL; something is
>> wrong here.
>>
>>> #14 0x0000003a64404019 in initiate_action (graph=0x20c7b40) at graph.c:172
>>> #15 fire_synapse (graph=0x20c7b40) at graph.c:211
>>> #16 run_graph (graph=0x20c7b40) at graph.c:366
>>> #17 0x000000000042f8cd in te_graph_trigger (user_data=<value optimized out>)
>>>     at te_utils.c:331
>>> #18 0x0000003a6202b283 in crm_trigger_dispatch (source=<value optimized out>,
>>>     callback=<value optimized out>, userdata=<value optimized out>)
>>>     at mainloop.c:105
>>> #19 0x00000038b3c38f0e in g_main_context_dispatch ()
>>>     from /lib64/libglib-2.0.so.0
>>> #20 0x00000038b3c3c938 in ?? () from /lib64/libglib-2.0.so.0
>>> #21 0x00000038b3c3cd55 in g_main_loop_run () from /lib64/libglib-2.0.so.0
>>> #22 0x00000000004058ee in crmd_init () at main.c:154
>>> #23 0x0000000000405c2c in main (argc=1, argv=0x7fffdc207528) at main.c:121
>>>
>>> Not sure how to take this further.
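(One way to take it further, alongside the cib snapshot David asks for
below: replay the saved cib through the policy engine offline and look at
the scores and scheduled actions, e.g.

    # capture the cib on a cluster node:
    cibadmin -q > cibsnapshot.cib
    # replay it offline; -S simulates the transition's execution,
    # -s shows the allocation scores
    crm_simulate --xml-file cibsnapshot.cib --simulate --show-scores

That usually shows why a given start isn't being scheduled, and it doesn't
touch the live cluster.)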
>>> It has been difficult to characterize what exactly is or isn't
>>> happening, and hopefully I've not left out some critical detail.
>>> Thanks.
>>
>> There is a whole lot going on here, which is making it a bit difficult
>> to know where to start. You are using attributes and rules to enable
>> resources. The attrd component has recently been re-written, which could
>> have caused some of the problems you are seeing (especially if you ever
>> attempted to write an attribute to a remote-node using a build from
>> sometime in September).
>>
>> To make this easier to understand I'd recommend this... Get to the point
>> where you'd expect a resource to start and it isn't. Capture the cib:
>> "cibadmin -q > cibsnapshot.cib". Pastebin the cib and tell us which
>> resource you'd expect to be starting. Then we can try and determine
>> accurately what is preventing it from starting. That will at least give
>> us something solid to work from.
>>
>> -- Vossel
>>
>>> /Lindsay

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org