One step forward, two steps back.

I'm working on a two-node primary-primary cluster, and I'm debugging problems
with the ocf:heartbeat:exportfs resource. For some reason, pacemaker sometimes
appears to ignore the ordering constraints I put on the resources.

Florian Haas recommended pastebin in another thread, so let's give it a try.
Here's my complete current output of "crm configure show":

<http://pastebin.com/bbSsqyeu>

Here's a quick sketch: The sequence of events is supposed to be DRBD (ms) ->
clvmd (clone) -> gfs2 (clone) -> exportfs (clone).

But that's not what happens. Instead, pacemaker tries to start the exportfs
resource immediately. This fails, because what it's exporting doesn't exist
until after gfs2 runs. Because the cloned resource can't run on either node,
the cluster goes into a state in which one node is fenced and the other node
refuses to run anything.

Here's a quick snapshot I was able to take of the output of crm_mon that shows
the problem:

<http://pastebin.com/CiZvS4Fh>

This shows that pacemaker is trying to start the exportfs resources before it
has run the chain drbd->clvmd->gfs2.

Just to confirm the obvious, I have the ordering constraints in the full
configuration linked above ("Admin" is my DRBD resource):

order Admin_Before_Clvmd inf: AdminClone:promote ClvmdClone:start
order Clvmd_Before_Gfs2 inf: ClvmdClone Gfs2Clone
order Gfs2_Before_Exports inf: Gfs2Clone ExportsClone
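
To make the intended sequence concrete, here is a minimal Python sketch. It is
an illustration only, not part of the cluster configuration and not a model of
how pacemaker's policy engine evaluates constraints. It treats the three order
constraints above as plain (first, then) pairs, drops the :promote and :start
actions, and derives the start order they imply. The names ORDER_CONSTRAINTS
and expected_start_order are made up for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# The three ordering constraints from the configuration, reduced to
# (first, then) pairs. Assumption: the :promote/:start actions are ignored,
# so this only captures "which resource comes before which".
ORDER_CONSTRAINTS = [
    ("AdminClone", "ClvmdClone"),    # Admin_Before_Clvmd
    ("ClvmdClone", "Gfs2Clone"),     # Clvmd_Before_Gfs2
    ("Gfs2Clone", "ExportsClone"),   # Gfs2_Before_Exports
]


def expected_start_order(constraints):
    """Return the resources in the order the constraints imply."""
    sorter = TopologicalSorter()
    for first, then in constraints:
        # 'then' may only start after 'first', so 'first' is its predecessor.
        sorter.add(then, first)
    return list(sorter.static_order())


if __name__ == "__main__":
    print(" -> ".join(expected_start_order(ORDER_CONSTRAINTS)))
    # prints: AdminClone -> ClvmdClone -> Gfs2Clone -> ExportsClone
```

By this reading, ExportsClone should be the last resource to start, which is
the opposite of what the crm_mon snapshot shows.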

This is not the only time I've observed this behavior in pacemaker. Here's a
lengthy log file excerpt from the same time I took the crm_mon snapshot:

<http://pastebin.com/HwMUCmcX>

I can see that other resources, the symlink ones in particular, are being probed
and started before the drbd Admin resource has a chance to be promoted. In
looking at the log file, it may help to know that /mail and /var/nevis are gfs2
partitions that aren't mounted until the Gfs2 resource starts.

As I said, I've seen this before. This is just the first time I've been able
to reproduce it reliably and capture a snapshot.

Any ideas?
-- 
Bill Seligman             | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://selig...@nevis.columbia.edu
PO Box 137                |
Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/
