Re: [Linux-HA] Apparent problem in pacemaker ordering
On Tue, Mar 6, 2012 at 3:53 AM, Florian Haas flor...@hastexo.com wrote:

On Sat, Mar 3, 2012 at 8:14 PM, Florian Haas flor...@hastexo.com wrote:

In other words, interleave=true is actually the reasonable thing to set on all clone instances by default, and I believe the pengine actually does use a default of interleave=true on defined clone sets since some 1.1.x release (I don't recall which).

I think I have to correct myself here, as a cursory git log --grep=interleave hasn't turned up anything recent. So I might have mis-remembered, which would mean the old interleave=false default is still unchanged. Andrew, could you clarify please?

It never became the default. We do appear to be complaining too early, though:

diff --git a/pengine/clone.c b/pengine/clone.c
index 5ff0b02..0b832fc 100644
--- a/pengine/clone.c
+++ b/pengine/clone.c
@@ -1035,7 +1035,7 @@ clone_rsc_colocation_rh(resource_t * rsc_lh, resource_t * rsc_rh, rsc_colocation
     if (constraint->rsc_lh->variant >= pe_clone) {
         get_clone_variant_data(clone_data_lh, constraint->rsc_lh);
-        if (clone_data->clone_node_max != clone_data_lh->clone_node_max) {
+        if (clone_data_lh->interleave && clone_data->clone_node_max != clone_data_lh->clone_node_max) {
             crm_config_err("Cannot interleave " XML_CIB_TAG_INCARNATION " %s and %s because they do not support the same number of

Cheers, Florian
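For context, here is a minimal sketch in crm shell syntax of the kind of setup the unpatched check complains about: a globally-unique IP clone that allows two instances per node, colocated with an ordinary clone that allows one. The resource names echo Bill's cluster, but the values and the IPaddr2 parameters are illustrative assumptions, not his actual configuration. With the patch above, the clone-node-max mismatch would only be reported if interleaving were actually requested.

primitive ClusterIP ocf:heartbeat:IPaddr2 \
    params ip="192.168.1.100" clusterip_hash="sourceip"    # placeholder address
clone ClusterIPClone ClusterIP \
    meta globally-unique="true" clone-max="2" clone-node-max="2"
clone Gfs2Clone Gfs2 \
    meta clone-node-max="1"    # Gfs2 primitive assumed defined elsewhere
# A colocation like this is what reaches clone_rsc_colocation_rh(); before the
# patch it logged the "Cannot interleave" error even though neither clone asks
# for interleaving.
colocation ip_with_gfs2 inf: ClusterIPClone Gfs2Clone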
Re: [Linux-HA] Apparent problem in pacemaker ordering
Hi,

On Wed, Mar 07, 2012 at 07:52:16PM -0500, William Seligman wrote:

On 3/5/12 11:55 AM, William Seligman wrote:

On 3/3/12 3:30 PM, William Seligman wrote:

On 3/3/12 2:14 PM, Florian Haas wrote:

On Sat, Mar 3, 2012 at 6:55 PM, William Seligman selig...@nevis.columbia.edu wrote:

On 3/3/12 12:03 PM, emmanuel segura wrote:

are you sure the exportfs agent can be used with an active/active clone?

a) I've been through the script. If there's some problem associated with it being cloned, I haven't seen it. (It can't handle globally-unique=true, but I didn't turn that on.)

It shouldn't have a problem with being cloned. Obviously, cloning that RA _really_ makes sense only with the export that manages an NFSv4 virtual root (fsid=0). Otherwise, the export clone has to be hosted on a clustered filesystem, and you'd have to have a pNFS implementation that doesn't suck (tough to come by on Linux), and if you want that sort of replicated, parallel-access NFS you might as well use Gluster. The downside of the latter, though, is that it's currently NFSv3-only, without sideband locking.

I'll look this over when I have a chance. I think I can get away without an NFSv4 virtual root because I'm exporting everything to my cluster either read-only, or only one system at a time will do any writing. Now that you've warned me, I'll do some more checking.

b) I had similar problems using the exportfs resource in a primary-secondary setup without clones. Why would a resource being cloned create an ordering problem? I haven't set the interleave parameter (even with the documentation I'm not sure what it does), but A before B before C seems pretty clear, even for cloned resources.

As far as what interleave does: suppose you have two clones, A and B, and they're linked with an order constraint, like this:

order A_before_B inf: A B

... then if interleave is false, _all_ instances of A must be started before _any_ instance of B gets to start anywhere in the cluster. However, if interleave is true, then for any node only the _local_ instance of A needs to be started before it can start the corresponding _local_ instance of B. In other words, interleave=true is actually the reasonable thing to set on all clone instances by default, and I believe the pengine actually does use a default of interleave=true on defined clone sets since some 1.1.x release (I don't recall which).

Thanks, Florian. That's a great explanation. I'll probably stick interleave=true on most of my clones just to make sure. It explains an error message I've seen in the logs:

Mar 2 18:15:19 hypatia-tb pengine: [4414]: ERROR: clone_rsc_colocation_rh: Cannot interleave clone ClusterIPClone and Gfs2Clone because they do not support the same number of resources per node

Because ClusterIPClone has globally-unique=true and clone-max=2, it's possible for both instances to be running on a single node; I've seen this a few times in my testing when cycling power on one of the nodes. Interleaving doesn't make sense in such a case.

Bill, seeing as you've already pastebinned your config and crm_mon output, could you also pastebin your whole CIB as per cibadmin -Q output? Thanks.

Sure: http://pastebin.com/pjSJ79H6. It doesn't have the exportfs resources in it; I took them out before leaving for the weekend. If it helps, I'll put them back in and try to get the cibadmin -Q output before any nodes crash.

For a test, I stuck in an exportfs resource with all the ordering constraints. Here's the cibadmin -Q output from that: http://pastebin.com/nugdufJc The output of crm_mon just after doing that, showing resource failure: http://pastebin.com/cyCFGUSD Then all the resources are stopped: http://pastebin.com/D62sGSrj A few seconds later one of the nodes is fenced, but this does not bring up anything: http://pastebin.com/wzbmfVas

I believe I have the solution to my stability problem. It doesn't solve the issue of ordering, but I think I have a configuration that will survive failover.

Here's the problem. I had exportfs resources such as:

primitive ExportUsrNevis ocf:heartbeat:exportfs \
    op start interval="0" timeout="40" \
    op stop interval="0" timeout="45" \
    params clientspec="*.nevis.columbia.edu" directory="/usr/nevis" \
        fsid="20" options="ro,no_root_squash,async"

I did detailed traces of the execution of exportfs (putting in logger commands) and found that the problem was in the backup_rmtab function in exportfs:

backup_rmtab() {
    local rmtab_backup
    if [ "${OCF_RESKEY_rmtab_backup}" != "none" ]; then
        rmtab_backup="${OCF_RESKEY_directory}/${OCF_RESKEY_rmtab_backup}"
        grep ":${OCF_RESKEY_directory}:" /var/lib/nfs/rmtab > ${rmtab_backup}
    fi
}

The problem was that the grep command was taking a long time, longer than any timeout I'd assigned to the resource.
Re: [Linux-HA] Apparent problem in pacemaker ordering
On 3/5/12 11:55 AM, William Seligman wrote:

On 3/3/12 3:30 PM, William Seligman wrote:

On 3/3/12 2:14 PM, Florian Haas wrote:

On Sat, Mar 3, 2012 at 6:55 PM, William Seligman selig...@nevis.columbia.edu wrote:

On 3/3/12 12:03 PM, emmanuel segura wrote:

are you sure the exportfs agent can be used with an active/active clone?

a) I've been through the script. If there's some problem associated with it being cloned, I haven't seen it. (It can't handle globally-unique=true, but I didn't turn that on.)

It shouldn't have a problem with being cloned. Obviously, cloning that RA _really_ makes sense only with the export that manages an NFSv4 virtual root (fsid=0). Otherwise, the export clone has to be hosted on a clustered filesystem, and you'd have to have a pNFS implementation that doesn't suck (tough to come by on Linux), and if you want that sort of replicated, parallel-access NFS you might as well use Gluster. The downside of the latter, though, is that it's currently NFSv3-only, without sideband locking.

I'll look this over when I have a chance. I think I can get away without an NFSv4 virtual root because I'm exporting everything to my cluster either read-only, or only one system at a time will do any writing. Now that you've warned me, I'll do some more checking.

b) I had similar problems using the exportfs resource in a primary-secondary setup without clones. Why would a resource being cloned create an ordering problem? I haven't set the interleave parameter (even with the documentation I'm not sure what it does), but A before B before C seems pretty clear, even for cloned resources.

As far as what interleave does: suppose you have two clones, A and B, and they're linked with an order constraint, like this:

order A_before_B inf: A B

... then if interleave is false, _all_ instances of A must be started before _any_ instance of B gets to start anywhere in the cluster. However, if interleave is true, then for any node only the _local_ instance of A needs to be started before it can start the corresponding _local_ instance of B. In other words, interleave=true is actually the reasonable thing to set on all clone instances by default, and I believe the pengine actually does use a default of interleave=true on defined clone sets since some 1.1.x release (I don't recall which).

Thanks, Florian. That's a great explanation. I'll probably stick interleave=true on most of my clones just to make sure. It explains an error message I've seen in the logs:

Mar 2 18:15:19 hypatia-tb pengine: [4414]: ERROR: clone_rsc_colocation_rh: Cannot interleave clone ClusterIPClone and Gfs2Clone because they do not support the same number of resources per node

Because ClusterIPClone has globally-unique=true and clone-max=2, it's possible for both instances to be running on a single node; I've seen this a few times in my testing when cycling power on one of the nodes. Interleaving doesn't make sense in such a case.

Bill, seeing as you've already pastebinned your config and crm_mon output, could you also pastebin your whole CIB as per cibadmin -Q output? Thanks.

Sure: http://pastebin.com/pjSJ79H6. It doesn't have the exportfs resources in it; I took them out before leaving for the weekend. If it helps, I'll put them back in and try to get the cibadmin -Q output before any nodes crash.

For a test, I stuck in an exportfs resource with all the ordering constraints. Here's the cibadmin -Q output from that: http://pastebin.com/nugdufJc The output of crm_mon just after doing that, showing resource failure: http://pastebin.com/cyCFGUSD Then all the resources are stopped: http://pastebin.com/D62sGSrj A few seconds later one of the nodes is fenced, but this does not bring up anything: http://pastebin.com/wzbmfVas

I believe I have the solution to my stability problem. It doesn't solve the issue of ordering, but I think I have a configuration that will survive failover.

Here's the problem. I had exportfs resources such as:

primitive ExportUsrNevis ocf:heartbeat:exportfs \
    op start interval="0" timeout="40" \
    op stop interval="0" timeout="45" \
    params clientspec="*.nevis.columbia.edu" directory="/usr/nevis" \
        fsid="20" options="ro,no_root_squash,async"

I did detailed traces of the execution of exportfs (putting in logger commands) and found that the problem was in the backup_rmtab function in exportfs:

backup_rmtab() {
    local rmtab_backup
    if [ "${OCF_RESKEY_rmtab_backup}" != "none" ]; then
        rmtab_backup="${OCF_RESKEY_directory}/${OCF_RESKEY_rmtab_backup}"
        grep ":${OCF_RESKEY_directory}:" /var/lib/nfs/rmtab > ${rmtab_backup}
    fi
}

The problem was that the grep command was taking a long time, longer than any timeout I'd assigned to the resource. I looked at /var/lib/nfs/rmtab, and saw it was 60GB on one of my nodes and 16GB on the other. Since backup_rmtab() is called during the stop action, the resource
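The archived message is cut off before the actual fix, but given the agent code quoted above, one plausible mitigation (an assumption on my part, not something stated in the surviving text) is to disable the rmtab backup via the agent's rmtab_backup parameter, the very parameter the backup_rmtab() guard tests against "none", and to give the stop operation more generous headroom so a slow stop does not escalate straight to fencing. A sketch:

# Hypothetical variant of the ExportUsrNevis primitive from above.
# rmtab_backup="none" skips the rmtab backup step entirely, and the longer
# stop timeout is extra insurance against a slow stop leading to fencing.
primitive ExportUsrNevis ocf:heartbeat:exportfs \
    op start interval="0" timeout="40" \
    op stop interval="0" timeout="120" \
    params clientspec="*.nevis.columbia.edu" directory="/usr/nevis" \
        fsid="20" options="ro,no_root_squash,async" \
        rmtab_backup="none"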
Re: [Linux-HA] Apparent problem in pacemaker ordering
On Sat, Mar 3, 2012 at 8:14 PM, Florian Haas flor...@hastexo.com wrote:

In other words, interleave=true is actually the reasonable thing to set on all clone instances by default, and I believe the pengine actually does use a default of interleave=true on defined clone sets since some 1.1.x release (I don't recall which).

I think I have to correct myself here, as a cursory git log --grep=interleave hasn't turned up anything recent. So I might have mis-remembered, which would mean the old interleave=false default is still unchanged. Andrew, could you clarify please?

Cheers, Florian
Re: [Linux-HA] Apparent problem in pacemaker ordering
On 3/3/12 3:30 PM, William Seligman wrote:

On 3/3/12 2:14 PM, Florian Haas wrote:

On Sat, Mar 3, 2012 at 6:55 PM, William Seligman selig...@nevis.columbia.edu wrote:

On 3/3/12 12:03 PM, emmanuel segura wrote:

are you sure the exportfs agent can be used with an active/active clone?

a) I've been through the script. If there's some problem associated with it being cloned, I haven't seen it. (It can't handle globally-unique=true, but I didn't turn that on.)

It shouldn't have a problem with being cloned. Obviously, cloning that RA _really_ makes sense only with the export that manages an NFSv4 virtual root (fsid=0). Otherwise, the export clone has to be hosted on a clustered filesystem, and you'd have to have a pNFS implementation that doesn't suck (tough to come by on Linux), and if you want that sort of replicated, parallel-access NFS you might as well use Gluster. The downside of the latter, though, is that it's currently NFSv3-only, without sideband locking.

I'll look this over when I have a chance. I think I can get away without an NFSv4 virtual root because I'm exporting everything to my cluster either read-only, or only one system at a time will do any writing. Now that you've warned me, I'll do some more checking.

b) I had similar problems using the exportfs resource in a primary-secondary setup without clones. Why would a resource being cloned create an ordering problem? I haven't set the interleave parameter (even with the documentation I'm not sure what it does), but A before B before C seems pretty clear, even for cloned resources.

As far as what interleave does: suppose you have two clones, A and B, and they're linked with an order constraint, like this:

order A_before_B inf: A B

... then if interleave is false, _all_ instances of A must be started before _any_ instance of B gets to start anywhere in the cluster. However, if interleave is true, then for any node only the _local_ instance of A needs to be started before it can start the corresponding _local_ instance of B. In other words, interleave=true is actually the reasonable thing to set on all clone instances by default, and I believe the pengine actually does use a default of interleave=true on defined clone sets since some 1.1.x release (I don't recall which).

Thanks, Florian. That's a great explanation. I'll probably stick interleave=true on most of my clones just to make sure. It explains an error message I've seen in the logs:

Mar 2 18:15:19 hypatia-tb pengine: [4414]: ERROR: clone_rsc_colocation_rh: Cannot interleave clone ClusterIPClone and Gfs2Clone because they do not support the same number of resources per node

Because ClusterIPClone has globally-unique=true and clone-max=2, it's possible for both instances to be running on a single node; I've seen this a few times in my testing when cycling power on one of the nodes. Interleaving doesn't make sense in such a case.

Bill, seeing as you've already pastebinned your config and crm_mon output, could you also pastebin your whole CIB as per cibadmin -Q output? Thanks.

Sure: http://pastebin.com/pjSJ79H6. It doesn't have the exportfs resources in it; I took them out before leaving for the weekend. If it helps, I'll put them back in and try to get the cibadmin -Q output before any nodes crash.

For a test, I stuck in an exportfs resource with all the ordering constraints. Here's the cibadmin -Q output from that: http://pastebin.com/nugdufJc The output of crm_mon just after doing that, showing resource failure: http://pastebin.com/cyCFGUSD Then all the resources are stopped: http://pastebin.com/D62sGSrj A few seconds later one of the nodes is fenced, but this does not bring up anything: http://pastebin.com/wzbmfVas
Re: [Linux-HA] Apparent problem in pacemaker ordering
On Sat, Mar 3, 2012 at 9:30 PM, William Seligman selig...@nevis.columbia.edu wrote:

In other words, interleave=true is actually the reasonable thing to set on all clone instances by default, and I believe the pengine actually does use a default of interleave=true on defined clone sets since some 1.1.x release (I don't recall which).

Thanks, Florian. That's a great explanation. I'll probably stick interleave=true on most of my clones just to make sure. It explains an error message I've seen in the logs:

Mar 2 18:15:19 hypatia-tb pengine: [4414]: ERROR: clone_rsc_colocation_rh: Cannot interleave clone ClusterIPClone and Gfs2Clone because they do not support the same number of resources per node

I've written this up in a short piece; hope this is useful:
http://www.hastexo.com/resources/hints-and-kinks/interleaving-pacemaker-clones

Cheers, Florian
Re: [Linux-HA] Apparent problem in pacemaker ordering
are you sure the exportfs agent can be used with an active/active clone?

On 3 March 2012 at 00:12, William Seligman selig...@nevis.columbia.edu wrote:

One step forward, two steps back.

I'm working on a two-node primary-primary cluster. I'm debugging problems I have with the ocf:heartbeat:exportfs resource. For some reason, pacemaker sometimes appears to ignore ordering I put on the resources.

Florian Haas recommended pastebin in another thread, so let's give it a try. Here's my complete current output of crm configure show: http://pastebin.com/bbSsqyeu

Here's a quick sketch: the sequence of events is supposed to be DRBD (ms) -> clvmd (clone) -> gfs2 (clone) -> exportfs (clone). But that's not what happens. What happens is that pacemaker tries to start up the exportfs resource immediately. This fails, because what it's exporting doesn't exist until after gfs2 runs. Because the cloned resource can't run on either node, the cluster goes into a state in which one node is fenced and the other node refuses to run anything.

Here's a quick snapshot I was able to take of the output of crm_mon that shows the problem: http://pastebin.com/CiZvS4Fh This shows that pacemaker is still trying to start the exportfs resources before it has run the chain drbd -> clvmd -> gfs2.

Just to confirm the obvious, I have the ordering constraints in the full configuration linked above (Admin is my DRBD resource):

order Admin_Before_Clvmd inf: AdminClone:promote ClvmdClone:start
order Clvmd_Before_Gfs2 inf: ClvmdClone Gfs2Clone
order Gfs2_Before_Exports inf: Gfs2Clone ExportsClone

This is not the only time I've observed this behavior in pacemaker. Here's a lengthy log file excerpt from the same time I took the crm_mon snapshot: http://pastebin.com/HwMUCmcX I can see that other resources, the symlink ones in particular, are being probed and started before the drbd Admin resource has a chance to be promoted. In looking at the log file, it may help to know that /mail and /var/nevis are gfs2 partitions that aren't mounted until the Gfs2 resource starts.

So this isn't the first time I've seen this happen. This is just the first time I've been able to reproduce this reliably and capture a snapshot. Any ideas?
Re: [Linux-HA] Apparent problem in pacemaker ordering
On 3/3/12 12:03 PM, emmanuel segura wrote:

are you sure the exportfs agent can be used with an active/active clone?

a) I've been through the script. If there's some problem associated with it being cloned, I haven't seen it. (It can't handle globally-unique=true, but I didn't turn that on.)

b) I had similar problems using the exportfs resource in a primary-secondary setup without clones. Why would a resource being cloned create an ordering problem? I haven't set the interleave parameter (even with the documentation I'm not sure what it does), but A before B before C seems pretty clear, even for cloned resources.

On 3 March 2012 at 00:12, William Seligman selig...@nevis.columbia.edu wrote:

One step forward, two steps back.

I'm working on a two-node primary-primary cluster. I'm debugging problems I have with the ocf:heartbeat:exportfs resource. For some reason, pacemaker sometimes appears to ignore ordering I put on the resources.

Florian Haas recommended pastebin in another thread, so let's give it a try. Here's my complete current output of crm configure show: http://pastebin.com/bbSsqyeu

Here's a quick sketch: the sequence of events is supposed to be DRBD (ms) -> clvmd (clone) -> gfs2 (clone) -> exportfs (clone). But that's not what happens. What happens is that pacemaker tries to start up the exportfs resource immediately. This fails, because what it's exporting doesn't exist until after gfs2 runs. Because the cloned resource can't run on either node, the cluster goes into a state in which one node is fenced and the other node refuses to run anything.

Here's a quick snapshot I was able to take of the output of crm_mon that shows the problem: http://pastebin.com/CiZvS4Fh This shows that pacemaker is still trying to start the exportfs resources before it has run the chain drbd -> clvmd -> gfs2.

Just to confirm the obvious, I have the ordering constraints in the full configuration linked above (Admin is my DRBD resource):

order Admin_Before_Clvmd inf: AdminClone:promote ClvmdClone:start
order Clvmd_Before_Gfs2 inf: ClvmdClone Gfs2Clone
order Gfs2_Before_Exports inf: Gfs2Clone ExportsClone

This is not the only time I've observed this behavior in pacemaker. Here's a lengthy log file excerpt from the same time I took the crm_mon snapshot: http://pastebin.com/HwMUCmcX I can see that other resources, the symlink ones in particular, are being probed and started before the drbd Admin resource has a chance to be promoted. In looking at the log file, it may help to know that /mail and /var/nevis are gfs2 partitions that aren't mounted until the Gfs2 resource starts.

So this isn't the first time I've seen this happen. This is just the first time I've been able to reproduce this reliably and capture a snapshot. Any ideas?
Re: [Linux-HA] Apparent problem in pacemaker ordering
On Sat, Mar 3, 2012 at 6:55 PM, William Seligman selig...@nevis.columbia.edu wrote:

On 3/3/12 12:03 PM, emmanuel segura wrote:

are you sure the exportfs agent can be used with an active/active clone?

a) I've been through the script. If there's some problem associated with it being cloned, I haven't seen it. (It can't handle globally-unique=true, but I didn't turn that on.)

It shouldn't have a problem with being cloned. Obviously, cloning that RA _really_ makes sense only with the export that manages an NFSv4 virtual root (fsid=0). Otherwise, the export clone has to be hosted on a clustered filesystem, and you'd have to have a pNFS implementation that doesn't suck (tough to come by on Linux), and if you want that sort of replicated, parallel-access NFS you might as well use Gluster. The downside of the latter, though, is that it's currently NFSv3-only, without sideband locking.

b) I had similar problems using the exportfs resource in a primary-secondary setup without clones. Why would a resource being cloned create an ordering problem? I haven't set the interleave parameter (even with the documentation I'm not sure what it does), but A before B before C seems pretty clear, even for cloned resources.

As far as what interleave does: suppose you have two clones, A and B, and they're linked with an order constraint, like this:

order A_before_B inf: A B

... then if interleave is false, _all_ instances of A must be started before _any_ instance of B gets to start anywhere in the cluster. However, if interleave is true, then for any node only the _local_ instance of A needs to be started before it can start the corresponding _local_ instance of B. In other words, interleave=true is actually the reasonable thing to set on all clone instances by default, and I believe the pengine actually does use a default of interleave=true on defined clone sets since some 1.1.x release (I don't recall which).

Bill, seeing as you've already pastebinned your config and crm_mon output, could you also pastebin your whole CIB as per cibadmin -Q output? Thanks.

Cheers, Florian
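To make the A/B example above concrete in crm shell terms, here is a minimal sketch using dummy placeholder primitives; p_A and p_B are not resources from this thread, just stand-ins:

primitive p_A ocf:pacemaker:Dummy
primitive p_B ocf:pacemaker:Dummy
# With interleave="true", each node may start its copy of B as soon as its
# own copy of A is running; with interleave="false" (the behaviour described
# above), B waits until every instance of A is started cluster-wide.
clone A p_A meta interleave="true"
clone B p_B meta interleave="true"
order A_before_B inf: A B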
Re: [Linux-HA] Apparent problem in pacemaker ordering
On 3/3/12 2:14 PM, Florian Haas wrote:

On Sat, Mar 3, 2012 at 6:55 PM, William Seligman selig...@nevis.columbia.edu wrote:

On 3/3/12 12:03 PM, emmanuel segura wrote:

are you sure the exportfs agent can be used with an active/active clone?

a) I've been through the script. If there's some problem associated with it being cloned, I haven't seen it. (It can't handle globally-unique=true, but I didn't turn that on.)

It shouldn't have a problem with being cloned. Obviously, cloning that RA _really_ makes sense only with the export that manages an NFSv4 virtual root (fsid=0). Otherwise, the export clone has to be hosted on a clustered filesystem, and you'd have to have a pNFS implementation that doesn't suck (tough to come by on Linux), and if you want that sort of replicated, parallel-access NFS you might as well use Gluster. The downside of the latter, though, is that it's currently NFSv3-only, without sideband locking.

I'll look this over when I have a chance. I think I can get away without an NFSv4 virtual root because I'm exporting everything to my cluster either read-only, or only one system at a time will do any writing. Now that you've warned me, I'll do some more checking.

b) I had similar problems using the exportfs resource in a primary-secondary setup without clones. Why would a resource being cloned create an ordering problem? I haven't set the interleave parameter (even with the documentation I'm not sure what it does), but A before B before C seems pretty clear, even for cloned resources.

As far as what interleave does: suppose you have two clones, A and B, and they're linked with an order constraint, like this:

order A_before_B inf: A B

... then if interleave is false, _all_ instances of A must be started before _any_ instance of B gets to start anywhere in the cluster. However, if interleave is true, then for any node only the _local_ instance of A needs to be started before it can start the corresponding _local_ instance of B. In other words, interleave=true is actually the reasonable thing to set on all clone instances by default, and I believe the pengine actually does use a default of interleave=true on defined clone sets since some 1.1.x release (I don't recall which).

Thanks, Florian. That's a great explanation. I'll probably stick interleave=true on most of my clones just to make sure. It explains an error message I've seen in the logs:

Mar 2 18:15:19 hypatia-tb pengine: [4414]: ERROR: clone_rsc_colocation_rh: Cannot interleave clone ClusterIPClone and Gfs2Clone because they do not support the same number of resources per node

Because ClusterIPClone has globally-unique=true and clone-max=2, it's possible for both instances to be running on a single node; I've seen this a few times in my testing when cycling power on one of the nodes. Interleaving doesn't make sense in such a case.

Bill, seeing as you've already pastebinned your config and crm_mon output, could you also pastebin your whole CIB as per cibadmin -Q output? Thanks.

Sure: http://pastebin.com/pjSJ79H6. It doesn't have the exportfs resources in it; I took them out before leaving for the weekend. If it helps, I'll put them back in and try to get the cibadmin -Q output before any nodes crash.
[Linux-HA] Apparent problem in pacemaker ordering
One step forward, two steps back.

I'm working on a two-node primary-primary cluster. I'm debugging problems I have with the ocf:heartbeat:exportfs resource. For some reason, pacemaker sometimes appears to ignore ordering I put on the resources.

Florian Haas recommended pastebin in another thread, so let's give it a try. Here's my complete current output of crm configure show: http://pastebin.com/bbSsqyeu

Here's a quick sketch: the sequence of events is supposed to be DRBD (ms) -> clvmd (clone) -> gfs2 (clone) -> exportfs (clone). But that's not what happens. What happens is that pacemaker tries to start up the exportfs resource immediately. This fails, because what it's exporting doesn't exist until after gfs2 runs. Because the cloned resource can't run on either node, the cluster goes into a state in which one node is fenced and the other node refuses to run anything.

Here's a quick snapshot I was able to take of the output of crm_mon that shows the problem: http://pastebin.com/CiZvS4Fh This shows that pacemaker is still trying to start the exportfs resources before it has run the chain drbd -> clvmd -> gfs2.

Just to confirm the obvious, I have the ordering constraints in the full configuration linked above (Admin is my DRBD resource):

order Admin_Before_Clvmd inf: AdminClone:promote ClvmdClone:start
order Clvmd_Before_Gfs2 inf: ClvmdClone Gfs2Clone
order Gfs2_Before_Exports inf: Gfs2Clone ExportsClone

This is not the only time I've observed this behavior in pacemaker. Here's a lengthy log file excerpt from the same time I took the crm_mon snapshot: http://pastebin.com/HwMUCmcX I can see that other resources, the symlink ones in particular, are being probed and started before the drbd Admin resource has a chance to be promoted. In looking at the log file, it may help to know that /mail and /var/nevis are gfs2 partitions that aren't mounted until the Gfs2 resource starts.

So this isn't the first time I've seen this happen. This is just the first time I've been able to reproduce this reliably and capture a snapshot. Any ideas?
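For readers who don't want to click through the pastebin, here is a rough sketch of what the clone stack and its ordering might look like with interleave=true (the setting discussed elsewhere in this thread) added to each clone. Only the three order statements are taken verbatim from the post; the child resource names and the dual-primary meta attributes are assumptions, not Bill's actual CIB:

# Sketch only: Admin is the DRBD resource named in the post; Clvmd, Gfs2 and
# Exports stand in for the real child primitives, which are not shown here.
ms AdminClone Admin \
    meta master-max="2" clone-max="2" notify="true" interleave="true"
clone ClvmdClone Clvmd meta interleave="true"
clone Gfs2Clone Gfs2 meta interleave="true"
clone ExportsClone Exports meta interleave="true"
order Admin_Before_Clvmd inf: AdminClone:promote ClvmdClone:start
order Clvmd_Before_Gfs2 inf: ClvmdClone Gfs2Clone
order Gfs2_Before_Exports inf: Gfs2Clone ExportsClone
# Colocation constraints (clvmd with the promoted DRBD master, gfs2 with
# clvmd, exports with gfs2) would normally accompany these orderings; they
# are omitted here because the post only quotes the order statements.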
Re: [Linux-HA] Apparent problem in pacemaker ordering
Darn it, forgot versions:

Redhat Linux 6.2 (kernel 2.6.32)
cman-3.0.12.1
corosync-1.4.1
pacemaker-1.1.6

On 3/2/12 6:12 PM, William Seligman wrote:

One step forward, two steps back.

I'm working on a two-node primary-primary cluster. I'm debugging problems I have with the ocf:heartbeat:exportfs resource. For some reason, pacemaker sometimes appears to ignore ordering I put on the resources.

Florian Haas recommended pastebin in another thread, so let's give it a try. Here's my complete current output of crm configure show: http://pastebin.com/bbSsqyeu

Here's a quick sketch: the sequence of events is supposed to be DRBD (ms) -> clvmd (clone) -> gfs2 (clone) -> exportfs (clone). But that's not what happens. What happens is that pacemaker tries to start up the exportfs resource immediately. This fails, because what it's exporting doesn't exist until after gfs2 runs. Because the cloned resource can't run on either node, the cluster goes into a state in which one node is fenced and the other node refuses to run anything.

Here's a quick snapshot I was able to take of the output of crm_mon that shows the problem: http://pastebin.com/CiZvS4Fh This shows that pacemaker is still trying to start the exportfs resources before it has run the chain drbd -> clvmd -> gfs2.

Just to confirm the obvious, I have the ordering constraints in the full configuration linked above (Admin is my DRBD resource):

order Admin_Before_Clvmd inf: AdminClone:promote ClvmdClone:start
order Clvmd_Before_Gfs2 inf: ClvmdClone Gfs2Clone
order Gfs2_Before_Exports inf: Gfs2Clone ExportsClone

This is not the only time I've observed this behavior in pacemaker. Here's a lengthy log file excerpt from the same time I took the crm_mon snapshot: http://pastebin.com/HwMUCmcX I can see that other resources, the symlink ones in particular, are being probed and started before the drbd Admin resource has a chance to be promoted. In looking at the log file, it may help to know that /mail and /var/nevis are gfs2 partitions that aren't mounted until the Gfs2 resource starts.

So this isn't the first time I've seen this happen. This is just the first time I've been able to reproduce this reliably and capture a snapshot. Any ideas?