Re: [Linux-HA] Apparent problem in pacemaker ordering

2012-03-19 Thread Andrew Beekhof
On Tue, Mar 6, 2012 at 3:53 AM, Florian Haas flor...@hastexo.com wrote:
 On Sat, Mar 3, 2012 at 8:14 PM, Florian Haas flor...@hastexo.com wrote:
 In other words, interleave=true is actually the reasonable thing to
 set on all clone instances by default, and I believe the pengine
 actually does use a default of interleave=true on defined clone sets
 since some 1.1.x release (I don't recall which).

 I think I have to correct myself here, as a cursory git log
 --grep=interleave hasn't turned up anything recent. So I might have
 mis-remembered, which would mean the old interleave=false default is
 still unchanged. Andrew, could you clarify please?

It never became the default.

We do appear to be complaining too early though:

diff --git a/pengine/clone.c b/pengine/clone.c
index 5ff0b02..0b832fc 100644
--- a/pengine/clone.c
+++ b/pengine/clone.c
@@ -1035,7 +1035,7 @@ clone_rsc_colocation_rh(resource_t * rsc_lh, resource_t * rsc_rh, rsc_colocation
     if (constraint->rsc_lh->variant >= pe_clone) {
 
         get_clone_variant_data(clone_data_lh, constraint->rsc_lh);
-        if (clone_data->clone_node_max != clone_data_lh->clone_node_max) {
+        if (clone_data_lh->interleave && clone_data->clone_node_max != clone_data_lh->clone_node_max) {
             crm_config_err("Cannot interleave " XML_CIB_TAG_INCARNATION
                            " %s and %s because"
                            " they do not support the same number of"

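In other words, with this change the complaint is only raised when interleaving is actually requested, rather than for every pair of colocated clones whose clone-node-max values differ. A hypothetical crm shell shape that used to trigger the message even with interleave left at its default; resource and constraint names are invented for illustration:

# Hypothetical: colocated clones with differing clone-node-max and no
# interleave requested; before the patch this alone logged the error above
primitive ip-rsc ocf:pacemaker:Dummy
primitive fs-rsc ocf:pacemaker:Dummy
clone ip-clone ip-rsc meta globally-unique="true" clone-max="2" clone-node-max="2"
clone fs-clone fs-rsc meta clone-node-max="1"
colocation fs_with_ip inf: fs-clone ip-clone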



 Cheers,
 Florian

 --
 Need help with High Availability?
 http://www.hastexo.com/now


Re: [Linux-HA] Apparent problem in pacemaker ordering

2012-03-08 Thread Dejan Muhamedagic
Hi,

On Wed, Mar 07, 2012 at 07:52:16PM -0500, William Seligman wrote:
 On 3/5/12 11:55 AM, William Seligman wrote:
  On 3/3/12 3:30 PM, William Seligman wrote:
  On 3/3/12 2:14 PM, Florian Haas wrote:
  On Sat, Mar 3, 2012 at 6:55 PM, William Seligman
  selig...@nevis.columbia.edu  wrote:
  On 3/3/12 12:03 PM, emmanuel segura wrote:
 
  Are you sure the exportfs agent can be used with an active/active clone?
 
  a) I've been through the script. If there's some problem associated with 
  it
  being cloned, I haven't seen it. (It can't handle globally-unique=true,
  but I didn't turn that on.)
 
  It shouldn't have a problem with being cloned. Obviously, cloning that
  RA _really_ makes sense only with the export that manages an NFSv4
  virtual root (fsid=0). Otherwise, the export clone has to be hosted on
  a clustered filesystem, and you'd have to have a pNFS implementation
  that doesn't suck (tough to come by on Linux), and if you want that
  sort of replicated, parallel-access NFS you might as well use Gluster.
  The downside of the latter, though, is it's currently NFSv3-only,
  without sideband locking.
 
  I'll look this over when I have a chance. I think I can get away without a 
  NFSv4
  virtual root because I'm exporting everything to my cluster either 
  read-only, or
  only one system at a time will do any writing. Now that you've warned me, 
  I'll
  do some more checking.
 
  b) I had similar problems using the exportfs resource in a 
  primary-secondary
  setup without clones.
 
  Why would a resource being cloned create an ordering problem? I haven't 
  set
  the interleave parameter (even with the documentation I'm not sure what 
  it
  does) but A before B before C seems pretty clear, even for cloned 
  resources.
 
  As far as what interleave does. Suppose you have two clones, A and B.
  And they're linked with an order constraint, like this:
 
  order A_before_B inf: A B
 
  ... then if interleave is false, _all_ instances of A must be started
  before _any_ instance of B gets to start anywhere in the cluster.
  However if interleave is true, then for any node only the _local_
  instance of A needs to be started before it can start the
  corresponding _local_ instance of B.
 
  In other words, interleave=true is actually the reasonable thing to
  set on all clone instances by default, and I believe the pengine
  actually does use a default of interleave=true on defined clone sets
  since some 1.1.x release (I don't recall which).
 
  Thanks, Florian. That's a great explanation. I'll probably stick
  interleave=true on most of my clones just to make sure.
 
  It explains an error message I've seen in the logs:
 
  Mar  2 18:15:19 hypatia-tb pengine: [4414]: ERROR: clone_rsc_colocation_rh:
  Cannot interleave clone ClusterIPClone and Gfs2Clone because they do not 
  support
  the same number of resources per node
 
  Because ClusterIPClone has globally-unique=true and clone-max=2, it's 
  possible
  for both instances to be running on a single node; I've seen this a few 
  times in
  my testing when cycling power on one of the nodes. Interleaving doesn't 
  make
  sense in such a case.
 
  Bill, seeing as you've already pastebinned your config and crm_mon
  output, could you also pastebin your whole CIB as per cibadmin -Q
  output? Thanks.
 
  Sure: http://pastebin.com/pjSJ79H6. It doesn't have the exportfs 
  resources in
  it; I took them out before leaving for the weekend. If it helps, I'll put 
  them
  back in and try to get the cibadmin -Q output before any nodes crash.
 
  
  For a test, I stuck in an exportfs resource with all the ordering 
  constraints.
  Here's the cibadmin -Q output from that:
  
  http://pastebin.com/nugdufJc
  
  The output of crm_mon just after doing that, showing resource failure:
  
  http://pastebin.com/cyCFGUSD
  
  Then all the resources are stopped:
  
  http://pastebin.com/D62sGSrj
  
  A few seconds later one of the nodes is fenced, but this does not bring up
  anything:
  
  http://pastebin.com/wzbmfVas
 
 I believe I have the solution to my stability problem. It doesn't solve the
 issue of ordering, but I think I have a configuration that will survive 
 failover.
 
 Here's the problem. I had exportfs resources such as:
 
 primitive ExportUsrNevis ocf:heartbeat:exportfs \
     op start interval="0" timeout="40" \
     op stop interval="0" timeout="45" \
     params clientspec="*.nevis.columbia.edu" directory="/usr/nevis" \
     fsid="20" options="ro,no_root_squash,async"
 
 I did detailed traces of the execution of exportfs (putting in logger 
 commands)
 and found that the problem was in the backup_rmtab function in exportfs:
 
 backup_rmtab() {
     local rmtab_backup
     if [ ${OCF_RESKEY_rmtab_backup} != "none" ]; then
         rmtab_backup="${OCF_RESKEY_directory}/${OCF_RESKEY_rmtab_backup}"
         grep ":${OCF_RESKEY_directory}:" /var/lib/nfs/rmtab > ${rmtab_backup}
     fi
 }
 
 The problem was that the grep command was taking a long time, longer 

Re: [Linux-HA] Apparent problem in pacemaker ordering

2012-03-07 Thread William Seligman
On 3/5/12 11:55 AM, William Seligman wrote:
 On 3/3/12 3:30 PM, William Seligman wrote:
 On 3/3/12 2:14 PM, Florian Haas wrote:
 On Sat, Mar 3, 2012 at 6:55 PM, William Seligman
 selig...@nevis.columbia.edu  wrote:
 On 3/3/12 12:03 PM, emmanuel segura wrote:

 Are you sure the exportfs agent can be used with an active/active clone?

 a) I've been through the script. If there's some problem associated with it
 being cloned, I haven't seen it. (It can't handle globally-unique=true,
 but I didn't turn that on.)

 It shouldn't have a problem with being cloned. Obviously, cloning that
 RA _really_ makes sense only with the export that manages an NFSv4
 virtual root (fsid=0). Otherwise, the export clone has to be hosted on
 a clustered filesystem, and you'd have to have a pNFS implementation
 that doesn't suck (tough to come by on Linux), and if you want that
 sort of replicated, parallel-access NFS you might as well use Gluster.
 The downside of the latter, though, is it's currently NFSv3-only,
 without sideband locking.

 I'll look this over when I have a chance. I think I can get away without a 
 NFSv4
 virtual root because I'm exporting everything to my cluster either 
 read-only, or
 only one system at a time will do any writing. Now that you've warned me, 
 I'll
 do some more checking.

 b) I had similar problems using the exportfs resource in a 
 primary-secondary
 setup without clones.

 Why would a resource being cloned create an ordering problem? I haven't set
 the interleave parameter (even with the documentation I'm not sure what it
 does) but A before B before C seems pretty clear, even for cloned 
 resources.

 As far as what interleave does. Suppose you have two clones, A and B.
 And they're linked with an order constraint, like this:

 order A_before_B inf: A B

 ... then if interleave is false, _all_ instances of A must be started
 before _any_ instance of B gets to start anywhere in the cluster.
 However if interleave is true, then for any node only the _local_
 instance of A needs to be started before it can start the
 corresponding _local_ instance of B.

 In other words, interleave=true is actually the reasonable thing to
 set on all clone instances by default, and I believe the pengine
 actually does use a default of interleave=true on defined clone sets
 since some 1.1.x release (I don't recall which).

 Thanks, Florian. That's a great explanation. I'll probably stick
 interleave=true on most of my clones just to make sure.

 It explains an error message I've seen in the logs:

 Mar  2 18:15:19 hypatia-tb pengine: [4414]: ERROR: clone_rsc_colocation_rh:
 Cannot interleave clone ClusterIPClone and Gfs2Clone because they do not 
 support
 the same number of resources per node

 Because ClusterIPClone has globally-unique=true and clone-max=2, it's 
 possible
 for both instances to be running on a single node; I've seen this a few 
 times in
 my testing when cycling power on one of the nodes. Interleaving doesn't make
 sense in such a case.

 Bill, seeing as you've already pastebinned your config and crm_mon
 output, could you also pastebin your whole CIB as per cibadmin -Q
 output? Thanks.

 Sure: http://pastebin.com/pjSJ79H6. It doesn't have the exportfs resources 
 in
 it; I took them out before leaving for the weekend. If it helps, I'll put 
 them
 back in and try to get the cibadmin -Q output before any nodes crash.

 
 For a test, I stuck in an exportfs resource with all the ordering constraints.
 Here's the cibadmin -Q output from that:
 
 http://pastebin.com/nugdufJc
 
 The output of crm_mon just after doing that, showing resource failure:
 
 http://pastebin.com/cyCFGUSD
 
 Then all the resources are stopped:
 
 http://pastebin.com/D62sGSrj
 
 A few seconds later one of the nodes is fenced, but this does not bring up
 anything:
 
 http://pastebin.com/wzbmfVas

I believe I have the solution to my stability problem. It doesn't solve the
issue of ordering, but I think I have a configuration that will survive 
failover.

Here's the problem. I had exportfs resources such as:

primitive ExportUsrNevis ocf:heartbeat:exportfs \
        op start interval="0" timeout="40" \
        op stop interval="0" timeout="45" \
        params clientspec="*.nevis.columbia.edu" directory="/usr/nevis" \
        fsid="20" options="ro,no_root_squash,async"

I did detailed traces of the execution of exportfs (putting in logger commands)
and found that the problem was in the backup_rmtab function in exportfs:

backup_rmtab() {
    local rmtab_backup
    if [ ${OCF_RESKEY_rmtab_backup} != "none" ]; then
        rmtab_backup="${OCF_RESKEY_directory}/${OCF_RESKEY_rmtab_backup}"
        grep ":${OCF_RESKEY_directory}:" /var/lib/nfs/rmtab > ${rmtab_backup}
    fi
}

The problem was that the grep command was taking a long time, longer than any
timeout I'd assigned to the resource. I looked at /var/lib/nfs/rmtab, and saw it
was 60GB on one of my nodes and 16GB on the other. Since backup_rmtab() is
called during the stop action, the resource 
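Since backup_rmtab() does nothing when rmtab_backup is set to "none" (that is exactly the check the function performs above), one way to sidestep the slow grep is to disable the rmtab backup on the exportfs resources. A hedged sketch reusing the ExportUsrNevis primitive; whether this is the route ultimately taken isn't shown in this excerpt:

# Sketch: disable the rmtab backup so start/stop no longer greps the huge rmtab
primitive ExportUsrNevis ocf:heartbeat:exportfs \
        op start interval="0" timeout="40" \
        op stop interval="0" timeout="45" \
        params clientspec="*.nevis.columbia.edu" directory="/usr/nevis" \
        fsid="20" options="ro,no_root_squash,async" \
        rmtab_backup="none"

This only skips the backup step; the oversized /var/lib/nfs/rmtab files themselves would still deserve a look.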

Re: [Linux-HA] Apparent problem in pacemaker ordering

2012-03-05 Thread Florian Haas
On Sat, Mar 3, 2012 at 8:14 PM, Florian Haas flor...@hastexo.com wrote:
 In other words, interleave=true is actually the reasonable thing to
 set on all clone instances by default, and I believe the pengine
 actually does use a default of interleave=true on defined clone sets
 since some 1.1.x release (I don't recall which).

I think I have to correct myself here, as a cursory git log
--grep=interleave hasn't turned up anything recent. So I might have
mis-remembered, which would mean the old interleave=false default is
still unchanged. Andrew, could you clarify please?

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now


Re: [Linux-HA] Apparent problem in pacemaker ordering

2012-03-05 Thread William Seligman
On 3/3/12 3:30 PM, William Seligman wrote:
 On 3/3/12 2:14 PM, Florian Haas wrote:
 On Sat, Mar 3, 2012 at 6:55 PM, William Seligman
 selig...@nevis.columbia.edu  wrote:
 On 3/3/12 12:03 PM, emmanuel segura wrote:

 Are you sure the exportfs agent can be used with an active/active clone?

 a) I've been through the script. If there's some problem associated with it
 being cloned, I haven't seen it. (It can't handle globally-unique=true,
 but I didn't turn that on.)

 It shouldn't have a problem with being cloned. Obviously, cloning that
 RA _really_ makes sense only with the export that manages an NFSv4
 virtual root (fsid=0). Otherwise, the export clone has to be hosted on
 a clustered filesystem, and you'd have to have a pNFS implementation
 that doesn't suck (tough to come by on Linux), and if you want that
 sort of replicated, parallel-access NFS you might as well use Gluster.
 The downside of the latter, though, is it's currently NFSv3-only,
 without sideband locking.
 
 I'll look this over when I have a chance. I think I can get away without a 
 NFSv4
 virtual root because I'm exporting everything to my cluster either read-only, 
 or
 only one system at a time will do any writing. Now that you've warned me, I'll
 do some more checking.
 
 b) I had similar problems using the exportfs resource in a primary-secondary
 setup without clones.

 Why would a resource being cloned create an ordering problem? I haven't set
 the interleave parameter (even with the documentation I'm not sure what it
 does) but A before B before C seems pretty clear, even for cloned resources.

 As far as what interleave does. Suppose you have two clones, A and B.
 And they're linked with an order constraint, like this:

 order A_before_B inf: A B

 ... then if interleave is false, _all_ instances of A must be started
 before _any_ instance of B gets to start anywhere in the cluster.
 However if interleave is true, then for any node only the _local_
 instance of A needs to be started before it can start the
 corresponding _local_ instance of B.

 In other words, interleave=true is actually the reasonable thing to
 set on all clone instances by default, and I believe the pengine
 actually does use a default of interleave=true on defined clone sets
 since some 1.1.x release (I don't recall which).
 
 Thanks, Florian. That's a great explanation. I'll probably stick
 interleave=true on most of my clones just to make sure.
 
 It explains an error message I've seen in the logs:
 
 Mar  2 18:15:19 hypatia-tb pengine: [4414]: ERROR: clone_rsc_colocation_rh:
 Cannot interleave clone ClusterIPClone and Gfs2Clone because they do not 
 support
 the same number of resources per node
 
 Because ClusterIPClone has globally-unique=true and clone-max=2, it's possible
 for both instances to be running on a single node; I've seen this a few times 
 in
 my testing when cycling power on one of the nodes. Interleaving doesn't make
 sense in such a case.
 
 Bill, seeing as you've already pastebinned your config and crm_mon
 output, could you also pastebin your whole CIB as per cibadmin -Q
 output? Thanks.
 
 Sure: http://pastebin.com/pjSJ79H6. It doesn't have the exportfs resources 
 in
 it; I took them out before leaving for the weekend. If it helps, I'll put them
 back in and try to get the cibadmin -Q output before any nodes crash.
 

For a test, I stuck in an exportfs resource with all the ordering constraints.
Here's the cibadmin -Q output from that:

http://pastebin.com/nugdufJc

The output of crm_mon just after doing that, showing resource failure:

http://pastebin.com/cyCFGUSD

Then all the resources are stopped:

http://pastebin.com/D62sGSrj

A few seconds later one of the nodes is fenced, but this does not bring up
anything:

http://pastebin.com/wzbmfVas
-- 
Bill Seligman | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://selig...@nevis.columbia.edu
PO Box 137|
Irvington NY 10533 USA| http://www.nevis.columbia.edu/~seligman/




Re: [Linux-HA] Apparent problem in pacemaker ordering

2012-03-05 Thread Florian Haas
On Sat, Mar 3, 2012 at 9:30 PM, William Seligman
selig...@nevis.columbia.edu wrote:
 In other words, interleave=true is actually the reasonable thing to
 set on all clone instances by default, and I believe the pengine
 actually does use a default of interleave=true on defined clone sets
 since some 1.1.x release (I don't recall which).


 Thanks, Florian. That's a great explanation. I'll probably stick
 interleave=true on most of my clones just to make sure.

 It explains an error message I've seen in the logs:

 Mar  2 18:15:19 hypatia-tb pengine: [4414]: ERROR: clone_rsc_colocation_rh:
 Cannot interleave clone ClusterIPClone and Gfs2Clone because they do not
 support the same number of resources per node

I've written this up in a short piece; hope this is useful:

http://www.hastexo.com/resources/hints-and-kinks/interleaving-pacemaker-clones

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now

Re: [Linux-HA] Apparent problem in pacemaker ordering

2012-03-03 Thread emmanuel segura
Are you sure the exportfs agent can be used with an active/active clone?

On 3 March 2012 00:12, William Seligman selig...@nevis.columbia.edu wrote:

 One step forward, two steps back.

 I'm working on a two-node primary-primary cluster. I'm debugging problems
 I have
 with the ocf:heartbeat:exportfs resource. For some reason, pacemaker
 sometimes
 appears to ignore ordering I put on the resources.

 Florian Haas recommended pastebin in another thread, so let's give it a
 try.
 Here's my complete current output of crm configure show:

 http://pastebin.com/bbSsqyeu

 Here's a quick sketch: The sequence of events is supposed to be DRBD (ms) ->
 clvmd (clone) -> gfs2 (clone) -> exportfs (clone).

 But that's not what happens. What happens is that pacemaker tries to start
 up
 the exportfs resource immediately. This fails, because what it's exporting
 doesn't exist until after gfs2 runs. Because the cloned resource can't run
 on
 either node, the cluster goes into a state in which one node is fenced, the
 other node refuses to run anything.

 Here's a quick snapshot I was able to take of the output of crm_mon that
 shows
 the problem:

 http://pastebin.com/CiZvS4Fh

 This shows that pacemaker is still trying to start the exportfs resources,
 before it has run the chain drbd->clvmd->gfs2.

 Just to confirm the obvious, I have the ordering constraints in the full
 configuration linked above (Admin is my DRBD resource):

 order Admin_Before_Clvmd inf: AdminClone:promote ClvmdClone:start
 order Clvmd_Before_Gfs2 inf: ClvmdClone Gfs2Clone
 order Gfs2_Before_Exports inf: Gfs2Clone ExportsClone

 This is not the only time I've observed this behavior in pacemaker. Here's
 a
 lengthy log file excerpt from the same time I took the crm_mon snapshot:

 http://pastebin.com/HwMUCmcX

 I can see that other resources, the symlink ones in particular, are being
 probed
 and started before the drbd Admin resource has a chance to be promoted. In
 looking at the log file, it may help to know that /mail and /var/nevis are
 gfs2
 partitions that aren't mounted until the Gfs2 resource starts.

 So this isn't the first time I've seen this happen. This is just the first
 time
 I've been able to reproduce this reliably and capture a snapshot.

 Any ideas?
 --
 Bill Seligman | Phone: (914) 591-2823
 Nevis Labs, Columbia Univ | mailto://selig...@nevis.columbia.edu
 PO Box 137|
 Irvington NY 10533 USA| http://www.nevis.columbia.edu/~seligman/






-- 
this is my life and I live it for as long as God wills


Re: [Linux-HA] Apparent problem in pacemaker ordering

2012-03-03 Thread William Seligman

On 3/3/12 12:03 PM, emmanuel segura wrote:

Are you sure the exportfs agent can be used with an active/active clone?


a) I've been through the script. If there's some problem associated with 
it being cloned, I haven't seen it. (It can't handle 
globally-unique=true, but I didn't turn that on.)


b) I had similar problems using the exportfs resource in a 
primary-secondary setup without clones.


Why would a resource being cloned create an ordering problem? I haven't 
set the interleave parameter (even with the documentation I'm not sure 
what it does) but A before B before C seems pretty clear, even for 
cloned resources.



On 3 March 2012 00:12, William Seligman selig...@nevis.columbia.edu wrote:



One step forward, two steps back.

I'm working on a two-node primary-primary cluster. I'm debugging
problems I have with the ocf:heartbeat:exportfs resource. For some
reason, pacemaker sometimes appears to ignore ordering I put on the
resources.

Florian Haas recommended pastebin in another thread, so let's give
it a try. Here's my complete current output of crm configure
show:

http://pastebin.com/bbSsqyeu

Here's a quick sketch: The sequence of events is supposed to be
DRBD (ms) -> clvmd (clone) -> gfs2 (clone) -> exportfs (clone).

But that's not what happens. What happens is that pacemaker tries
to start up the exportfs resource immediately. This fails, because
what it's exporting doesn't exist until after gfs2 runs. Because
the cloned resource can't run on either node, the cluster goes into
a state in which one node is fenced, the other node refuses to run
anything.

Here's a quick snapshot I was able to take of the output of crm_mon
that shows the problem:

http://pastebin.com/CiZvS4Fh

This shows that pacemaker is still trying to start the exportfs
resources, before it has run the chain drbd->clvmd->gfs2.

Just to confirm the obvious, I have the ordering constraints in the
full configuration linked above (Admin is my DRBD resource):

order Admin_Before_Clvmd inf: AdminClone:promote ClvmdClone:start
order Clvmd_Before_Gfs2 inf: ClvmdClone Gfs2Clone order
Gfs2_Before_Exports inf: Gfs2Clone ExportsClone

This is not the only time I've observed this behavior in pacemaker.
Here's a lengthy log file excerpt from the same time I took the
crm_mon snapshot:

http://pastebin.com/HwMUCmcX

I can see that other resources, the symlink ones in particular, are
being probed and started before the drbd Admin resource has a
chance to be promoted. In looking at the log file, it may help to
know that /mail and /var/nevis are gfs2 partitions that aren't
mounted until the Gfs2 resource starts.

So this isn't the first time I've seen this happen. This is just
the first time I've been able to reproduce this reliably and
capture a snapshot.

Any ideas?



--
Bill Seligman | mailto://selig...@nevis.columbia.edu
Nevis Labs, Columbia Univ | http://www.nevis.columbia.edu/~seligman/
PO Box 137|
Irvington NY 10533  USA   | Phone: (914) 591-2823




Re: [Linux-HA] Apparent problem in pacemaker ordering

2012-03-03 Thread Florian Haas
On Sat, Mar 3, 2012 at 6:55 PM, William Seligman
selig...@nevis.columbia.edu wrote:
 On 3/3/12 12:03 PM, emmanuel segura wrote:

 Are you sure the exportfs agent can be used with an active/active clone?


 a) I've been through the script. If there's some problem associated with it
 being cloned, I haven't seen it. (It can't handle globally-unique=true,
 but I didn't turn that on.)

It shouldn't have a problem with being cloned. Obviously, cloning that
RA _really_ makes sense only with the export that manages an NFSv4
virtual root (fsid=0). Otherwise, the export clone has to be hosted on
a clustered filesystem, and you'd have to have a pNFS implementation
that doesn't suck (tough to come by on Linux), and if you want that
sort of replicated, parallel-access NFS you might as well use Gluster.
The downside of the latter, though, is it's currently NFSv3-only,
without sideband locking.
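For reference, a minimal sketch of the NFSv4 virtual-root style of exportfs clone described above; the resource names, client spec and paths here are hypothetical, not taken from Bill's configuration:

# Hypothetical NFSv4 virtual root (fsid=0) export, cloned across the cluster
primitive nfs-root ocf:heartbeat:exportfs \
        params clientspec="*.example.com" directory="/srv/nfs" \
        fsid="0" options="rw,crossmnt" \
        op monitor interval="30s"
clone nfs-root-clone nfs-root meta interleave="true"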

 b) I had similar problems using the exportfs resource in a primary-secondary
 setup without clones.

 Why would a resource being cloned create an ordering problem? I haven't set
 the interleave parameter (even with the documentation I'm not sure what it
 does) but A before B before C seems pretty clear, even for cloned resources.

As far as what interleave does. Suppose you have two clones, A and B.
And they're linked with an order constraint, like this:

order A_before_B inf: A B

... then if interleave is false, _all_ instances of A must be started
before _any_ instance of B gets to start anywhere in the cluster.
However if interleave is true, then for any node only the _local_
instance of A needs to be started before it can start the
corresponding _local_ instance of B.

In other words, interleave=true is actually the reasonable thing to
set on all clone instances by default, and I believe the pengine
actually does use a default of interleave=true on defined clone sets
since some 1.1.x release (I don't recall which).
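Spelled out in crm shell syntax, a minimal sketch of the above; the primitive and clone names are invented, with ocf:pacemaker:Dummy standing in for real resources:

# With interleave="true", B's instance on a node only waits for A's instance
# on that same node, not for every instance of A across the cluster
primitive a-rsc ocf:pacemaker:Dummy
primitive b-rsc ocf:pacemaker:Dummy
clone A a-rsc meta interleave="true"
clone B b-rsc meta interleave="true"
order A_before_B inf: A B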

Bill, seeing as you've already pastebinned your config and crm_mon
output, could you also pastebin your whole CIB as per cibadmin -Q
output? Thanks.

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now


Re: [Linux-HA] Apparent problem in pacemaker ordering

2012-03-03 Thread William Seligman

On 3/3/12 2:14 PM, Florian Haas wrote:

On Sat, Mar 3, 2012 at 6:55 PM, William Seligman
selig...@nevis.columbia.edu  wrote:

On 3/3/12 12:03 PM, emmanuel segura wrote:


Are you sure the exportfs agent can be used with an active/active clone?


a) I've been through the script. If there's some problem associated with it
being cloned, I haven't seen it. (It can't handle globally-unique=true,
but I didn't turn that on.)


It shouldn't have a problem with being cloned. Obviously, cloning that
RA _really_ makes sense only with the export that manages an NFSv4
virtual root (fsid=0). Otherwise, the export clone has to be hosted on
a clustered filesystem, and you'd have to have a pNFS implementation
that doesn't suck (tough to come by on Linux), and if you want that
sort of replicated, parallel-access NFS you might as well use Gluster.
The downside of the latter, though, is it's currently NFSv3-only,
without sideband locking.


I'll look this over when I have a chance. I think I can get away without 
a NFSv4 virtual root because I'm exporting everything to my cluster 
either read-only, or only one system at a time will do any writing. Now 
that you've warned me, I'll do some more checking.



b) I had similar problems using the exportfs resource in a primary-secondary
setup without clones.

Why would a resource being cloned create an ordering problem? I haven't set
the interleave parameter (even with the documentation I'm not sure what it
does) but A before B before C seems pretty clear, even for cloned resources.


As far as what interleave does. Suppose you have two clones, A and B.
And they're linked with an order constraint, like this:

order A_before_B inf: A B

... then if interleave is false, _all_ instances of A must be started
before _any_ instance of B gets to start anywhere in the cluster.
However if interleave is true, then for any node only the _local_
instance of A needs to be started before it can start the
corresponding _local_ instance of B.

In other words, interleave=true is actually the reasonable thing to
set on all clone instances by default, and I believe the pengine
actually does use a default of interleave=true on defined clone sets
since some 1.1.x release (I don't recall which).


Thanks, Florian. That's a great explanation. I'll probably stick 
interleave=true on most of my clones just to make sure.


It explains an error message I've seen in the logs:

Mar  2 18:15:19 hypatia-tb pengine: [4414]: ERROR: 
clone_rsc_colocation_rh: Cannot interleave clone ClusterIPClone and 
Gfs2Clone because they do not support the same number of resources per node


Because ClusterIPClone has globally-unique=true and clone-max=2, it's 
possible for both instances to be running on a single node; I've seen 
this a few times in my testing when cycling power on one of the nodes. 
Interleaving doesn't make sense in such a case.
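For context, the kind of clone definition that produces that situation looks roughly like this; the IP address and hash parameters are illustrative rather than copied from the actual CIB:

# A globally unique clone: two distinct instances, which may end up on one node
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="192.168.1.100" cidr_netmask="24" clusterip_hash="sourceip" \
        op monitor interval="30s"
clone ClusterIPClone ClusterIP \
        meta globally-unique="true" clone-max="2" clone-node-max="2"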



Bill, seeing as you've already pastebinned your config and crm_mon
output, could you also pastebin your whole CIB as per cibadmin -Q
output? Thanks.


Sure: http://pastebin.com/pjSJ79H6. It doesn't have the exportfs 
resources in it; I took them out before leaving for the weekend. If it 
helps, I'll put them back in and try to get the cibadmin -Q output 
before any nodes crash.


--
Bill Seligman | mailto://selig...@nevis.columbia.edu
Nevis Labs, Columbia Univ | http://www.nevis.columbia.edu/~seligman/
PO Box 137|
Irvington NY 10533  USA   | Phone: (914) 591-2823




[Linux-HA] Apparent problem in pacemaker ordering

2012-03-02 Thread William Seligman
One step forward, two steps back.

I'm working on a two-node primary-primary cluster. I'm debugging problems I have
with the ocf:heartbeat:exportfs resource. For some reason, pacemaker sometimes
appears to ignore ordering I put on the resources.

Florian Haas recommended pastebin in another thread, so let's give it a try.
Here's my complete current output of crm configure show:

http://pastebin.com/bbSsqyeu

Here's a quick sketch: The sequence of events is supposed to be DRBD (ms) ->
clvmd (clone) -> gfs2 (clone) -> exportfs (clone).

But that's not what happens. What happens is that pacemaker tries to start up
the exportfs resource immediately. This fails, because what it's exporting
doesn't exist until after gfs2 runs. Because the cloned resource can't run on
either node, the cluster goes into a state in which one node is fenced, the
other node refuses to run anything.

Here's a quick snapshot I was able to take of the output of crm_mon that shows
the problem:

http://pastebin.com/CiZvS4Fh

This shows that pacemaker is still trying to start the exportfs resources,
before it has run the chain drbd->clvmd->gfs2.

Just to confirm the obvious, I have the ordering constraints in the full
configuration linked above (Admin is my DRBD resource):

order Admin_Before_Clvmd inf: AdminClone:promote ClvmdClone:start
order Clvmd_Before_Gfs2 inf: ClvmdClone Gfs2Clone
order Gfs2_Before_Exports inf: Gfs2Clone ExportsClone

This is not the only time I've observed this behavior in pacemaker. Here's a
lengthy log file excerpt from the same time I took the crm_mon snapshot:

http://pastebin.com/HwMUCmcX

I can see that other resources, the symlink ones in particular, are being probed
and started before the drbd Admin resource has a chance to be promoted. In
looking at the log file, it may help to know that /mail and /var/nevis are gfs2
partitions that aren't mounted until the Gfs2 resource starts.

So this isn't the first time I've seen this happen. This is just the first time
I've been able to reproduce this reliably and capture a snapshot.

Any ideas?
-- 
Bill Seligman | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://selig...@nevis.columbia.edu
PO Box 137|
Irvington NY 10533 USA| http://www.nevis.columbia.edu/~seligman/




Re: [Linux-HA] Apparent problem in pacemaker ordering

2012-03-02 Thread William Seligman
Darn it, forgot versions:

Redhat Linux 6.2 (kernel 2.6.32)
cman-3.0.12.1
corosync-1.4.1
pacemaker-1.1.6

On 3/2/12 6:12 PM, William Seligman wrote:
 One step forward, two steps back.
 
 I'm working on a two-node primary-primary cluster. I'm debugging problems I 
 have
 with the ocf:heartbeat:exportfs resource. For some reason, pacemaker sometimes
 appears to ignore ordering I put on the resources.
 
 Florian Haas recommended pastebin in another thread, so let's give it a try.
 Here's my complete current output of crm configure show:
 
 http://pastebin.com/bbSsqyeu
 
 Here's a quick sketch: The sequence of events is supposed to be DRBD (ms) ->
 clvmd (clone) -> gfs2 (clone) -> exportfs (clone).
 
 But that's not what happens. What happens is that pacemaker tries to start up
 the exportfs resource immediately. This fails, because what it's exporting
 doesn't exist until after gfs2 runs. Because the cloned resource can't run on
 either node, the cluster goes into a state in which one node is fenced, the
 other node refuses to run anything.
 
 Here's a quick snapshot I was able to take of the output of crm_mon that shows
 the problem:
 
 http://pastebin.com/CiZvS4Fh
 
 This shows that pacemaker is still trying to start the exportfs resources,
 before it has run the chain drbd->clvmd->gfs2.
 
 Just to confirm the obvious, I have the ordering constraints in the full
 configuration linked above (Admin is my DRBD resource):
 
 order Admin_Before_Clvmd inf: AdminClone:promote ClvmdClone:start
 order Clvmd_Before_Gfs2 inf: ClvmdClone Gfs2Clone
 order Gfs2_Before_Exports inf: Gfs2Clone ExportsClone
 
 This is not the only time I've observed this behavior in pacemaker. Here's a
 lengthy log file excerpt from the same time I took the crm_mon snapshot:
 
 http://pastebin.com/HwMUCmcX
 
 I can see that other resources, the symlink ones in particular, are being 
 probed
 and started before the drbd Admin resource has a chance to be promoted. In
 looking at the log file, it may help to know that /mail and /var/nevis are 
 gfs2
 partitions that aren't mounted until the Gfs2 resource starts.
 
 So this isn't the first time I've seen this happen. This is just the first 
 time
 I've been able to reproduce this reliably and capture a snapshot.
 
 Any ideas?


-- 
Bill Seligman | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://selig...@nevis.columbia.edu
PO Box 137|
Irvington NY 10533 USA| http://www.nevis.columbia.edu/~seligman/


