Re: [Linux-HA] Antw: Managed Failovers w/ NFS HA Cluster

2014-07-22 Thread Ulrich Windl
 Charles Taylor chas...@ufl.edu wrote on 21.07.2014 at 16:40 in message
67c40c19-6146-4be5-8fbd-4bba114c6...@ufl.edu:

 On Jul 21, 2014, at 8:57 AM, Ulrich Windl wrote:
 
 Charles Taylor chas...@ufl.edu wrote on 17.07.2014 at 17:24 in message
 761ce39a-57d8-47d2-860d-2af1936cc...@ufl.edu:
 I feel like this is something that must have been covered extensively
 already, but I've done a lot of googling and looked at a lot of cluster
 configs, and have not found the solution.
 
 I have an HA NFS cluster (corosync+pacemaker).  The relevant RPMs are listed
 below, but I'm not sure they are that important to the question, which is
 this...
 
 When performing managed failovers of the NFS-exported file system resource
 from one node to the other (crm resource move), any active NFS clients
 experience an I/O error when the file system is unexported.  In other words,
 you must unexport it to unmount it.  As soon as it is unexported, clients are
 no longer able to write to it and experience an I/O error (rather than just
 blocking).
 
 Do you hard-mount or soft-mount NFS? Do you use NFSv3 or NFSv4?
 
 Hard mounts.  We are supporting both NFSv3 and NFSv4 mounts.  I tested both
 and the behavior was the same.  There seemed to be no way to avoid an I/O
 error on the clients when umounting the file system as part of a managed (crm
 resource move) failover.  I'm wondering if this is expected or if there is
 some way around it that I'm simply missing.  We'd like to be able to move
 resources back and forth among the servers for maintenance without disrupting
 client I/O.
 
 Just to summarize, the Filesystem agent must umount the volume to migrate
 it.  To umount it successfully, the volume must first be unexported.  As soon
 as the stop operation is run by the exportfs agent, any clients actively
 doing I/O are interrupted and error out rather than blocking as they would if
 the server went down.  So far, I've been unable to find a way around this.

You need the NFS server to unexport, but you can remove the IP address for
the NFS service before unexporting. That way the clients see a down server.
Obviously, start in the opposite order (first export, then add the IP
address). If you think about it, that seems logical...
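
In Pacemaker group terms (a minimal sketch; the resource names here are
placeholders, not taken from your config): members of a group start in the
listed order and stop in reverse, so putting the IP last makes it the first
resource stopped and the last one started:

   # crm configure syntax; group members start in order, stop in reverse
   group grp_nfs vg_nfs fs_nfs ex_nfs ip_nfs
   # stop:  ip_nfs -> ex_nfs -> fs_nfs -> vg_nfs   (the IP goes away first,
   #        so clients see a dead server before the unexport happens)
   # start: vg_nfs -> fs_nfs -> ex_nfs -> ip_nfs   (export before the IP)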

 
 As I write this, I'm thinking that perhaps the way to achieve this is to 
 change the order of the services so that the VIP is started last and stopped 
 first when stopping/starting the resource group.   That should make it appear 
 to the client that the server just went away as would happen in a failure 
 scenario.   Then the client should not know that the file system has been 
 unexported since it can't talk to the server.   
 
 Perhaps I just made a rookie mistake in the ordering of the services within
 the resource group.  I'll try that and report back.
 
 Regards,
 
 Charlie
 


Re: [Linux-HA] Antw: Managed Failovers w/ NFS HA Cluster

2014-07-22 Thread Charles Taylor

On Jul 21, 2014, at 10:40 AM, Charles Taylor wrote:

 As I write this, I'm thinking that perhaps the way to achieve this is to 
 change the order of the services so that the VIP is started last and stopped 
 first when stopping/starting the resource group.   That should make it appear 
 to the client that the server just went away as would happen in a failure 
 scenario.   Then the client should not know that the file system has been 
 unexported since it can't talk to the server.   
 
 Perhaps I just made a rookie mistake in the ordering of the services within
 the resource group.  I'll try that and report back.

Yep, this was my mistake.  The IPaddr2 primitive needs to follow the exportfs
primitives in my resource group, so they are now arranged as:

Resource Group: grp_b3v0
 vg_b3v0(ocf::heartbeat:LVM) Started 
 fs_b3v0(ocf::heartbeat:Filesystem) Started 
 ex_b3v0_1  (ocf::heartbeat:exportfs) Started 
 ex_b3v0_2  (ocf::heartbeat:exportfs) Started 
 ex_b3v0_3  (ocf::heartbeat:exportfs) Started 
 ex_b3v0_4  (ocf::heartbeat:exportfs) Started 
 ex_b3v0_5  (ocf::heartbeat:exportfs) Started 
 ex_b3v0_6  (ocf::heartbeat:exportfs) Started 
 ip_vbio3   (ocf::heartbeat:IPaddr2) Started 
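
In crm shell terms that ordering corresponds to a group definition along these
lines (a sketch of the equivalent command, not a dump of the actual config):

   crm configure group grp_b3v0 vg_b3v0 fs_b3v0 \
       ex_b3v0_1 ex_b3v0_2 ex_b3v0_3 ex_b3v0_4 ex_b3v0_5 ex_b3v0_6 \
       ip_vbio3
   # members start in listed order and stop in reverse, so ip_vbio3 is
   # stopped first and started last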

Thanks to those who responded,

Charlie Taylor
UF Research Computing





Re: [Linux-HA] Antw: Managed Failovers w/ NFS HA Cluster

2014-07-22 Thread emmanuel segura
But does the NFS failover work now?

2014-07-22 2:10 GMT+02:00 Charles Taylor chas...@ufl.edu:

 On Jul 21, 2014, at 10:40 AM, Charles Taylor wrote:

 As I write this, I'm thinking that perhaps the way to achieve this is to 
 change the order of the services so that the VIP is started last and stopped 
 first when stopping/starting the resource group.   That should make it 
 appear to the client that the server just went away as would happen in a 
 failure scenario.   Then the client should not know that the file system has 
 been unexported since it can't talk to the server.

 Perhaps I just made a rookie mistake in the ordering of the services within
 the resource group.  I'll try that and report back.

 Yep, this was my mistake.  The IPaddr2 primitive needs to follow the
 exportfs primitives in my resource group, so they are now arranged as:

 Resource Group: grp_b3v0
  vg_b3v0(ocf::heartbeat:LVM) Started
  fs_b3v0(ocf::heartbeat:Filesystem) Started
  ex_b3v0_1  (ocf::heartbeat:exportfs) Started
  ex_b3v0_2  (ocf::heartbeat:exportfs) Started
  ex_b3v0_3  (ocf::heartbeat:exportfs) Started
  ex_b3v0_4  (ocf::heartbeat:exportfs) Started
  ex_b3v0_5  (ocf::heartbeat:exportfs) Started
  ex_b3v0_6  (ocf::heartbeat:exportfs) Started
  ip_vbio3   (ocf::heartbeat:IPaddr2) Started

 Thanks to those who responded,

 Charlie Taylor
 UF Research Computing






-- 
this is my life and I live it for as long as God wills


Re: [Linux-HA] Antw: Managed Failovers w/ NFS HA Cluster

2014-07-22 Thread Dmitri Maziuk

On 7/22/2014 1:00 AM, Ulrich Windl wrote:


You need the NFS server to unexport, but you can remove the IP address for
the NFS service before unexporting. That way the clients see a down server.
Obviously, start in the opposite order (first export, then add the IP
address). If you think about it, that seems logical...

Erm... bull, pardon my French.

As of NFSv3, if an export disappears while in use, the client errors out
with 'stale NFS handle'. It does not matter whether you unexport first or
turn off the power in your server room first.
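
(One common mitigation, sketched here as an assumption rather than something
from this thread: pin each export's fsid so the NFS file handle is identical
on both nodes. The ocf:heartbeat:exportfs agent takes an fsid parameter for
this; the directory, clientspec, options, and fsid values below are
placeholders.)

   crm configure primitive ex_b3v0_1 ocf:heartbeat:exportfs \
       params directory="/export/b3v0_1" clientspec="10.13.0.0/16" \
              options="rw,no_root_squash" fsid="1" \
       op monitor interval="30s"
   # fsid must be fixed and unique per export, and the same on whichever
   # node currently runs the resource, or clients get stale file handles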


(No idea -- and don't really care -- what NFSv4 does.)

Dima



[Linux-HA] Antw: Managed Failovers w/ NFS HA Cluster

2014-07-21 Thread Ulrich Windl
 Charles Taylor chas...@ufl.edu wrote on 17.07.2014 at 17:24 in message
761ce39a-57d8-47d2-860d-2af1936cc...@ufl.edu:
 I feel like this is something that must have been covered extensively
 already, but I've done a lot of googling and looked at a lot of cluster
 configs, and have not found the solution.
 
 I have an HA NFS cluster (corosync+pacemaker).  The relevant RPMs are listed
 below, but I'm not sure they are that important to the question, which is
 this...
 
 When performing managed failovers of the NFS-exported file system resource 
 from one node to the other (crm resource move), any active NFS clients 
 experience an I/O error when the file system is unexported.  In other words, 
 you must unexport it to unmount it.  As soon as it is unexported, clients are 
 no longer able to write to it and experience an I/O error (rather than just 
 blocking).

Do you hard-mount or soft-mount NFS? Do you use NFSv3 or NFSv4?

 
 In a failure scenario this is not a problem because the file system is never
 unexported on the primary server.  Rather, the server just goes down, the
 secondary takes over the resources, and client I/O blocks until the process
 is complete and then goes about its business.  We would like this same
 behavior for a *managed* failover but have not found a mount or export
 option/scenario that works.  Is it possible?  What am I missing?
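
 (For concreteness, the managed failover in question is just the following;
 a sketch of the commands, with the target node chosen arbitrarily:)

    crm resource move grp_b3v0 biostor4.ufhpc  # push the group to the peer
    # ... do the maintenance ...
    crm resource unmove grp_b3v0   # drop the location constraint move created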
 
 I realize this is more of an nfs/exportfs question but I would think that 
 those implementing NFS HA clusters would be familiar with the scenario I'm 
 describing.
 
 Regards,
 
 Charlie Taylor
 
 pacemaker-cluster-libs-1.1.7-6.el6.x86_64
 pacemaker-cli-1.1.7-6.el6.x86_64
 pacemaker-1.1.7-6.el6.x86_64
 pacemaker-libs-1.1.7-6.el6.x86_64
 resource-agents-3.9.2-40.el6.x86_64
 fence-agents-3.1.5-35.el6.x86_64
 
 Red Hat Enterprise Linux Server release 6.3 (Santiago)
 
 Linux biostor3.ufhpc 2.6.32-279.19.1.el6.x86_64 #1 SMP Sat Nov 24 14:35:28 
 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
 
 [root@biostor4 bs34]# crm status
 
 Last updated: Thu Jul 17 10:55:04 2014
 Last change: Thu Jul 17 07:59:47 2014 via crmd on biostor3.ufhpc
 Stack: openais
 Current DC: biostor3.ufhpc - partition with quorum
 Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
 2 Nodes configured, 2 expected votes
 20 Resources configured.
 
 
 Online: [ biostor3.ufhpc biostor4.ufhpc ]
 
  Resource Group: grp_b3v0
  vg_b3v0  (ocf::heartbeat:LVM):   Started biostor3.ufhpc
  fs_b3v0  (ocf::heartbeat:Filesystem):Started biostor3.ufhpc
  ip_vbio3 (ocf::heartbeat:IPaddr2):   Started biostor3.ufhpc
  ex_b3v0_1(ocf::heartbeat:exportfs):  Started biostor3.ufhpc
  ex_b3v0_2(ocf::heartbeat:exportfs):  Started biostor3.ufhpc
  ex_b3v0_3(ocf::heartbeat:exportfs):  Started biostor3.ufhpc
  ex_b3v0_4(ocf::heartbeat:exportfs):  Started biostor3.ufhpc
  ex_b3v0_5(ocf::heartbeat:exportfs):  Started biostor3.ufhpc
  Resource Group: grp_b4v0
  vg_b4v0  (ocf::heartbeat:LVM):   Started biostor4.ufhpc
  fs_b4v0  (ocf::heartbeat:Filesystem):Started biostor4.ufhpc
  ip_vbio4 (ocf::heartbeat:IPaddr2):   Started biostor4.ufhpc
  ex_b4v0_1(ocf::heartbeat:exportfs):  Started biostor4.ufhpc
  ex_b4v0_2(ocf::heartbeat:exportfs):  Started biostor4.ufhpc
  ex_b4v0_3(ocf::heartbeat:exportfs):  Started biostor4.ufhpc
  ex_b4v0_4(ocf::heartbeat:exportfs):  Started biostor4.ufhpc
  ex_b4v0_5(ocf::heartbeat:exportfs):  Started biostor4.ufhpc
  st_bio3  (stonith:fence_ipmilan):Started biostor4.ufhpc
  st_bio4  (stonith:fence_ipmilan):Started biostor3.ufhpc
 
 
 


Re: [Linux-HA] Antw: Managed Failovers w/ NFS HA Cluster

2014-07-21 Thread Charles Taylor

On Jul 21, 2014, at 8:57 AM, Ulrich Windl wrote:

 Charles Taylor chas...@ufl.edu wrote on 17.07.2014 at 17:24 in message
 761ce39a-57d8-47d2-860d-2af1936cc...@ufl.edu:
 I feel like this is something that must have been covered extensively
 already, but I've done a lot of googling and looked at a lot of cluster
 configs, and have not found the solution.
 
 I have an HA NFS cluster (corosync+pacemaker).  The relevant RPMs are listed
 below, but I'm not sure they are that important to the question, which is
 this...
 
 When performing managed failovers of the NFS-exported file system resource
 from one node to the other (crm resource move), any active NFS clients
 experience an I/O error when the file system is unexported.  In other words,
 you must unexport it to unmount it.  As soon as it is unexported, clients
 are no longer able to write to it and experience an I/O error (rather than
 just blocking).
 
 Do you hard-mount or soft-mount NFS? Do you use NFSv3 or NFSv4?

Hard mounts.  We are supporting both NFSv3 and NFSv4 mounts.  I tested both
and the behavior was the same.  There seemed to be no way to avoid an I/O
error on the clients when umounting the file system as part of a managed (crm
resource move) failover.  I'm wondering if this is expected or if there is
some way around it that I'm simply missing.  We'd like to be able to move
resources back and forth among the servers for maintenance without disrupting
client I/O.
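
(For reference, a hard mount of the kind described looks roughly like this;
the server name and paths are placeholders:)

   # hard mounts: client I/O blocks while the server is unreachable
   mount -t nfs  -o vers=3,hard vbio3:/export/b3v0_1 /mnt/b3v0_1    # NFSv3
   mount -t nfs4 -o hard        vbio3:/export/b3v0_1 /mnt/b3v0_1v4  # NFSv4
   # a soft mount would instead return an I/O error once its retries time
   # out, which is exactly the behavior we are trying to avoid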

Just to summarize, the Filesystem agent must umount the volume to migrate
it.  To umount it successfully, the volume must first be unexported.  As soon
as the stop operation is run by the exportfs agent, any clients actively
doing I/O are interrupted and error out rather than blocking as they would if
the server went down.  So far, I've been unable to find a way around this.

As I write this, I'm thinking that perhaps the way to achieve this is to change 
the order of the services so that the VIP is started last and stopped first 
when stopping/starting the resource group.   That should make it appear to the 
client that the server just went away as would happen in a failure scenario.  
 Then the client should not know that the file system has been unexported since 
it can't talk to the server.   

Perhaps I just made a rookie mistake in the ordering of the services within
the resource group.  I'll try that and report back.

Regards,

Charlie
