On Wed, Dec 15, 2010 at 02:11:37PM +0000, Alistair Veitch wrote:
> Hi,
> 
> I've got an environment of three servers running Solaris 10 x64 u9. Two of
> these servers (let's call them "A" and "B") are clustered using Veritas
> Cluster Server, and they NFS-share a filesystem. The share has no options
> set, so it's "rw" to the world.
> 
> The third server (let's call it "C") NFS-mounts the share from the cluster
> via an entry in its vfstab. This mount also has no options set (although
> we've previously tried both soft and hard mounts). On this server,
> /etc/default/nfs sets NFS_CLIENT_VERSMAX=3; all other entries are at their
> defaults. We've also tried forcing NFS version 4 on the mount.
> 
> The problem is this: if we fail over the NFS share from A to B, the NFS mount
> on C hangs during the failover as expected. Once the NFS server components
> are up on B, the mount recovers automatically within a second or so, as
> expected. However, if we perform a failback of the share from B to A, the
> mount on C takes approx 7 to 8 minutes to recover automatically. During this
> period, C can ping A and B, and the virtual address and name associated with
> the failover NFS share. Further failovers/failbacks, if performed without
> large time intervals between them, also cause the long recovery of the NFS
> mount on C. If however we wait a good while (perhaps 30 mins or more) before
> performing another failover test, the recovery from the hanging NFS mount on
> C is quick again, back to one second or so.
> 
> Does anyone have any ideas why the NFS mount takes so long to recover from
> the hang under the scenarios described above?

Hi Alistair,

You could use snoop on C to capture the NFS traffic on the wire during a slow
failback and see exactly what the client and server are exchanging while the
mount is hung.
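
To line the capture up with the hang, it may help to log on C exactly when
the mount stops and resumes answering. Below is a minimal sketch of such a
probe, assuming a Python 3 interpreter is available on C; the function names
and the /mnt/share path in the usage note are hypothetical, not taken from your
setup. On a hard mount the stat call should simply block until the server
answers again; on a soft mount it should fail with an error instead.

```python
import os
import time


def time_probe(probe, clock=time.monotonic):
    """Run probe() once and return (elapsed_seconds, error_or_None)."""
    start = clock()
    try:
        probe()
        err = None
    except OSError as exc:
        # A soft mount is expected to surface a timeout as an OSError.
        err = exc
    return clock() - start, err


def watch(path, interval=1.0, slow=5.0, rounds=None):
    """Stat `path` repeatedly; print probes that fail or take >= `slow` seconds.

    The printed timestamp is taken when the probe returns, i.e. at the moment
    the mount answered again (or the probe failed).
    """
    n = 0
    while rounds is None or n < rounds:
        elapsed, err = time_probe(lambda: os.stat(path))
        if elapsed >= slow or err is not None:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} probe took {elapsed:.1f}s error={err}")
        time.sleep(interval)
        n += 1
```

Running watch("/mnt/share") on C during a failback would print one line when
the mount recovers, which can then be matched against the packet timestamps
in the capture. Note that the client may answer a stat from its attribute
cache, so very short stalls could go unnoticed with this approach.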


HTH.

-- 
Marcel Telka
RPE, Systems
_______________________________________________
nfs-discuss mailing list