Re: [OmniOS-discuss] Testing RSF-1 with zpool/nfs HA

Stephan Budach Thu, 18 Feb 2016 22:13:03 -0800

On 18.02.16 at 21:57, Schweiss, Chip wrote:

On Thu, Feb 18, 2016 at 5:14 AM, Michael Rasmussen <m...@miras.org> wrote:
    On Thu, 18 Feb 2016 07:13:36 +0100
    Stephan Budach <stephan.bud...@jvm.de> wrote:

    >
    > So, when I issue a simple ls -l on the folder of the vdisks
    > while the switchover is happening, the command sometimes completes
    > in 18 to 20 seconds, but sometimes ls will just sit there for minutes.
    >
    This is a known limitation in NFS. NFS was never intended to be
    clustered. What you are seeing is the NFS process on the client side
    holding kernel locks for the now unavailable NFS server, so any request
    to that process hangs waiting for these locks to be resolved. This can
    be compared to a situation where you hot-swap a drive in the pool
    without notifying the pool.

    The only way to resolve this is to forcefully kill all NFS client
    processes and then restart the NFS client.
I've been running RSF-1 on OmniOS since about r151008. All my clients have always been NFSv3 and NFSv4.
My memory is a bit fuzzy, but when I first started testing RSF-1, OmniOS still had the Sun lock manager, which was later replaced with the BSD lock manager. This has had many difficulties.
I do remember that failovers when I first started with RSF-1 never had these stalls. I believe this was because the lock state was stored in the pool, and the server taking over the pool would inherit that state too. That state is now lost when a pool is imported with the BSD lock manager.
When I did testing, I would do both full-speed reading and writing to the pool and force failovers, both by command line and by killing power on the active server. Never did I have a failover take more than about 30 seconds for NFS to fully resume data flow.
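
For anyone wanting to put numbers on such stalls from the client side, a minimal sketch of a probe follows. It is not part of RSF-1 or of the tests described in this thread; the function names, the 5-second threshold, and the idea of timing repeated directory listings are assumptions made for illustration. On a hard-mounted NFS share a listing simply blocks until the server answers, so the elapsed time of each call approximates the stall seen by the client.

```python
import os
import time
from typing import Callable, List, Tuple


def probe_once(path: str,
               lister: Callable[[str], list] = os.listdir) -> Tuple[float, bool]:
    """Time a single directory listing; return (elapsed seconds, succeeded)."""
    start = time.monotonic()
    try:
        lister(path)
        ok = True
    except OSError:
        # e.g. a stale file handle or an I/O error during the switchover
        ok = False
    return time.monotonic() - start, ok


def measure_stalls(path: str,
                   samples: int,
                   interval: float = 1.0,
                   threshold: float = 5.0,
                   lister: Callable[[str], list] = os.listdir,
                   sleep: Callable[[float], None] = time.sleep
                   ) -> List[Tuple[int, float, bool]]:
    """Probe `path` `samples` times and collect the slow or failed probes.

    Returns a list of (sample index, elapsed seconds, succeeded) for every
    probe that took at least `threshold` seconds or raised OSError.
    """
    stalls = []
    for i in range(samples):
        elapsed, ok = probe_once(path, lister)
        if elapsed >= threshold or not ok:
            stalls.append((i, elapsed, ok))
        sleep(interval)
    return stalls
```

Calling measure_stalls("/mnt/vdisks", samples=300) while forcing a failover would list every probe that took 5 seconds or longer; the mount point /mnt/vdisks is a placeholder, not a path from this thread.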
Others who know more about the BSD lock manager vs. the old Sun lock manager may be able to tell us more. I'd also be curious whether Nexenta has addressed this.
-Chip

I actually don't know whether it's the lock manager or nfsd itself that caused this, but since I bounced all of them after I failed the zpool over while hammering it with reads and writes, lockd would also have been among the processes that were restarted. And remember, this only happened when failing over from one host to the other and back again in rather quick succession.

Nevertheless, RSF-1 seems to be a solid solution and I will very likely implement it across several OmniOS boxes.


Cheers,
Stephan

_______________________________________________
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss
