Re: [OmniOS-discuss] Testing RSF-1 with zpool/nfs HA

Stephan Budach Wed, 17 Feb 2016 23:50:16 -0800

Hi Michael,

On 18.02.16 at 08:17, Michael Talbott wrote:

While I don't have a setup like you've described, I'm going to take a wild 
guess and say check your switches' (and servers') ARP tables. Perhaps the switch 
isn't updating your VIP address with the other server's MAC address fast enough. 
Maybe, as part of the failover script, send a command to your switch to update 
the ARP entry or clear its ARP table. Another, perhaps simpler, solution or 
diagnostic would be to ping your router from the server via the VIP interface 
and address right after the failover and record the output; this may also 
tickle the switch into updating its MAC table. It's also possible that the 
clients need an ARP flush.


If this is the case, another possibility would be to have both servers spoof 
the same MAC address, with only one of them ever up at a time, controlled by 
the failover script (otherwise bad things will happen).
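A recorded ping log can turn the outage into a number. This works both for the server pinging its router via the VIP, as suggested here, and for a client pinging the VIP. The helper below counts the longest run of missing reply sequence numbers.

This is a minimal sketch. It assumes Linux iputils-style output ("64 bytes from ...: icmp_seq=N ..."; some older builds print "icmp_req=" instead). The function name `longest_reply_gap` is hypothetical. With ping's default one-second interval, a gap of N missing replies corresponds to roughly N seconds without answers.

```python
import re

# Only count real echo replies ("... bytes from ...: icmp_seq=N ..."), not
# error lines such as "From x icmp_seq=N Destination Host Unreachable".
# Assumes iputils-style output; older builds may print "icmp_req=".
REPLY_RE = re.compile(r"bytes from .*?icmp_(?:seq|req)=(\d+)")


def longest_reply_gap(ping_output: str) -> int:
    """Return the largest number of consecutive missing replies between
    the first and last reply found in captured ping output."""
    seqs = sorted({int(m.group(1)) for m in REPLY_RE.finditer(ping_output)})
    longest = 0
    for prev, cur in zip(seqs, seqs[1:]):
        # cur - prev - 1 sequence numbers went unanswered in between
        longest = max(longest, cur - prev - 1)
    return longest
```

For example, capture `ping <vip> > ping.log` across a failover, then pass the file's contents to `longest_reply_gap`.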

Just a thought.

Michael

On Feb 17, 2016, at 10:13 PM, Stephan Budach <stephan.bud...@jvm.de> wrote:

Hi,

I have been test-driving RSF-1 for the last week to accomplish the following:

- cluster a zpool made up of 8 mirrored vdevs, which are based on 8 x 2 SSD 
mirrors served via iSCSI from another OmniOS box
- export an NFS share from that zpool via a VIP
- have RSF-1 handle the failover and move the VIP
- use the NFS share as a repository for my Oracle VM guests and vdisks

The setup seems to work fine, but there is one issue I can't seem to solve. 
Whenever I fail over the zpool, any in-flight NFS I/O stalls for an 
unpredictable amount of time. Sometimes it takes not much longer than the time 
needed to move the resources, but sometimes it takes up to 5 minutes until the 
NFS client on my VM server becomes responsive again.

For example, when I issue a simple ls -l on the vdisk folder while the 
switchover is happening, the command sometimes completes in 18 to 20 seconds, 
but sometimes ls will just sit there for minutes.
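To see how the stall varies from run to run, the `ls -l` test could be automated. The sketch repeatedly lists the directory, reading each entry and stat-ing it, and records how long each attempt blocks.

This is a minimal sketch under a few assumptions:

- The function name and the mount path used below are hypothetical.
- Because of the NFS client's attribute cache, not every stat necessarily reaches the server.

```python
import os
import time


def time_listings(path, samples, interval=1.0):
    """List *path* the way `ls -l` does (read the directory, then lstat
    every entry) and return (start_offset_s, duration_s) per attempt."""
    t0 = time.monotonic()
    results = []
    for _ in range(samples):
        start = time.monotonic()
        for name in os.listdir(path):
            try:
                # like ls -l; once cached attributes expire, this asks the server
                os.lstat(os.path.join(path, name))
            except FileNotFoundError:
                pass  # entry was removed between listdir and lstat
        results.append((start - t0, time.monotonic() - start))
        time.sleep(interval)
    return results
```

Run it on the client, e.g. `time_listings("/mnt/vdisks", samples=600)` with a hypothetical mount point, while triggering a failover. Long durations then show exactly when, and for how long, the client was blocked.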

I wonder if there's anything I could do about that. I have already played 
with several NFS and TCP timeouts, but none of them seems to have any effect 
on this issue. Does anyone know some tricks to get the in-flight data moving 
again sooner?

Thanks,
Stephan

I don't think that the switches are the problem: when I ping the VIP from the 
VM host (OL6-based), the pings only stop for the time it takes RSF-1 to move 
the services, and afterwards they continue just normally. The only thing I 
wonder is whether this is more of an NFS thing or a TCP-in-general thing. Maybe 
I should also test some other IP protocol to see if that one stalls for that 
long as well.
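One way to test plain TCP separately from NFS is to keep opening fresh TCP connections to the VIP during the failover and time how long it takes until one succeeds. Port 2049, the standard NFS port, or any other TCP service on the VIP would do. If new connections succeed within the RSF-1 move time while the existing NFS mount still hangs, the delay is more likely in how the client recovers its established NFS connection than in IP reachability.

This is a minimal sketch under a few assumptions:

- The function name and the hostname below are hypothetical.
- A successful new connection only shows that the VIP answers again.

```python
import socket
import time


def seconds_until_tcp_connect(host, port, timeout=300.0,
                              attempt_timeout=1.0, interval=0.5):
    """Retry a fresh TCP connect to host:port until it succeeds.
    Return the elapsed seconds, or None if *timeout* passes first."""
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        try:
            with socket.create_connection((host, port), timeout=attempt_timeout):
                return time.monotonic() - start
        except OSError:
            # refused, unreachable or timed out: try again shortly
            time.sleep(interval)
    return None
```

Start it, for example as `seconds_until_tcp_connect("vip.example", 2049)`, right when the failover begins. Compare the result with the ping gap and with the observed NFS stall.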


Cheers,
Stephan
_______________________________________________
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss
