Re: [Linux-HA] NFSv4 with Heartbeat and DRBD
On 07-02-2011 17:58, Dimitri Maziuk wrote:
> Dave Dykstra wrote:
>> From the old linux-ha.org/HaNFS page, Hint #2:
>> If your kernel defaults to using TCP for NFS (as is the case in 2.6
>> kernels), switch to UDP instead by using the 'udp' mount option. If
>> you don't do this, you won't be able to quickly switch from server
>> "A" to "B" and back to "A" because "A" will hold the TCP connection
>> in TIME_WAIT state for 15-20 minutes and refuse to reconnect.
> This is when flipping back to "A" right away. If you don't do that, tcp
> is fine.
>
> I have NFS v3 mounts on heartbeat R1 (2.1.4) here with
> noatime,vers=3,rsize=32768,wsize=32768,hard,proto=tcp,timeo=600,retrans=2
> (all defaults) and they don't take a minute to failover. (Initial
> boot-up is another story.)

Even in the fail-back case it should be possible to mitigate that effect.
With NFSv3 we have had very good results (closer to 20s) with both TCP and UDP.
With NFSv4, even after messing with the lease/grace times, the client can't
write in less than a minute or two. :(

> So it's either NFS v4 or crm (resource agents?) or both.

We are inclined to blame NFS v4, but the customer is adamant on this point.
If anyone has good experiences regarding fail-over with NFSv4 we would
*really* appreciate you sharing the setup.

> (Ballpark figure for timeouts at various levels of the network stack is
> 45 sec -- or used to be back when I did my networking 101 -- and if you
> want to lower them you better know what you're doing.)

Well, there are some things that really help here, like gratuitous ARP :)

--
ServiSMART Ricardo Sousa
servimos o seu negócio tel: +351 96 298 0989
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
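One way to put numbers on "closer to 20s" versus "a minute or two" is to time writes from a client while the fail-over happens. The sketch below is only an illustration under assumptions, not the method used by anyone in this thread: `longest_write_stall` is a hypothetical helper, and the directory passed in is whatever path is mounted from the cluster. On a `hard` mount the write simply blocks during the outage, so the longest gap between two completed writes roughly approximates how long the client could not write (it also includes the polling interval).

```python
import os
import time


def longest_write_stall(directory, duration=120.0, interval=0.5):
    """Write a small file repeatedly for `duration` seconds and return the
    longest gap (in seconds) between two consecutive completed writes."""
    target = os.path.join(directory, ".failover_probe")  # hypothetical file name
    start = time.monotonic()
    last_ok = start
    longest = 0.0
    while time.monotonic() - start < duration:
        try:
            with open(target, "w") as handle:
                handle.write(str(time.time()))
                handle.flush()
                os.fsync(handle.fileno())  # push the data out to the server
            now = time.monotonic()
            longest = max(longest, now - last_ok)
            last_ok = now
        except OSError:
            pass  # e.g. a soft mount returning an error; keep trying
        time.sleep(interval)
    # Count an outage that was still in progress when the probe stopped.
    return max(longest, time.monotonic() - last_ok)
```

Started on a client shortly before triggering the fail-over, with the mount point as `directory`, the returned value can be compared across NFSv3/NFSv4 and TCP/UDP runs.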
Re: [Linux-HA] NFSv4 with Heartbeat and DRBD
Dave Dykstra wrote:
> From the old linux-ha.org/HaNFS page, Hint #2:
> If your kernel defaults to using TCP for NFS (as is the case in 2.6
> kernels), switch to UDP instead by using the 'udp' mount option. If
> you don't do this, you won't be able to quickly switch from server
> "A" to "B" and back to "A" because "A" will hold the TCP connection
> in TIME_WAIT state for 15-20 minutes and refuse to reconnect.

This is when flipping back to "A" right away. If you don't do that, tcp
is fine.

I have NFS v3 mounts on heartbeat R1 (2.1.4) here with
noatime,vers=3,rsize=32768,wsize=32768,hard,proto=tcp,timeo=600,retrans=2
(all defaults) and they don't take a minute to failover. (Initial
boot-up is another story.)

So it's either NFS v4 or crm (resource agents?) or both.

>> We have implemented a solution based around heartbeat v3 and DRBD.
>> While everything seems to work very well we have some difficulty with
>> regard to the time it takes for the NFS service to become fully available.

(Ballpark figure for timeouts at various levels of the network stack is
45 sec -- or used to be back when I did my networking 101 -- and if you
want to lower them you better know what you're doing.)

Dima
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
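For reference, the option string quoted in this message can be split into its parts to see what the client is actually asking for. This is a small illustrative sketch: `parse_mount_options` and `timeo_seconds` are hypothetical helper names, not part of any NFS tooling. The one fact it relies on is that nfs(5) documents `timeo` in tenths of a second.

```python
def parse_mount_options(option_string):
    """Split a comma-separated mount option string into a dict.

    Flags without a value (e.g. 'hard') map to True.
    """
    options = {}
    for item in option_string.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        options[key] = value if sep else True
    return options


def timeo_seconds(options):
    """Convert 'timeo' (tenths of a second, per nfs(5)) to seconds."""
    return int(options["timeo"]) / 10.0


opts = parse_mount_options(
    "noatime,vers=3,rsize=32768,wsize=32768,hard,proto=tcp,timeo=600,retrans=2"
)
print(opts["proto"], opts["vers"], timeo_seconds(opts))  # prints: tcp 3 60.0
```

So `timeo=600` means the client waits 60 seconds for a reply before retrying a request.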
Re: [Linux-HA] NFSv4 with Heartbeat and DRBD
From the old linux-ha.org/HaNFS page, Hint #2:

If your kernel defaults to using TCP for NFS (as is the case in 2.6
kernels), switch to UDP instead by using the 'udp' mount option. If
you don't do this, you won't be able to quickly switch from server
"A" to "B" and back to "A" because "A" will hold the TCP connection
in TIME_WAIT state for 15-20 minutes and refuse to reconnect.

Are you mounting with TCP? A minute sounds short.

- Dave

On Fri, Feb 04, 2011 at 10:45:22AM +, Ricardo Botelho de Sousa wrote:
> Hello All!
>
> We have implemented a solution based around heartbeat v3 and DRBD.
> While everything seems to work very well we have some difficulty with
> regard to the time it takes for the NFS service to become fully available.
>
> How long is a graceful fail-over with NFSv4 expected to take?
> We tried reducing grace/lease times to no avail. We don't seem to be
> able to lower it from about a minute. TCP or UDP doesn't seem to make
> any difference.
>
> It's not that I believe this is related to Heartbeat, but perhaps I can
> find an explanation or some mysterious parameter from your collective
> experience.
>
> Best regards,
>
> --
> ServiSMART Ricardo Sousa
> servimos o seu negócio tel: +351 96 298 0989
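When grace/lease times have been reduced "to no avail", it can help to confirm what the server is actually running with. The sketch below is an assumption-laden illustration: it assumes the Linux kernel NFS server exposes `nfsv4leasetime` and `nfsv4gracetime` under `/proc/fs/nfsd` (the grace-time file may be missing on older kernels), and `read_nfsd_times` is a hypothetical helper name.

```python
import os


def read_nfsd_times(base="/proc/fs/nfsd"):
    """Return the NFSv4 lease and grace times in seconds.

    A value is None when the file is missing, unreadable or not a number.
    """
    result = {}
    for name in ("nfsv4leasetime", "nfsv4gracetime"):
        path = os.path.join(base, name)
        try:
            with open(path) as handle:
                result[name] = int(handle.read().strip())
        except (OSError, ValueError):
            result[name] = None
    return result


if __name__ == "__main__":
    for name, seconds in read_nfsd_times().items():
        print(name, seconds)
```

Run on the active NFS server node after a fail-over, this shows whether the reduced values really took effect on that node.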