My 2 cents on dealing with DNS and your idea of the OPS feature. I have implemented the OPS feature in the 2.6 kernel and it's running well. Without that feature, we wound up having all the DNS queries from our DNS client sent to the same realserver.
The problem we did run into, which I've gotten help from the community on, is that when using LVS-NAT the source packet isn't SNAT'd. This is because, on the outgoing packet, LVS doesn't know it is an LVS packet, so it just forwards it out. I fixed this with an iptables rule to SNAT it myself. Just an FYI if you ever choose to use OPS with LVS-NAT.

Mike

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Simon Horman
Sent: Monday, May 07, 2007 10:31 PM
To: Simon Pearce
Cc: [email protected]
Subject: Re: DNS problems solved

On Fri, Apr 06, 2007 at 01:12:39PM +0200, Simon Pearce wrote:
> Some of you on the list might remember my problem concerning our DNS
> cluster last year.
>
> http://archive.linuxvirtualserver.org/html/lvs-users/2006-11/msg00278.html
>
> These problems (DNS timeouts) have continued throughout this year and
> I have been desperately trying to find the solution. I have been
> following the mailing list and stumbled over the problems Adrian
> Chapela was having with his DNS setup, which brought me to the
> solution: ipvsadm -L --timeout showed the default timeout for UDP
> packets was set to 500 seconds, which is way too long. The load
> balancers were waiting 5 minutes to time out a UDP packet, and I get
> about 1500 queries a second. I changed the setting to 15 seconds last
> week, and moved some of our old Windows/BIND DNS servers to the new
> Linux DNS cluster. Before I changed the timeout settings I always
> received a call from our customers within two hours: "your DNS
> services are not responding correctly." The IPs that refused to
> answer would always change. I have 254 IPs; some of the large German
> dialup providers would refuse to talk to us, which resulted in
> domains not being reachable. Our DNS cluster is authoritative for
> about 250,000 domains, so you can imagine how many complaints I
> received. I was about to give up and scrap keepalived; I am so glad
> I did not.
> Changing the timeout value solved my problems and I am a happy man at
> the moment. Is there a way to set the timeout value permanently so it
> is saved after a reboot of the server? One last thing I would like to
> say is a big thank you to Graeme Fowler, Horms, Adrian Chapela and
> Alexandre Cassen for writing this great piece of software, and to
> anyone else on the list who maybe contributed to help me finally find
> the solution. Thank you guys, you do a great job on the mailing list.

Hi Simon,

glad to hear that you got to the bottom of your problem.

I am a little concerned about the idea of reducing UDP timeouts significantly because, to be quite frank, UDP load balancing is a bit of a hack. The problem lies in the connectionless nature of the protocol, so naturally LVS has a devil of a time tracking UDP "connections" - that is, a series of datagrams between a client and server that would really be part of a connection if TCP were being used. As UDP doesn't really have any state, all LVS can do to identify such "connections" is to set up affinity based on the source and destination IP and port tuples.

If my memory serves me correctly, DNS quite often originates from port 53, and so if you are getting lots of requests from the same DNS server then this affinity heuristic breaks down. The trouble is that if the timeout is significantly reduced, the probability of it breaking down the other way - in the case where that affinity is correct - increases.

I'm not saying that you don't have a good case. Nor am I saying that changing the default timeout is off-limits. Just that exactly what makes a good default timeout is a tricky question, because what works well in some cases will not work well in others, and vice versa. To some extent I wonder if the userspace tools should have the smarts to change the timeout if port 53 (DNS) is in use. Though that may be an even worse heuristic. I wonder if a better idea might be the one packet scheduling patches by Julian.
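[The tuple-based affinity described above, and how it collapses when every query arrives from the same resolver IP and source port 53, can be sketched in a few lines of Python. This is an illustrative toy model only - it is not the actual LVS scheduler, and the round-robin choice, function names, and example addresses are all assumptions.]

```python
# Toy model of tuple-based UDP affinity (NOT actual LVS kernel code):
# the first datagram for a given (src_ip, src_port, dst_ip, dst_port)
# tuple picks a realserver; later datagrams with the same tuple stick
# to it until the entry would time out.
def pick_server(affinity, servers, src_ip, src_port, dst_ip, dst_port):
    key = (src_ip, src_port, dst_ip, dst_port)
    if key not in affinity:
        # Assumed round-robin over realservers for new "connections".
        affinity[key] = servers[len(affinity) % len(servers)]
    return affinity[key]

servers = ["rs1", "rs2", "rs3"]
affinity = {}

# A busy resolver sending every query from source port 53 (example
# documentation addresses): one tuple, so all 1000 queries land on a
# single realserver.
same_port = {pick_server(affinity, servers, "198.51.100.7", 53,
                         "203.0.113.53", 53)
             for _ in range(1000)}

# Clients using ephemeral source ports create distinct tuples and so
# spread across all realservers.
ephemeral = {pick_server(affinity, servers, "198.51.100.7", p,
                         "203.0.113.53", 53)
             for p in range(40000, 40030)}
```

In this model `same_port` contains a single realserver while `ephemeral` covers all three, which is the breakdown Horms describes: affinity keyed on the tuple is only useful when source ports vary.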
Much to my surprise these aren't merged. Perhaps that's my fault. I should look into it...

http://archive.linuxvirtualserver.org/html/lvs-users/2005-09/msg00214.html

I also wonder: if the problem relates to connection entries for servers that have been quiesced, does setting expire_quiescent_template help?

echo 1 > /proc/sys/net/ipv4/vs/expire_quiescent_template

Sorry if those ideas have been canvassed before, I only briefly refreshed my memory of the original thread.

-- 
Horms
H: http://www.vergenet.net/~horms/
W: http://www.valinux.co.jp/en/

_______________________________________________
LinuxVirtualServer.org mailing list - [email protected]
Send requests to [EMAIL PROTECTED]
or go to http://www.in-addr.de/mailman/listinfo/lvs-users
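[The three fixes discussed in this thread can be sketched together as shell commands. This is a sketch under assumptions: the realserver network, VIP, and boot-time mechanism below are illustrative placeholders, not values from the thread; only the iptables, ipvsadm, and sysctl invocations themselves are standard. Check your distribution before relying on any of it.]

```shell
# Mike's workaround: with OPS under LVS-NAT the reply leaves the director
# un-SNAT'd, so rewrite the source address ourselves. 10.0.0.0/24 is an
# assumed realserver network, 192.0.2.1 an assumed VIP.
iptables -t nat -A POSTROUTING -p udp -s 10.0.0.0/24 --sport 53 \
         -j SNAT --to-source 192.0.2.1

# Simon's question about persistence: ipvsadm --set takes the tcp, tcpfin
# and udp timeouts in seconds. The setting does not survive a reboot, so
# re-run it at boot, e.g. from an init script.
ipvsadm --set 900 120 15

# Horms' suggestion, persisted via sysctl.conf rather than a raw echo
# (net.ipv4.vs.* maps to /proc/sys/net/ipv4/vs/*).
echo 'net.ipv4.vs.expire_quiescent_template = 1' >> /etc/sysctl.conf
sysctl -p
```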
