Title: RE: [LARTC] Problem with Load Balancing

Vlad,

We have also set up a somewhat similar method of load balancing.  Our traffic is never a 50-50 split (we have it set to 3:2, though it doesn't always get close to that), but as the load picks up, the split tends to get closer to the configured ratio.

Dead gateway detection has never worked for us, and one day I'll probably bother other members of the LARTC group to get some help, but the method that we use is to check the output of the ip neighbor command.  Basically, if our two ISP gateways are 10.1.1.254 and 10.2.2.254, we run a bash script via cron every minute that makes calls something like:

# Count neighbor entries in a usable state: 1 = gateway seen, 0 = not
ETH1=$(ip neigh show 10.1.1.254 | grep -Ec "REACHABLE|DELAY|PROBE|STALE")
ETH2=$(ip neigh show 10.2.2.254 | grep -Ec "REACHABLE|DELAY|PROBE|STALE")

The neighbor system basically monitors ARP.  If it sees a message leave an interface without a reply after something like 3-5 seconds, it moves the entry to DELAY; after another few seconds it moves to PROBE and sends an active ARP request; and if that fails within a few seconds, the entry becomes INCOMPLETE or FAILED, or simply isn't listed.  If no data is sent either way for a while, the entry can be marked STALE or removed.
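
To make those states a little more concrete, here is a rough sketch of pulling the state word out of the ip neigh output.  It assumes the state is the last field on the line, which is how typical iproute2 output looks; neigh_state and state_is_usable are made-up helper names, not part of iproute2.

```shell
#!/bin/bash
# Sketch only: helper names are made up for illustration.

# Read `ip neigh show <addr>` output on stdin and print the neighbor state.
# Assumes the state is the last field of the line; prints MISSING when
# there is no entry at all.
neigh_state() {
    awk 'NF { state = $NF } END { print (state == "" ? "MISSING" : state) }'
}

# Succeed (exit 0) for the same states the check above counts as "up".
state_is_usable() {
    case "$1" in
        REACHABLE|DELAY|PROBE|STALE) return 0 ;;
        *) return 1 ;;
    esac
}
```

Something like the following would then set the flag for the first gateway:

    if state_is_usable "$(ip neigh show 10.1.1.254 | neigh_state)"; then ETH1=1; else ETH1=0; fi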

With the above lines, we get a 1 in the ETH1 or ETH2 variables if the next neighbor is up, and a 0 if not.  From there you can use some if statements to detect whether both are up or, if only one is up, which one.  In our case, if both are up we clear the default route and then make it something like

ip route add default nexthop via 10.1.1.254 dev eth1 weight 1 \
nexthop via 10.2.2.254 dev eth2 weight 1

and if only one is up we clear it and make it:

ip route add default nexthop via 10.1.1.254 dev eth1
or
ip route add default nexthop via 10.2.2.254 dev eth2
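
The decision above could be sketched roughly like this.  default_route_args is a hypothetical helper, and the addresses, devices and weights are just the example values used in this mail.  Note that the usage line below uses "ip route replace" instead of clearing and re-adding the route; that is a substitution in this sketch, not what our script does, and is meant to avoid a moment with no default route at all.

```shell
#!/bin/bash
# Sketch only: default_route_args is a hypothetical helper. The gateway
# addresses, devices and weights are the example values from this mail.

# Print the route arguments for the two up/down flags (1 = up, 0 = down).
# Prints nothing and returns 1 if neither gateway is up.
default_route_args() {
    case "$1$2" in
        11) echo "default nexthop via 10.1.1.254 dev eth1 weight 1 nexthop via 10.2.2.254 dev eth2 weight 1" ;;
        10) echo "default nexthop via 10.1.1.254 dev eth1" ;;
        01) echo "default nexthop via 10.2.2.254 dev eth2" ;;
        *)  return 1 ;;
    esac
}
```

It would be used along these lines (the unquoted $args is deliberate, so the words are split into separate arguments):

    args=$(default_route_args "$ETH1" "$ETH2") && ip route replace $args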


With some additional scripting we can allow this to be overridden: we can set the link to prefer using only one line but switch to the other if the preferred line fails, and we can take input from programs like Nagios to auto-prefer one line or the other if ping times get high, etc.  In addition, the script remembers the state it was in (so that it only changes the routing table when needed), controls DNS, can flush the DNS cache, and reports status back to Nagios.  Once I get all the bugs out and write some documentation, I'd be happy to post it to the news group, though you or anyone else can send me an email if you would like to take a look at it before then.
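
The "remembers the state it was in" part can be as simple as a small state file.  This is only a sketch of the idea, not the actual script; state_changed and the file path in the usage line are hypothetical.

```shell
#!/bin/bash
# Sketch only: state_changed and the state file location are hypothetical.

# state_changed NEW_STATE STATE_FILE
# Compare the new state string with the one saved in the state file (a
# missing file counts as an empty saved state). Returns 0 and saves the
# new state if it differs, returns 1 if nothing changed.
state_changed() {
    local new_state="$1"
    local state_file="$2"
    local old_state=""
    if [ -r "$state_file" ]; then
        old_state=$(cat "$state_file")
    fi
    if [ "$new_state" = "$old_state" ]; then
        return 1
    fi
    printf '%s\n' "$new_state" > "$state_file"
    return 0
}
```

With the two flags glued together as the state string, the routing table is only touched on a transition:

    state_changed "$ETH1$ETH2" /var/run/gwcheck.state && echo "gateway state changed, updating routes"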

In practice, this method usually detects failures and adjusts outbound connections quickly without user intervention; DNS changes with short TTLs take care of inbound connections.  Just be careful: if you don't have something sending traffic out to your upstream routers (and back) every few minutes, the entry in your ARP table can be removed, causing your system to think an unused gateway has failed or that a recovered gateway is still down.  You can guard against this with a quick "if the ip neigh test fails, ping the neighbor 5 times, then test again before making decisions" step.  Running an uptime monitor that pings or does something else to/through the gateway (regardless of default route) also takes care of this.
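
The "ping, then test again" fallback might look something like the sketch below.  The function names are hypothetical, and pinging out a specific device with -I assumes the iputils version of ping.

```shell
#!/bin/bash
# Sketch only: function names are hypothetical; the 5 pings follow the
# suggestion above.

# Return 0 if the neighbor entry for address $1 is in one of the "up" states.
check_neigh() {
    ip neigh show "$1" | grep -Eq 'REACHABLE|DELAY|PROBE|STALE'
}

# Send a few pings to address $1 out of device $2 so the kernel has to
# resolve the neighbor again; the ping result itself is ignored.
refresh_neigh() {
    ping -c 5 -I "$2" "$1" > /dev/null 2>&1 || true
}

# gateway_up ADDR DEV: print 1 if the gateway looks alive, else 0.
# Pings only when the first neighbor check fails.
gateway_up() {
    if check_neigh "$1"; then
        echo 1
        return
    fi
    refresh_neigh "$1" "$2"
    if check_neigh "$1"; then echo 1; else echo 0; fi
}
```

The flags from earlier would then be set with:

    ETH1=$(gateway_up 10.1.1.254 eth1)
    ETH2=$(gateway_up 10.2.2.254 eth2)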


-Will

-----Original Message-----
From: Vladimir Burciaga Aguilar [mailto:[EMAIL PROTECTED]]
Sent: Thursday, September 14, 2006 10:25 PM
To: lartc@mailman.ds9a.nl
Subject: [LARTC] Problem with Load Balancing

Hi everybody!

I'm trying to implement load balancing for a LAN with two ISPs. I've
installed SUSE Linux Enterprise Server 9 with iproute2 for that purpose.

The server has two NICs, one of them is for both the LAN and ISP 1. I've
set up both NICs with YaST (if I use ip for this, then the whole thing
doesn't work!) and executed the following commands to set up the routing
tables:

ip route flush cache
ip route flush default
ip route flush table 1
ip route flush table 2

[snip]

_______________________________________________
LARTC mailing list
LARTC@mailman.ds9a.nl
http://mailman.ds9a.nl/cgi-bin/mailman/listinfo/lartc
