Karl Auer wrote:
> DHCP failover and load-balancing are not simple *at all*.
As evidenced by the fact that the ISC fail-over protocol is horrible, and the implementation is almost as bad.

Scratch that.. it's *terrible*. After toasting the leases once accidentally, I managed to prove to myself that this was a design feature.

Let's say that the primary and secondary have the same configuration, and have synchronized leases. Stop the secondary, delete *all* leases, and bring it back up again. You can get into a state where the fail-over protocol does this:

  S: Send me the leases
  P: I did already! We're in sync!
  S: OK

And the secondary has *zero* leases, and therefore wastes CPU cycles never handing out leases.

WTF? I mean.. really. Is it that hard?

Oh, and the server is O(N^2) in the number of leases. Why? Well... they don't use fancy concepts like "dynamically resizable hash tables". Fixed size hash tables were good enough in 1995, so they're good enough now, right?

About 4 years ago I had a series of 200-400 line patches that would dramatically improve the performance of the ISC server. I got told that (1) it's impossible, and (2) if it were possible, it would require a drastic re-design of the server.

When I told them that the patches were proven to work, and quoted *their own* code back at them showing where the patches could go, there was... nothing. They weren't malicious, they just had different priorities.

> I would be very interested to hear how freeradius does it (or plans to
> do it) hence my interest in the discussion. Are there any docs on how
> freeradius implements DHCP? And especially how it implements failover?

There is little to no documentation on how it does DHCP. And it doesn't do fail-over. Why? Fail-over is hard.

My experience is that the ISC fail-over protocol doesn't help. From what I recall the last time I looked at it, it was missing key things, like "transaction numbers". Hence the failure case noted above. The *correct* conversation should have been:

  S: I'm at transaction #0: I have no leases!
  P: Geez.. my last recollection is that you were at 1000. Let me
     send you all of the updates from 0 to where we are now: 1010.
  S: Thanks!

It's really not that hard. Database books describe replication protocols. They look very different from the DHCP fail-over protocol.

So... for now, DHCP in FreeRADIUS is still experimental. If you want to use it, see raddb/sites-available/dhcp. DHCP fail-over will be supported when there's a fail-over protocol that *works*.

And for most enterprise sites, you *don't* need a fail-over protocol. Really.

Alan DeKok.
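
The transaction-number exchange described above can be sketched roughly as follows. This is a minimal illustration in Python, not ISC or FreeRADIUS code: the `Primary` and `Secondary` classes and their methods are hypothetical names invented for the example, and a real server would also need persistence, log truncation, and lease expiry. The point it tries to show is that the secondary reports its own position, so a wiped secondary gets everything again.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Hypothetical primary: lease table plus an ordered update log."""
    leases: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # log[i] is transaction i + 1

    @property
    def txn(self):
        # Number of the most recent transaction (0 means "nothing yet").
        return len(self.log)

    def assign(self, ip, mac):
        # Every lease change is recorded as one numbered transaction.
        self.leases[ip] = mac
        self.log.append((ip, mac))

    def updates_since(self, peer_txn):
        # Send whatever the peer says it is missing, regardless of what
        # the primary remembers about earlier syncs.
        return self.log[peer_txn:]


@dataclass
class Secondary:
    """Hypothetical secondary: tracks the last transaction it applied."""
    leases: dict = field(default_factory=dict)
    txn: int = 0

    def wipe(self):
        # Simulates "stop the secondary and delete all leases".
        self.leases.clear()
        self.txn = 0

    def sync(self, primary):
        # "I'm at transaction #N" -> apply everything after N.
        for ip, mac in primary.updates_since(self.txn):
            self.leases[ip] = mac
            self.txn += 1


primary = Primary()
for i in range(1010):
    primary.assign(f"10.0.{i // 256}.{i % 256}", f"mac-{i}")

secondary = Secondary()
secondary.sync(primary)   # normal operation: both sides in sync
secondary.wipe()          # leases lost, transaction number back to 0
secondary.sync(primary)   # primary replays transactions 1..1010
assert secondary.leases == primary.leases
```

Keeping the full log forever is an assumption made to keep the sketch short; a real design would presumably snapshot the lease table and discard old log entries.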