Re: pf/carp for redundant production use

Neil Mon, 26 Sep 2005 07:11:19 -0700

Hi Jason,

I would like to try your #1 suggestion but unfortunately, I don't know where to start. What are the programs I need? What configuration? Is there an existing sample configuration online that I can follow?

Thanks for explaining this in such detail.

Neil

Jason Dixon writes:

On Sep 25, 2005, at 8:30 AM, Neil wrote:
Yep, the same behavior when the master dies. The solution the person in #pf gave me is to use routing, but I don't know how to implement it. He told me that it's an issue in pf's NAT.
Bullshit.

Ok, here is the layman's description of the problem and the practical solution(s) to it. I'd love to be able to explain why interfaces recovering from INIT don't reclaim MASTER faster than they do (approx. 30 seconds in my tests), but I don't understand the code-level logistics of everything. Hint: this is only a problem when using single CARP hosts with preemption.

PROBLEM:

With a simple CARP design using a single CARP host on each segment and preemption enabled, failover occurs as expected in the case of any system offline condition (server crashes, admin reboots, etc.). If a single interface goes from MASTER to INIT state (cable gets pulled, cable goes bad, card goes bad, etc.), the 2nd interface on that system will go into BACKUP mode as expected. Traffic will route across the new MASTER, and will continue to do so while the failed system is in an INIT/BACKUP state.

However, if the failed interface returns from INIT to an available mode (we plug the cable in), we notice that the 2nd interface reclaims MASTER almost immediately, but the restored interface does not. It becomes a BACKUP host, which leaves us with a routing impossibility:
    BACKUP      MASTER
    carp0       carp0
      |           |
    host1       host2
      |           |
    carp1       carp1
    MASTER      BACKUP

Any internal clients will attempt to send traffic through the "new gateway" (host1), although neither system has any way of routing the traffic properly (not without some hokey static routes bypassing the CARP hosts). NOTE: I have found that the original MASTER does indeed return to the correct state, approximately 30 seconds later. This is reproducible, but YMMV.

SOLUTION:

1) If you really are concerned about a partial system failure (unplugged cable, bad card, etc.), then scrap the single CARP host/segment design and use arpbalance with multiple CARP hosts. The same partial-failure test using 2 CARP hosts on each segment with arpbalance resulted in a perfect failover and recovery with no packet loss.

2) This is not tested, but I suspect that you should be able to use the new interface grouping features in 3.8 to simply assign multiple physical interfaces to the same group. Even if one fails, the other *should* maintain the MASTER state and avoid any partial failure consequences.

I'd love to hear from other users or developers who have tried the grouping feature in this sort of scenario.
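The split state in the diagram can be made concrete with a small toy model. This is a hypothetical Python sketch, not carp(4) code and not something you would run on the firewall: each host maps its segments (carp0, carp1) to a CARP state, and `is_routable()` (an illustrative name) only accepts the layout a single-CARP-host-per-segment design depends on, namely one and the same host holding MASTER on every segment.

```python
from typing import Dict, List

# Hypothetical toy model: host name -> {segment/carp interface: CARP state}.
States = Dict[str, Dict[str, str]]


def masters_by_segment(states: States) -> Dict[str, List[str]]:
    """Return, for each segment, the list of hosts claiming MASTER on it."""
    result: Dict[str, List[str]] = {}
    for host, ifaces in states.items():
        for segment, state in ifaces.items():
            result.setdefault(segment, [])  # segments with no MASTER still appear
            if state == "MASTER":
                result[segment].append(host)
    return result


def is_routable(states: States) -> bool:
    """With one CARP host per segment, traffic only routes cleanly if
    exactly one host is MASTER on each segment and it is the same host
    on all segments."""
    masters = masters_by_segment(states)
    if not masters or any(len(hosts) != 1 for hosts in masters.values()):
        return False
    return len({hosts[0] for hosts in masters.values()}) == 1


# The state from the diagram: host1 is MASTER only on carp1, host2 only on carp0.
split = {
    "host1": {"carp0": "BACKUP", "carp1": "MASTER"},
    "host2": {"carp0": "MASTER", "carp1": "BACKUP"},
}
# The state after the original MASTER recovers (~30 seconds later in the tests above).
recovered = {
    "host1": {"carp0": "MASTER", "carp1": "MASTER"},
    "host2": {"carp0": "BACKUP", "carp1": "BACKUP"},
}
print(is_routable(split), is_routable(recovered))
```

Fed the diagram's state, `is_routable()` returns False; once host1 holds MASTER on both segments again it returns True, which is the window of broken routing described above.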
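Solution 1 relies on arpbalance spreading clients across several CARP hosts that share one address. Below is a rough toy model of that idea in Python. It assumes (as an approximation, not an authoritative description of the kernel code) that the virtual host which answers an ARP request is chosen by the requester's IPv4 address modulo the number of virtual hosts sharing the address, and that only the system currently MASTER for that virtual host replies. The function names, vhids, host names and addresses are all hypothetical.

```python
import ipaddress
from typing import Dict, List, Optional


def pick_vhost(requester_ip: str, vhids: List[int]) -> int:
    """Choose which CARP virtual host (by vhid) should answer an ARP request.

    Assumption: selection is the requester's IPv4 address modulo the number
    of virtual hosts sharing the address. The real code also skips virtual
    hosts that are not up; this toy model ignores that.
    """
    if not vhids:
        raise ValueError("no CARP virtual hosts configured")
    index = int(ipaddress.IPv4Address(requester_ip)) % len(vhids)
    return vhids[index]


def arp_responder(requester_ip: str, vhids: List[int],
                  master_of: Dict[int, str]) -> Optional[str]:
    """Return the system that answers the ARP request (the current MASTER
    of the selected virtual host), or None if no system is MASTER for it."""
    return master_of.get(pick_vhost(requester_ip, vhids))


vhids = [1, 2]
normal = {1: "host1", 2: "host2"}      # each system is MASTER for one vhost
failed_over = {1: "host2", 2: "host2"}  # host1's interface failed; host2 took vhid 1

for client in ("192.168.1.10", "192.168.1.11"):
    print(client, arp_responder(client, vhids, normal),
          arp_responder(client, vhids, failed_over))
```

With host1 MASTER for vhid 1 and host2 MASTER for vhid 2, a client at 192.168.1.10 resolves the gateway to host1 and one at 192.168.1.11 to host2; after host1's interface fails and host2 takes over vhid 1, both clients are answered by host2.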
--
Jason Dixon
DixonGroup Consulting
http://www.dixongroup.net
