On Fri, Dec 12, 2008 at 12:56:22PM +0000, Stephan A. Rickauer wrote:
> We have a simple two-node CARP cluster, each with three em(2)'s and one
> fxp0() interface. The setup has run fine since OpenBSD 3.7.
> 
> Being part of University Zurich our firewall has a 1GBit uplink to the
> central Uni infrastructure. Recently we have seen that utilizing this
> link heavily (e.g. when our Tivoli Storage Manager Client behind our
> firewall starts backing up "some" Gigabytes to Uni) both CARP interfaces
> of both nodes would go into MASTER state.
> 
> I could imagine that CARP advertisements are no longer sent and/or
> received 'in time' due to the heavy load so that the BACKUP believes it
> should become MASTER.
> 
> Wouldn't this be a general CARP problem under heavy load? And if so, how
> do people here deal with it? I was thinking of adding a simple
> priq-based ALTQ rule only for CARP. Does this make sense? Or would it be
> possible (theoretically) to send carp ads over a dedicated link?
> 
> (Almost) any comments welcome. ;)
> 

Welcome to the fine world of livelock, and of timeouts that are not run
on time. If your box enters livelock, carp announcements from the master
may arrive late at the backup box. The backup then promotes itself to
master, but the old master does not notice, so you end up with two
master systems.
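
A rough sketch of that timing failure, as a toy model and not the actual
carp code. The function name, the one-second advertisement interval and
the "three missed intervals" timeout are all assumptions made for
illustration. It only shows that a backup which re-arms a timer on every
received announcement will promote itself as soon as one announcement is
held up longer than the timeout.

```python
def backup_takeover_time(ad_arrivals, master_down_timeout):
    """Return the time at which the backup would promote itself, or None.

    ad_arrivals: times (seconds) at which announcements reach the backup.
    master_down_timeout: how long the backup waits without hearing one.
    Hypothetical helper, not part of any real CARP implementation.
    """
    deadline = master_down_timeout          # timer armed at t = 0
    for t in sorted(ad_arrivals):
        if t > deadline:
            return deadline                 # timer fired before this ad arrived
        deadline = t + master_down_timeout  # ad seen in time: re-arm the timer
    return None                             # every ad arrived before its deadline


ADV_INTERVAL = 1.0          # assumed: one announcement per second
TIMEOUT = 3 * ADV_INTERVAL  # assumed: give up after three missed intervals

on_time = [1.0, 2.0, 3.0, 4.0, 5.0]
delayed = [1.0, 2.0, 6.5, 7.0, 8.0]  # later ads held up, e.g. by livelock

print(backup_takeover_time(on_time, TIMEOUT))  # None: backup stays backup
print(backup_takeover_time(delayed, TIMEOUT))  # 5.0: backup becomes master
```

In the delayed case the master is still alive and still sending, which is
why both boxes end up as master at the same time.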

There is some initial code in -current that tries to avoid the system
entering livelock for extended times. It needs a lot of testing so maybe
you should try it out and report back.

-- 
:wq Claudio
