On Thu, Nov 26, 2009 at 03:56:37PM +0100, Henning Brauer wrote:

> * Derek Buttineau <de...@csolve.net> [2009-11-26 15:07]:
> > On 2009-11-25, at 6:23 PM, Henning Brauer wrote:
> > 
> > > check ifconfig -g carp on both
> > 
> > 
> > Right now both are at:
> > 
> > carp: carp demote count 0
> > 
> > However, I did check that before I rebooted the backup unit and the master
> > was set to
> > 
> > carp: carp demote count 1
> > 
> > At first I thought that maybe pfsync was keeping the master from reverting
> > while it synced state, but even after 24 hours the master hadn't taken back
> > over from the slave.
> 
> the one with the higher demote count always loses, regardless of
> advskew. now finding out which subsystem set the demote count might be
> nontrivial. pfsync is in the game, so is rc, and, depending on
> configuration, various daemons like bgpd and ospfd.

What I have observed on a 4.6 firewall pair:

The demote count stays at 1 for a while because the first bulk state
update request times out. Only the subsequent one succeeds. The timeout
is 20s by default, but grows if you have a larger maximum number of states.
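
To watch how long the count stays raised, something like the following
could be used. It is only a sketch: demote_count is a made-up helper name,
and it assumes the output format quoted earlier in this thread
("carp: carp demote count N").

```shell
#!/bin/sh
# demote_count: hypothetical helper, not part of any OpenBSD tool.
# Reads "ifconfig -g carp" style output on stdin and prints only the
# number, assuming lines of the form "carp: carp demote count N".
demote_count() {
    sed -n 's/^carp: carp demote count \([0-9][0-9]*\).*/\1/p'
}

# On a carp host one might poll it like this (not run here):
#   while :; do date; ifconfig -g carp | demote_count; sleep 5; done
```

Logging the value together with a timestamp after a reboot should show
whether the count drops after the first bulk request or only after a later
one.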

The analysis is that the pfsync code triggers a bulk request on
the SIOCSETPFSYNC ioctl, but at that moment the interface is not yet
up; the SIOCSIFFLAGS that brings it up is only done after that.

This happens if you have a line in hostname.pfsync0 like:

        up syncif itf0

This gets rewritten by /etc/netstart, moving the "up" to the end.

A workaround (until dlg@ or somebody else finds a real fix) is to have
a newline after "up", so that two ifconfig commands are issued by
netstart, one to bring the interface up and the next to set the syncif:

        up
        syncif itf0
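
To make the ordering difference concrete, here is a small simulation. It is
NOT the real /etc/netstart code; emit_ifconfig is a hypothetical function
that only mimics the behaviour described above (a leading "up" followed by
other arguments ends up at the end of the ifconfig command).

```shell
#!/bin/sh
# emit_ifconfig: hypothetical illustration, not taken from /etc/netstart.
# Prints the ifconfig command that one hostname.if line would turn into,
# assuming a leading "up" is moved behind the remaining arguments.
emit_ifconfig() {
    ifname=$1
    shift
    if [ "$1" = "up" ] && [ $# -gt 1 ]; then
        shift
        echo "ifconfig $ifname $* up"
    else
        echo "ifconfig $ifname $*"
    fi
}

# One-line form: a single command, syncif is set before the interface is up.
emit_ifconfig pfsync0 up syncif itf0

# Two-line form: the interface is brought up first, syncif is set afterwards.
emit_ifconfig pfsync0 up
emit_ifconfig pfsync0 syncif itf0
```

With the one-line form the syncif is set in the same command that brings
the interface up, which matches the failing order above; with the two-line
form the interface is already up when the bulk request gets triggered.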


        -Otto
