On Thu, Nov 26, 2009 at 03:56:37PM +0100, Henning Brauer wrote:
> * Derek Buttineau <de...@csolve.net> [2009-11-26 15:07]:
> > On 2009-11-25, at 6:23 PM, Henning Brauer wrote:
> >
> > > check ifconfig -g carp on both
> >
> > Right now both are at:
> >
> > carp: carp demote count 0
> >
> > However, I did check that before I rebooted the backup unit and the
> > master was set to
> >
> > carp: carp demote count 1
> >
> > At first I thought that maybe pfsync was keeping the master from
> > reverting while it synced state, but even after 24 hours the master
> > hadn't taken back over from the slave.
>
> the one with the higher demote count always loses, regardless of
> advskew. now finding out which subsystem set the demote count might be
> nontrivial. pfsync is in the game, so is rc, and, depending on
> configuration, various daemons like bgpd and ospfd.

What I have observed on a 4.6 firewall pair:

The demote count stays at 1 for a while because the first bulk state
update request times out. Only the subsequent one succeeds. The
timeout is 20s by default, but grows if you have a larger max state
number.

The analysis is that the pfsync code triggers a bulk request on the
SIOCSETPFSYNC ioctl, but at that moment the interface is not yet up;
the SIOCSIFFLAGS is done after that.

This happens if you have a line in hostname.pfsync0 like:

	up syncif itf0

This gets rewritten by /etc/netstart, moving the "up" to the end, so
the syncif is set before the interface is brought up.

A workaround (until dlg@ or somebody else finds a real fix) is to have
a newline after "up", so that two ifconfig commands are issued by
netstart, one to up the interface, and the next to set the syncif:

	up
	syncif itf0

	-Otto
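
To make the ordering problem described above easier to see, here is a rough shell model of the rewrite. It is only a sketch based on the description in this mail, not the real /etc/netstart: the function name sketch_netstart is made up for illustration, it handles only the "up" case, and it prints the ifconfig commands instead of running them.

```shell
#!/bin/sh
# Simplified model of the hostname.if handling described in this thread.
# NOT the real /etc/netstart; sketch_netstart is a hypothetical helper.
# It reads hostname.if-style lines on stdin and prints one ifconfig
# command per non-empty line, without executing anything.
sketch_netstart() {
    ifname=$1
    while read -r first rest; do
        case $first in
        ""|"#"*)
            # Skip blank lines and comments.
            continue
            ;;
        up)
            # A line starting with "up" has the "up" moved to the end,
            # so any other options on that line are applied first.
            if [ -n "$rest" ]; then
                echo "ifconfig $ifname $rest up"
            else
                echo "ifconfig $ifname up"
            fi
            ;;
        *)
            # Any other line is passed through as its own command.
            echo "ifconfig $ifname $first${rest:+ $rest}"
            ;;
        esac
    done
}

# One-line form: a single command, with syncif set before "up".
printf 'up syncif itf0\n' | sketch_netstart pfsync0

# Two-line form (the workaround): "up" is issued as its own command first.
printf 'up\nsyncif itf0\n' | sketch_netstart pfsync0
```

In this model the one-line form yields a single command with "up" last, while the two-line form yields a separate "up" command ahead of the syncif one, which is the ordering the workaround relies on.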