On Sun, Jul 24 2011 at 27:21, David Gwynne wrote:
> On 24/07/2011, at 8:27 PM, Jonathan Lassoff wrote:
>
> > On Wed, Apr 20, 2011 at 7:10 AM, David Gwynne <l...@animata.net> wrote:
> >>
> >> On 20/04/2011, at 11:08 PM, Jonathan Lassoff wrote:
> >>
> >>> On Wed, Apr 20, 2011 at 4:22 AM, David Gwynne <l...@animata.net> wrote:
> >>>> you might be able to upgrade your passive firewall to 4.9 next to the
> >>>> active 4.7 one. it looks like the protocol stayed the same so they
> >>>> should be able to talk to each other.
> >>>
> >>> This would seem to be the case.
> >>>
> >>> This (http://undeadly.org/cgi?action=article&sid=20090301211402) is an
> >>> absolutely excellent bit of writing about the improvements to pfsync,
> >>> BTW. Thanks for letting that be shared.
> >>>
> >>>> however, it looks like bulk updates were broken in 4.7, which would
> >>>> explain your failover problems. you can work around that by going
> >>>> "pfctl -S /dev/stdout | ssh activefw pfctl -L /dev/stdin" as root on
> >>>> the passive fw.
> >>>
> >>> As an initial seeding of state? It seems to me that only some of my
> >>> flows get affected when failing over (not everything is reset and
> >>> traffic can still flow).
> >>
> >> yes. the pfctl commands will do a bulk update since the in kernel
> >> implementation was unreliable back then.
> >>
> >>> It appears that both firewalls have an approximately congruent set of
> >>> states, but usually a "pfctl -ss | wc -l" can be off by several
> >>> hundred, to several thousand states at times. My hunch is that state
> >>> creation and counter updates are not updated synchronously, so when
> >>> failing over there are still some updates in-flight, and for flows
> >>> that are moving their sequence numbers at a decent clip I could see
> >>> why they might get reset.
> >>
> >> pf has a bit of fuzz when it does its tcp window matching, so packets
> >> can get ahead of the firewall and be ok.
> >
> > Do you know if there is a way to see how much this fuzz is or if
> > there's an offset?
> from memory its 1000 bytes.
>
> > If dropped for being out of a window, will (or can) it get logged to pflog?
>
> again, from memory its just dropped.
>
> >> i wrote defer, so yes...
> >>
> >> on my boxes the increase in latency is about .2 to .3ms. if a firewall
> >> is missing its peer(s) it will go up to about 1/100th of a second.
> >
> > So does defer wait for a peer to acknowledge a new state just at the
> > time of creation, or does it include state updates about sequence
> > numbers as well?
>
> defer only delays the first packet.
>
> > I suspect I'm hitting a similar issue as you were with long-lived
> > flows getting reset at failover.
>
> i think my problem is that i run both firewalls with the carp demotion
> counter set low. when a box is rebooted the carp default is at 0 or 1,
> which means it takes over traffic before it gets all the states. later
> code in rc.local demotes it, but by that time some packets have been
> eaten by the new box. i should fix it, but im lazy.
>
> >> thats exactly how i have my stuff configured.
> >
> > Have you ever had trouble when re-numbering an interface? It seems to
> > me like ospfd doesn't pick up changes in interface numbering if
> > changed out from under it. Most other OSPF daemons I use would pick
> > this up as it changes, but as far as I can tell there's no way to tell
> > ospfd to reload interface addressing.
>
> interfaces and addresses moving around hurts me too.
>
> > I'm often needing to add more and more interfaces and ospf interfaces,
> > necessitating failing over so as to make it safe to kill and re-start
> > ospfd -- in the process it just seems to nip some flows from flowing.
>
> i do that too. lets annoy claudio together!
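For anyone following along, the two workarounds discussed above can be sketched roughly like this. "activefw" is a placeholder hostname, and the demotion value of 50 is an arbitrary example; check pfctl(8) and ifconfig(8) on your release before relying on this.

```shell
#!/bin/sh
# 1. Manual bulk state sync (run as root on the passive firewall).
#    pfctl -S dumps the state table, pfctl -L loads it on the peer.
pfctl -S /dev/stdout | ssh activefw pfctl -L /dev/stdin

# 2. Raise the carp demotion counter on the "carp" interface group so a
#    freshly booted box does not preempt traffic before pfsync has pulled
#    in the states, then lower it again once the bulk update is done.
ifconfig -g carp carpdemote 50     # demote every carp interface
# ... wait for pfsync to finish its bulk transfer ...
ifconfig -g carp -carpdemote 50    # restore the counter
```

The demote/undemote pair is the fix David alludes to: doing the demotion before carp comes up (rather than later in rc.local) closes the window where the rebooted box eats packets without states.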
In my world, interface numbering changes from time to time. The solution we
found is:
- remove the interface from ospfd.conf,
- reload the configuration with ospfctl reload,
- destroy the interface (our ospf interfaces are mainly gif ones),
- recreate the interface with the new IPs,
- add the configuration back to ospfd.conf,
- reload the configuration with ospfctl reload.

This may sound like a lot, but it works, has been reliable so far, and does
not require killing and restarting the daemon :)

Claer
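The sequence above can be sketched as a shell transcript. gif0 and all addresses here are illustrative placeholders, and the ospfd.conf edits are assumed to be done by hand before each reload:

```shell
#!/bin/sh
# Renumbering a gif(4) interface under ospfd without restarting the daemon.

# 1. Remove the gif0 stanza from /etc/ospfd.conf by hand, then:
ospfctl reload

# 2. Destroy and recreate the tunnel with its new addressing
#    (outer and inner addresses below are placeholders).
ifconfig gif0 destroy
ifconfig gif0 create
ifconfig gif0 tunnel 192.0.2.1 198.51.100.1
ifconfig gif0 inet 10.0.0.1 10.0.0.2 netmask 255.255.255.255

# 3. Add the gif0 stanza back to /etc/ospfd.conf, then:
ospfctl reload
```

Removing the interface from the config before touching it means ospfd never sees the address change out from under it; it only ever sees an interface appear or disappear across a reload, which it handles cleanly.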