On Sun, Jul 24 2011 at 27:21, David Gwynne wrote:
> On 24/07/2011, at 8:27 PM, Jonathan Lassoff wrote:
> 
> > On Wed, Apr 20, 2011 at 7:10 AM, David Gwynne <l...@animata.net> wrote:
> >>
> >> On 20/04/2011, at 11:08 PM, Jonathan Lassoff wrote:
> >>
> >>> On Wed, Apr 20, 2011 at 4:22 AM, David Gwynne <l...@animata.net> wrote:
> >>>> you might be able to upgrade your passive firewall to 4.9 next to the
> active 4.7 one. it looks like the protocol stayed the same so they should be
> able to talk to each other.
> >>>
> >>> This would seem to be the case.
> >>>
> >>> This (http://undeadly.org/cgi?action=article&sid=20090301211402) is an
> >>> absolutely excellent bit of writing about the improvements to pfsync,
> >>> BTW. Thanks for letting that be shared.
> >>>
> >>>> however, it looks like bulk updates were broken in 4.7, which would
> explain your failover problems. you can work around that by going "pfctl -S
> /dev/stdout | ssh activefw pfctl -L /dev/stdin" as root on the passive fw.
> >>>
> >>> As an initial seeding of state? It seems to me that only some of my
> >>> flows get affected when failing over (not everything is reset and
> >>> traffic can still flow).
> >>
> >> yes. the pfctl commands will do a bulk update since the in kernel
> implementation was unreliable back then.
> >>
> >>> It appears that both firewalls have an approximately congruent set of
> >>> states, but usually a "pfctl -ss | wc -l" can be off by several
> >>> hundred, to several thousand states at times. My hunch is that state
> >>> creation and counter updates are not updated synchronously, so when
> >>> failing over there are still some updates in-flight, and for flows
> >>> that are moving their sequence numbers at a decent clip I could see
> >>> why they might get reset.
> >>
> >> pf has a bit of fuzz when it does its tcp window matching, so packets can
> get ahead of the firewall and be ok.
> >
> > Do you know if there is a way to see how much this fuzz is or if
> > there's an offset?
> 
> from memory its 1000 bytes.
> 
> > If dropped for being out of a window, will (or can) it get logged to pflog?
> 
> again, from memory its just dropped.
> 
> >> i wrote defer, so yes...
> >>
> >> on my boxes the increase in latency is about .2 to .3ms. if a firewall is
> missing its peer(s) it will go up to about 1/100th of a second.
> >
> > So does defer wait for a peer to acknowledge a new state just at the
> > time of creation, or does it include state updates about sequence
> > numbers as well?
> 
> defer only delays the first packet.
> 
> > I suspect I'm hitting a similar issue as you were with long-lived
> > flows getting reset at failover.
> 
> i think my problem is that i run both firewalls with the carp demotion counter
> set low. when a box is rebooted the carp default is at 0 or 1, which means it
> takes over traffic before it gets all the states. later code in rc.local
> demotes it, but by that time some packets have been eaten by the new box. i
> should fix it, but im lazy.
> 
> >> thats exactly how i have my stuff configured.
> >
> > Have you ever had trouble when re-numbering an interface? It seems to
> > me like ospfd doesn't pick up changes in interface numbering if
> > changed out from under it. Most other OSPF daemons I use would pick
> > this up as it changes, but as far I as can tell there's no way to tell
> > ospfd to reload interface addressing.
> 
> interfaces and addresses moving around hurts me too.
> 
> > I'm often needing to add more and more interfaces and ospf interfaces,
> > necessitating failing over so as to make it safe to kill and re-start
> > ospfd -- in the process it just seems to nip some flows from flowing.
> 
> i do that too. lets annoy claudio together!

In my world, it happens that we need to change interface numbering. The
solution we found is:
- remove the interface from ospfd.conf,
- reload the configuration with ospfctl reload,
- destroy the interface (our OSPF interfaces are mainly gif ones),
- recreate the interface with the new IPs,
- add the configuration back to ospfd.conf,
- reload the configuration with ospfctl reload.
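As a rough shell sketch of those steps (gif0, the tunnel endpoints and
the inner addresses are placeholders for whatever your setup uses):

```shell
# 1. remove the gif0 block from /etc/ospfd.conf, then pick up the change
ospfctl reload

# 2. destroy and recreate the interface with its new addressing
#    (addresses below are documentation examples, not real ones)
ifconfig gif0 destroy
ifconfig gif0 create
ifconfig gif0 tunnel 192.0.2.1 198.51.100.1
ifconfig gif0 inet 10.0.0.1 10.0.0.2 netmask 255.255.255.252

# 3. add the gif0 block back to /etc/ospfd.conf, then reload again
ospfctl reload
```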

This may sound like a lot of steps, but it works, has been reliable so
far, and does not require killing and restarting the daemon :)

Claer
