Hello,

This may be a pf issue, an OpenBSD issue or a client issue, so let me apologize in advance.
The setup is fairly simple: a Debian machine hanging off each of two interfaces on an OpenBSD -current box from 11/8 running pf. Nothing particularly complex about it: tight ruleset, aggressive optimization, nothing funky, no queueing, no per-rule timeout optimization, and the following options:

    set limit { states 500000, frags 100000 }
    set optimization aggressive
    set block-policy return
    set state-policy if-bound
    set require-order yes
    set skip on $LOCAL_IF
    set debug urgent
    set loginterface $FW_BACK_IF
    scrub all no-df random-id fragment reassemble

Yes, there is lots more to this ruleset, but nothing is getting blocked here and, like I said, there are no rules in the remainder of the ruleset that have anything to do with timeouts, limits or the like. In fact, I added a specific rule as a test case:

    pass in on $CLIENT_IF inet proto tcp from $CLIENT_NET to $SERVER_NET \
        port 12345 flags S/SA modulate state

I've also tried this same test on other pf installations (early 3.7) with vastly simpler rulesets, and they behave identically. There are no other rules relating to port 12345, and if I remove this rule the traffic gets blocked by my default policy.

My test is simple. While on $CLIENT_NET:

    while true; do lynx -dump http://host.on.server.net:12345; date; done

Things spin up fast and run quickly for some number of seconds, spewing tens to hundreds of connections, and then subsequent connections hang: the client sits in SYN_SENT and the server sits there with several hundred connections in TIME_WAIT. Exactly 45 seconds later, things come back to life. Between 0s and 45s you can watch the TIME_WAITs slowly disappear; then at 45s the loop comes back to life and the connections rip through once again. Some number of seconds later, things freeze again and hang for another 45s.

Both systems seem completely usable: I can ssh to and from them and do whatever I please. I/O is doing almost nothing, each system is 98% idle, and interrupts seem fine.
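For reference, the tcp.finwait experiment described below amounts to keeping the aggressive profile from the options block above while overriding only that one timeout. A minimal sketch of the pf.conf options, assuming (as I understand pfctl's handling of options) that they are applied in order, so a `set timeout` placed after `set optimization` replaces the profile's value:

```
# pf.conf options (sketch): keep the aggressive profile, override one timeout.
# Assumption: options are processed in order, so this later line wins.
set optimization aggressive
set timeout tcp.finwait 10
```

After reloading the ruleset with `pfctl -f`, `pfctl -s timeouts` lists the global timeouts pf is actually using, which is a quick way to confirm the override took effect.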
Same with the firewall: it's not even breaking a sweat.

Not that it really matters, but this problem originally popped up with SOAP calls from a Java client to a JBoss server. I've simplified it a bit by just using lynx on the client and Apache on the server.

I don't know of any default settings in pf that would cause this. My next suspect was the clients, but the problem is nonexistent when the firewall is out of the picture. I've twisted some sysctl knobs that Henning and others have suggested in the past, but none seem to have any effect. The only thing so far that has affected the stall was changing pf's tcp.finwait: when I changed it from aggressive's setting of 30s to 10s, the stalls went from a consistent 45s to 21s. Aggressive has that name for a reason, so I'm hesitant to crank things any further in that regard.

Any input, whether it's pf, OpenBSD or client related, would be much appreciated.

Thanks!
-jon