Hi,

recently, a problem with OpenBSD has popped up here that manifests itself as "random" connection failures after some time.

Network diagram:

  workstation (1) --- (3b) firewall (3a) --- Internet --- www.example.com (2)

You surf from your workstation to www.example.com. On the firewall's exterior interface (3a), you can see packets flowing:

  (1) -> (2)
  (2) -> (1)
  (1) -> (2)
  (2) -> (1)
  (1) -> (2)
  (2) -> (1)
  (1) -> (2)
  (2) -> (1)
  (1) -> (2)
  (2) -> (1)
  (1) -> (2)
  (2) -> (1)
  (1) -> (2)
  (2) -> (1)
  (1) -> (2)
  and so on.

Everything works just fine.

Now, with nothing changed except that the firewall has been up for some days (currently 13) and has already pushed some traffic, connections start to fail. On (3a), you see "almost" the same packet sequence as shown above (shortened for brevity):

  (1) -> (2)
  (2) -> (1)
  (1) -> (2)
  (2) -> (1)
  (1) -> (2)
  (2) -> (1)
  (1) -> (2)
  (2) -> (1)
  (1) -> (2)
  (2) -> (1)
  (1) -> (2)
  (2) -> (1)   <- point where the connection fails
  (2) -> (1)
  (2) -> (1)
  (2) -> (1)
  (2) -> (1)

but on (3b), you see:

  (1) -> (2)
  (2) -> (1)
  (1) -> (2)
  (2) -> (1)
  (1) -> (2)
  (2) -> (1)
  (1) -> (2)
  (2) -> (1)
  (1) -> (2)
  (2) -> (1)
  (1) -> (2)

and then nothing more, as if the web server on the other side had stopped sending packets. I can't see the packets on pflog0 either. When I use a slightly different network setup to "bypass" the firewall, everything still works fine. "Fixing" the problem requires powering the firewall down; simply rebooting it without a power-off does not help.

It doesn't really matter which site "www.example.com" is (the problem starts for several sites at once anyway), and over time it affects more and more sites until the firewall is hardly usable at all. That said, s1.wp.com is usually among the first sites to fail.

This problem first occurred for us with 4.6-stable on both i386 and amd64, and has now also occurred on -current with kernel 448 on i386. I'm currently trying to get even more recent code installed to see whether the problem is fixed there. The fact that only a thorough power cycle "fixes" the problem suggests that there may be some underlying memory corruption.

I'd very much appreciate hints on how to go about debugging this, and/or I can probably be remote-controlled to do some testing.

TIA!

Kind regards,
--Toni++