On 1/22/2016 3:35 PM, Nick Rogers wrote:
On Thu, Jan 21, 2016 at 11:44 AM, Matthew Grooms <mgro...@shrew.net> wrote:
# pfctl -si
Status: Enabled for 0 days 02:25:41           Debug: Urgent

State Table                          Total             Rate
  current entries                    77759
  searches                       483831701        55352.0/s
  inserts                           825821           94.5/s
  removals                          748060           85.6/s
Counters
  match                           27118754         3102.5/s
  bad-offset                             0            0.0/s
  fragment                               0            0.0/s
  short                                  0            0.0/s
  normalize                              0            0.0/s
  memory                                 0            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                             0            0.0/s
  ip-option                           6655            0.8/s
  proto-cksum                            0            0.0/s
  state-mismatch                         0            0.0/s
  state-insert                           0            0.0/s
  state-limit                            0            0.0/s
  src-limit                              0            0.0/s
  synproxy                               0            0.0/s
# pfctl -st
tcp.first                   120s
tcp.opening                  30s
tcp.established           86400s
tcp.closing                 900s
tcp.finwait                  45s
tcp.closed                   90s
tcp.tsdiff                   30s
udp.first                   600s
udp.single                  600s
udp.multiple                900s
icmp.first                   20s
icmp.error                   10s
other.first                  60s
other.single                 30s
other.multiple               60s
frag                         30s
interval                     10s
adaptive.start            90000 states
adaptive.end             120000 states
src.track                     0s
I think there may be a problem with the code that calculates the adaptive
timeout values that is making it way too aggressive. If by default it's
supposed to decrease linearly between 60% and 120% of the state table max,
I shouldn't be losing TCP connections that are idle for only a few minutes
when the state table is < 70% full. Unfortunately, that appears to be the
case. At most, this should have decreased the 86400s timeout by 17%, to
72000s, for established TCP connections.
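For reference, pf.conf(5) describes adaptive scaling as a linear decrease of all timeout values once the state count exceeds adaptive.start, reaching zero at adaptive.end. A quick sketch of that arithmetic with the settings from the pfctl output above (the 95000-entry figure is hypothetical, just to show a mid-range value):

```python
def adaptive_timeout(base, states, start, end):
    """Scaled timeout in seconds, per the linear rule in pf.conf(5)."""
    if states <= start:
        return base                 # below adaptive.start: no scaling
    if states >= end:
        return 0                    # at/above adaptive.end: immediate expiry
    return base * (end - states) / (end - start)

# 77759 entries (from pfctl -si) is below adaptive.start, so no scaling:
print(adaptive_timeout(86400, 77759, 90000, 120000))   # 86400
# Hypothetical 95000 entries: scaled by 25000/30000, about a 17% cut:
print(adaptive_timeout(86400, 95000, 90000, 120000))   # 72000.0
```

If pf really follows this rule, a state table below adaptive.start should leave tcp.established untouched, which is why the observed behavior looks like a bug.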
That doesn't make sense to me either. Even if the math were off by a factor
of 10, the state should live for about 24 minutes.
I've tested this for a few hours now and all my idle SSH sessions have
been rock solid. If anyone else is scratching their head over a problem
like this, I would suggest disabling the adaptive timeout feature or
increasing it to a much higher value. Maybe one of the pf maintainers can
chime in and shed some light on why this is happening. If not, I'm going to
file a bug report as this certainly feels like one.
Did you go with making the adaptive timeout less aggressive, or did you
disable it entirely? I would think that if the adaptive timeout were really
that broken, more people would notice this problem, especially me, since I
have many servers running a very short tcp.established timeout; but the fact
that you are noticing this kind of weirdness has me concerned about how the
adaptive setting is affecting my environment.
I increased the value to 90K for the 10K limit. Yes, it's concerning.
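For anyone else wanting to try the same mitigation, the relevant knobs live in pf.conf's timeout settings. These values mirror the pfctl output above and are illustrative, not a recommendation:

```
# raise the thresholds where adaptive scaling kicks in (illustrative)
set timeout { adaptive.start 90000, adaptive.end 120000 }

# or, per pf.conf(5), disable adaptive scaling entirely:
# set timeout { adaptive.start 0, adaptive.end 0 }
```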
Today I set up a test environment at about 1/10th the connection count to
see if I could reproduce the issue on a smaller scale, but had no luck. I'm
trying to find a command-line test program that will generate enough TCP
connections to reproduce it at a scale similar to my production environment.
So far I haven't found anything that does the trick, so I may end up rolling
my own. I'll reply back to the list if I find a way to reproduce this.
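Rolling your own connection generator can be fairly small. A minimal sketch that holds many idle TCP connections open so they accumulate in the pf state table (the host, port, and count are placeholders; you will also likely need to raise the file-descriptor limit, e.g. with ulimit -n):

```python
import socket

def open_idle_connections(host, port, count):
    """Open `count` TCP connections to host:port and hold them open."""
    socks = []
    for _ in range(count):
        s = socket.create_connection((host, port), timeout=5)
        socks.append(s)            # keep a reference so the socket stays open
    return socks

# Example usage (placeholder target in TEST-NET-1 space):
# conns = open_idle_connections("192.0.2.10", 80, 1000)
# import time; time.sleep(3600)   # leave them idle so the pf timeouts apply
```

Running several of these in parallel from a couple of client boxes should get the state count into the range where adaptive scaling engages.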
Thanks again,
-Matthew
_______________________________________________
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"