On 12/14/2007 02:17:22 AM, Henrik Johansen wrote:
> Hi list,
> We are experiencing a steady flow of BAD state error messages that I
> cannot explain.
I continue to have problems with (Microsoft) hosts that
violate the 2MSL TCP rule (STD7, RFC793, page 27
"Knowing When to Keep Quiet"). I strongly suspect
that MS is setting the MSL to 1 minute rather than the
2 minutes of the standard. (I don't know about Vista,
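which supposedly has a new TCP stack.) A host that
reuses an address/port pair that soon collides with
the state pf still holds for the old connection, so
pf reports state errors.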
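For reference, the arithmetic behind the 2MSL rule: RFC 793 takes the MSL to be 2 minutes, so a closed connection's address/port pair should stay quiet for 2 * 120 = 240 seconds. With the suspected 1 minute MSL (a suspicion, not a measured value) the quiet time would be only 120 seconds. A trivial sketch; the function name quiet_time is made up for illustration:

```shell
#!/bin/sh
# Quiet time (2 * MSL) in seconds for a given MSL in seconds.
# quiet_time is a hypothetical helper, not part of pf or any TCP stack.
quiet_time() {
    echo $(( 2 * $1 ))
}

echo "RFC 793 MSL of 120 s: quiet time $(quiet_time 120) s"
echo "suspected MSL of 60 s: quiet time $(quiet_time 60) s"
```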
It would, *urp*, be nice if pf had a way to
specify the MSL in the scrub directive to
work around the brokenness. I've had to replace
a lot of stateful rules with stateless filtering.
Not only is it ugly and less secure that way, but
diagnosing the problem is a real pain in the butt.
I can't say if this is your problem. I ran the
following script against your log after I did a
random check and saw a 1 minute interval during
which a particular sourceip/sourceport/destip/destport
was failing. The script (kinda) does a frequency analysis
on how long the bad state persists on any particular
connection. The results tell me nothing. Maybe
somebody else will have better luck.
--------------------<snip>-------------------
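#!/bin/sh
# Frequency analysis of how long a "BAD state" error persists per connection.
# Note: mktime() is a gawk extension, so awk must be gawk here.
IN=/tmp/messages.sanitized

grep ' BAD state: ' "$IN" \
    | cut -d ' ' -f 11-12 \
    | sort -u \
    | while read -r conn ; do
        # Seconds between the first and last log line for this connection.
        grep -F -e "$conn" "$IN" \
            | awk '
                BEGIN { fst = "" }
                {
                    if (fst == "")
                        fst = $3;       # timestamp (HH:MM:SS) of first line
                    lst = $3;           # timestamp of most recent line
                }
                END {
                    # mktime() wants "YYYY MM DD HH MM SS"; the date is
                    # hard-coded, so the log must not span midnight.
                    gsub(":", " ", fst);
                    gsub(":", " ", lst);
                    fststmp = mktime("2007 12 13 " fst);
                    lststmp = mktime("2007 12 13 " lst);
                    print lststmp - fststmp;
                }'
      done \
    | sort -n \
    | awk '# Do frequency analysis: count runs of equal values in sorted input.
        BEGIN { print "Seconds Count"; l = "" }
        {
            if (l != $0) {
                if (l != "")
                    print l, c;
                c = 0;
                l = $0;
            }
            c = c + 1;
        }
        END { print l, c }
    '
--------------------<snip>-------------------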
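A possible single-pass variant of the same analysis, as a sketch only. It reads the log once instead of once per connection, and it converts HH:MM:SS to seconds by hand so it does not need gawk's mktime(). The function name bad_state_freq and the "HH:MM:SS key" input format are assumptions made for this example. Like the script above, it assumes all timestamps fall on the same day.

```shell
#!/bin/sh
# Sketch: input is lines of "HH:MM:SS <connection key>" on stdin.
# Output is a "Seconds Count" table of per-connection error durations.
# bad_state_freq is a hypothetical name chosen for this example.
bad_state_freq() {
    awk '
        {
            # Convert the leading HH:MM:SS timestamp to seconds since midnight.
            split($1, t, ":")
            secs = t[1] * 3600 + t[2] * 60 + t[3]
            # Everything after the timestamp is the connection key.
            key = $0
            sub(/^[^ ]+ /, "", key)
            if (!(key in first))
                first[key] = secs
            last[key] = secs
        }
        END {
            # One duration (last seen minus first seen) per connection.
            for (key in first)
                print last[key] - first[key]
        }' \
    | sort -n \
    | uniq -c \
    | awk 'BEGIN { print "Seconds Count" } { print $2, $1 }'
}

# Made-up input for illustration only, not taken from any real log.
printf '%s\n' \
    '02:00:00 a:1 b:2' \
    '02:01:00 a:1 b:2' \
    '02:00:30 c:3 d:4' \
    | bad_state_freq
```

Against the log it could be fed with something like grep ' BAD state: ' "$IN" | awk '{ print $3, $11, $12 }' | bad_state_freq, assuming awk's whitespace-split fields line up with the cut -d ' ' fields used in the script above. The sort -n | uniq -c stage stands in for the hand-written counting loop.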
Karl <[EMAIL PROTECTED]>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein