On 10/17/2012 8:51 AM, Bennett Samowich wrote:
I just had an event whose root cause I'm having trouble identifying.
I'm hoping that someone might have encountered this or might be able to
point me toward some things to check.
Yesterday we had an event where our primary firewall would stop passing
traffic. The only thing short of a reboot that would restore service was
to run 'sh /etc/netstart pfsync0'. Resetting pfsync's physical interface
or pulling that cable didn't produce results. Only resetting the pfsync0
virtual interface would restore service. I'm not even sure what
information would be helpful to provide or what other questions to ask. I
also found it odd that the two servers' state entry counts differed by
anywhere from 100 to several thousand. Is this
typical?
Thanks,
Bennett
States come and go, so depending on the amount of traffic going through
the router, the counts could be off by a few hundred, or maybe even a few
thousand if you do a lot of traffic.
I just counted the states (at the exact same time, several times) on
some primary/backup CARP routers using pfsync that push a constant
10-20 Mbit/s to several thousand web clients at any given moment, and the
states were within about 150 of each other consistently. I would say
being off by 1000s is indicative of a problem, but if you push a lot of
traffic, it might not be.
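
For reference, a rough sketch of how such a comparison could be done.
It assumes that `pfctl -ss` prints one line per state (the default,
non-verbose output) and that it is run as root on both firewalls at
about the same moment. The function names and the numbers in the usage
comment are made up for illustration.

```sh
#!/bin/sh
# Count the states on this host: one line of `pfctl -ss` output per state.
count_states() {
    pfctl -ss | wc -l | tr -d ' '
}

# Print the absolute difference between two state counts.
state_diff() {
    if [ "$1" -ge "$2" ]; then
        echo $(( $1 - $2 ))
    else
        echo $(( $2 - $1 ))
    fi
}

# Usage: run count_states on each firewall, then compare the two numbers
# (hypothetical values shown):
#   state_diff 48210 48075
```

A single sample does not say much; repeating the comparison a few times
shows whether the gap is stable or growing.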
Anyway, to get started you need to post a full ifconfig and dmesg, and
look through /var/log/messages for anything interesting from CARP or
pfsync.
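
Something along these lines might help with gathering that. It is only
a sketch: `filter_ha_lines` is a hypothetical helper name, and it simply
keeps log lines that mention carp or pfsync, so it may miss related
messages that are worded differently.

```sh
#!/bin/sh
# Keep only lines that mention carp or pfsync (case-insensitive).
filter_ha_lines() {
    grep -iE 'carp|pfsync'
}

# Usage on each firewall:
#   ifconfig
#   dmesg
#   filter_ha_lines < /var/log/messages
```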
Also put your pfsync cabling through a cable tester just to double check
it. I've had a bad pfsync interface cable cause weird problems before.
Any errors on the interface? netstat -in will tell you about errors;
ifconfig does not, it seems.
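
To pick the error counters out of that output, a small filter like the
one below may be handy. It assumes the last five columns of `netstat -in`
are input packets, input errors, output packets, output errors, and
collisions, in that order; column headings vary between releases, so
check the header line on your system first. `show_if_errors` is a
hypothetical name.

```sh
#!/bin/sh
# Print interfaces whose input or output error counters are non-zero.
# Fields are counted from the end of the line because the Address
# column can be empty for some interfaces.
show_if_errors() {
    awk 'NR > 1 && NF >= 5 && ($(NF-3) > 0 || $(NF-1) > 0) {
        print $1, "in_errs=" $(NF-3), "out_errs=" $(NF-1)
    }'
}

# Usage:
#   netstat -in | show_if_errors
```

No output means no interface reported errors under that column
assumption.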