On Wednesday 22 August 2007, Bill Marquette wrote:
> For the last two days I've been troubleshooting a weird issue where my
> secondary firewall in a pfsync/carp cluster isn't maintaining a state
> table similar in size to the primary - it's slowly increasing to the
> max size.  I think I've finally tracked it down to ip_output()
> returning an error, but at this point I'm lost.  The interfaces show
> no errors, this box happily ran OpenBSD for the last three years with
> no similar errors and has only started exhibiting this behavior after
> converting it.  I'm seeing this on multiple boxes, but am spending my
> time troubleshooting just one.  Any advice/assistance would be greatly
> appreciated, I'm at a loss and this is affecting my production
> environment.
>
> We're running RELENG_6_2, nics are Intel PRO/1000's (copper, but the
> cat-5e cable is a direct run to the 6513 switch one cabinet over -
> 15ft cable).
>
> This is a netstat from the primary machine, the secondary has been
> failed over to a couple times and looks similar (although
> interestingly the cluster seems to handle being on the secondary box
> better)
> # netstat -s -p pfsync
> pfsync:
>         409302985 packets received (IPv4)
>         0 packets received (IPv6)
>                 0 packets discarded for bad interface
>                 0 packets discarded for bad ttl
>                 0 packets shorter than header
>                 0 packets discarded for bad version
>                 0 packets discarded for bad HMAC
>                 0 packets discarded for bad action
>                 0 packets discarded for short packet
>                 0 states discarded for bad values
>                 0 stale states
>                 16980281 failed state lookup/inserts
>         1541416698 packets sent (IPv4)
>         0 packets sent (IPv6)
>                 0 send failed due to mbuf memory error
>                 182754275 send error

There are two reasons why we increase the send error counter.  Either
the internal deferred work queue is full or ip_output fails.  Could you
locate "pfsyncstats.pfsyncs_oerrors++" in your source code and replace
either occurrence with a printf()?  Maybe use the attached.  This way we
will know what exactly fails and if it is ip_output, why.

> # netstat -i -Iem2
> Name    Mtu Network       Address               Ipkts Ierrs      Opkts Oerrs  Coll
> em2    1500 <Link#3>      00:04:23:a6:b7:be 409328713    27 1359271127     0     0
> em2    1500 192.168.100.2 l4dupfw140-sync   409327567     - 1359270884     -     -

-- 
/"\  Best regards,                      | [EMAIL PROTECTED]
\ /  Max Laier                          | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | [EMAIL PROTECTED]
/ \  ASCII Ribbon Campaign              | Against HTML Mail and News

Index: if_pfsync.c
===================================================================
RCS file: /usr/store/mlaier/fcvs/src/sys/contrib/pf/net/if_pfsync.c,v
retrieving revision 1.19.2.5
diff -u -r1.19.2.5 if_pfsync.c
--- if_pfsync.c 19 Jan 2007 23:01:26 -0000 1.19.2.5
+++ if_pfsync.c 22 Aug 2007 22:05:04 -0000
@@ -1842,13 +1842,14 @@
 {
         struct pfsync_softc *sc = (struct pfsync_softc *)arg;
         struct mbuf *m;
+        int error;
         for(;;) {
                 IF_DEQUEUE(&sc->sc_ifq, m);
                 if (m == NULL)
                         break;
-                if (ip_output(m, NULL, NULL, IP_RAWOUTPUT, &sc->sc_imo, NULL))
-                        pfsyncstats.pfsyncs_oerrors++;
+                if ((error = ip_output(m, NULL, NULL, IP_RAWOUTPUT, &sc->sc_imo, NULL)))
+                        printf("pfsync_senddef: ip_output %d\n", error);
         }
 }
