On Wednesday 22 August 2007, Bill Marquette wrote:
> For the last two days I've been troubleshooting a weird issue where my
> secondary firewall in a pfsync/carp cluster isn't maintaining a state
> table similar in size to the primary - it's slowly increasing to the
> max size.  I think I've finally tracked it down to ip_output()
> returning an error, but at this point I'm lost.  The interfaces show
> no errors, this box happily ran OpenBSD for the last three years with
> no similar errors and has only started exhibiting this behavior after
> converting it.  I'm seeing this on multiple boxes, but am spending my
> time troubleshooting just one.  Any advice/assistance would be greatly
> appreciated, I'm at a loss and this is affecting my production
> environment.
>
> We're running RELENG_6_2; the NICs are Intel PRO/1000s (copper, but the
> Cat 5e cable is a direct run to the 6513 switch one cabinet over, a
> 15 ft cable).
>
> This is a netstat from the primary machine, the secondary has been
> failed over to a couple times and looks similar (although
> interestingly the cluster seems to handle being on the secondary box
> better)
> # netstat -s -p pfsync
> pfsync:
>         409302985 packets received (IPv4)
>         0 packets received (IPv6)
>                 0 packets discarded for bad interface
>                 0 packets discarded for bad ttl
>                 0 packets shorter than header
>                 0 packets discarded for bad version
>                 0 packets discarded for bad HMAC
>                 0 packets discarded for bad action
>                 0 packets discarded for short packet
>                 0 states discarded for bad values
>                 0 stale states
>                 16980281 failed state lookup/inserts
>         1541416698 packets sent (IPv4)
>         0 packets sent (IPv6)
>                 0 send failed due to mbuf memory error
>                 182754275 send error

There are two reasons why we increase the send error counter: either the
internal deferred work queue is full, or ip_output() fails.  Could you
locate "pfsyncstats.pfsyncs_oerrors++" in your source code and replace
each occurrence with a printf()?  The attached patch does this for the
ip_output() call.  This way we will know exactly what fails and, if it is
ip_output(), why.

> # netstat -i -Iem2
> Name    Mtu Network       Address               Ipkts Ierrs      Opkts Oerrs  Coll
> em2    1500 <Link#3>      00:04:23:a6:b7:be 409328713    27 1359271127     0     0
> em2    1500 192.168.100.2 l4dupfw140-sync   409327567     - 1359270884     -     -



-- 
/"\  Best regards,                      | [EMAIL PROTECTED]
\ /  Max Laier                          | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | [EMAIL PROTECTED]
/ \  ASCII Ribbon Campaign              | Against HTML Mail and News
Index: if_pfsync.c
===================================================================
RCS file: /usr/store/mlaier/fcvs/src/sys/contrib/pf/net/if_pfsync.c,v
retrieving revision 1.19.2.5
diff -u -r1.19.2.5 if_pfsync.c
--- if_pfsync.c	19 Jan 2007 23:01:26 -0000	1.19.2.5
+++ if_pfsync.c	22 Aug 2007 22:05:04 -0000
@@ -1842,13 +1842,14 @@
 {
 	struct pfsync_softc *sc = (struct pfsync_softc *)arg;
 	struct mbuf *m;
+	int error;
 
 	for(;;) {
 		IF_DEQUEUE(&sc->sc_ifq, m);
 		if (m == NULL)
 			break;
-		if (ip_output(m, NULL, NULL, IP_RAWOUTPUT, &sc->sc_imo, NULL))
-			pfsyncstats.pfsyncs_oerrors++;
+		if ((error = ip_output(m, NULL, NULL, IP_RAWOUTPUT, &sc->sc_imo, NULL)))
+			printf("pfsync_senddef: ip_output %d\n", error);
 	}
 }
 

