I'm running OpenBSD 6.0 i386 on a Soekris as my local firewall. We had/have a
problem with network dropouts on our NBN satellite connection which I believe
I've traced to the firewall's ARP entry for the upstream gateway expiring.
The problem appears to be that once the ARP entry expires, the firewall does
not issue an ARP who-has request to renew the entry. As a consequence, packets
can't be forwarded to the gateway and it looks like an ISP outage. This state
persists for periods of up to 10 minutes.
During the "outage" DHCP on the ISP link works (presumably because that doesn't
involve the arp table) but pings to the gateway do not, and nor does any other
normal IP traffic which requires using the gateway.
I left a tcpdump running for the gateway host IP and noticed this morning that
traffic resumed immediately after an ARP request was sent and (immediately)
answered, which led me to pursue this.
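For anyone wanting to watch for the same thing, here is a rough sketch of the
kind of capture involved. The interface name vr0 is only a placeholder
assumption (substitute the real external interface), and the filter shown is
one reasonable choice rather than necessarily the exact one used above.

```shell
#!/bin/sh
# Sketch only: EXT_IF is a hypothetical interface name, not taken from the post.
EXT_IF=${EXT_IF:-vr0}
GW=172.16.20.254

# Print a pcap filter matching all ARP traffic plus anything to/from the gateway.
gateway_filter() {
    echo "arp or host $GW"
}

# Usage (needs root); -n skips name lookups, -e prints link-level headers
# so the MAC addresses in ARP requests and replies are visible:
#   tcpdump -n -e -i "$EXT_IF" "$(gateway_filter)"
```

The point of including all ARP traffic is to see whether a who-has for the
gateway ever leaves the firewall during an "outage".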
I don't understand why, since the gateway address doesn't have a current ARP
entry, the firewall does not immediately issue an ARP request for it. Even a
ping directly from the firewall to the gateway address does not cause an ARP
request.
In case it is relevant, all the through traffic is directed via PF nat-to
rules, but I suspect this isn't related because direct ping traffic from the
firewall also doesn't work. On the other hand, there's a secondary interface to
a 3G modem which doesn't do this, and traffic through that interface is not
NATed because the 3G modem does it.
Finally, I've done the following to verify the issue:
Waited for the ARP entry to expire, and saw throughput cease and direct pings
of the gateway from the firewall fail:
ping 172.16.20.254
PING 172.16.20.254 (172.16.20.254): 56 data bytes
ping: sendto: Host is down
ping: wrote 172.16.20.254 64 chars, ret=-1
ping: sendto: Host is down
ping: wrote 172.16.20.254 64 chars, ret=-1
I added the ARP entry by hand with the arp command and throughput and pings
resumed immediately.
I've also manually removed the ARP entry and seen identical symptoms, and I've
manually added a static ARP entry for the gateway and the connection has been
solid for several hours now, versus "outages" every hour, if not more
frequently.
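The manual steps above correspond roughly to the following arp(8) invocations.
This is a sketch: the MAC address is a placeholder from the documentation
range, not the gateway's real address, and the run wrapper is a hypothetical
helper that only prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
GW=172.16.20.254
GW_MAC=00:00:5e:00:53:01   # placeholder MAC, NOT the real gateway address

# Hypothetical helper: echo the command by default, execute it if DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

show_entry() { run arp -n "$GW"; }            # display the current entry, if any
drop_entry() { run arp -d "$GW"; }            # delete the entry to reproduce the fault
pin_entry()  { run arp -s "$GW" "$GW_MAC"; }  # add a static entry as a workaround
```

Calling drop_entry followed by a ping should reproduce the "Host is down"
errors if the behaviour is as described, and pin_entry should restore traffic.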
I would like to understand this behaviour and to know if it is, as it appears,
a bug.
Cheers,
Cameron Simpson <c...@cskk.id.au>