On 2018-02-25 21:54, Alexander Bluhm wrote:
On Sat, Feb 24, 2018 at 04:20:50PM -0500, Johan Huldtgren wrote:
trying to connect to my gateway today I found the following
panic. This is 100% reproducible anytime I connect via
openvpn and then generate traffic. This first happened on
the Feb 7th snap, I updated and it happens on the latest
snap as well.
The question is what you were using before. I rewrote the
code on 2017/12/29. Was your working version from before that?
Looking back at my logs it looks like the last time I used it was
January 20th, and the snap I had then was:
OpenBSD 6.2-current (GENERIC) #316: Sat Dec 23 11:39:17 MST 2017
dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
Although a bit different, a similar assertion was there before.
ip_output(d2574200,0,f5275000,1,0,0,0) at ip_output+0x649
ip_forward(d2574200,d24b8000,d1f8aed8,0) at ip_forward+0x20a
ddb> show mbuf
mbuf 0xd05032f4
When you type "show mbuf" you must give the address of the mbuf.
The functions ip_output() and ip_forward() pass it as the first
argument. So "show mbuf 0xd2574200" would produce reasonable
results; your command dumped arbitrary memory.
thanks, noted for the future.
panic: kernel diagnostic assertion "skrev->reverse == NULL" failed:
file "/usr/src/sys/net/pf.c", line 7277
pf_find_state(d2486f00,f528d8e0,2,d251b900) at pf_find_state+0x28d
This function calls pf_state_key_link_reverse(sk, pkt_sk) with
pkt_sk->reverse != NULL.
On the way to that call we went through
pkt_sk = m->m_pkthdr.pf.statekey;
if (pkt_sk && pf_state_key_isvalid(pkt_sk->reverse))
sk = pkt_sk->reverse;
We know that pkt_sk != NULL and pkt_sk->reverse != NULL, and before
doing the RB_FIND() lookup we check that sk == NULL. So
pf_state_key_isvalid(pkt_sk->reverse) must be false.
The kernel tried to use an invalid statekey. How can that happen?
Invalid means sk->removed == 1, but pf_state_key_detach() also calls
pf_state_key_unlink_reverse().
pf_test(2,3,d247d400,f528da64) at pf_test+0xb63
ip_output(d251b900,0,f528dad0,1,0,0,0) at ip_output+0x649
Here we find the outgoing state. The mbuf had a statekey before.
ip_forward(d251b900,d247d400,d201cb48,0) at ip_forward+0x20a
ip_input_if(f528dc64,f528dc50,4,0,d247d400) at ip_input_if+0x48e
ipv4_input(d247d400,d251b900) at ipv4_input+0x2b
Here pf attaches the incoming statekey to the mbuf. This is the
one with the invalid reverse.
tun_dev_write(d247d400,f528dd98,10001) at tun_dev_write+0x222
tunwrite(2800,f528dd98,11) at tunwrite+0x53
What does your pf config look like? Do you have some skip on tun?
The only thing I have skip on is lo
Was there unencrypted traffic before you enabled openvpn?
I assume so; this is my firewall, so there is always some traffic,
but tun0 is only used for openvpn.
Were there matching pf states before you enabled openvpn?
I'm not exactly sure what you're asking. As this host
does more than openvpn, there would have been other states;
I'm not sure I can find that now, though.
Does it immediately crash when you start openvpn and the first
packet is sent out?
It seems so. I can connect, and if I then (for example) ping a
host on the inside, I get one ping back (the most I've ever seen
is three) and then nothing. At that point I can also see that
the host has failed over to its carp partner.
Do you only use the tun interface and the outgoing
interface?
No. Here is the output of ifconfig; there are several
interfaces / vlans:
# ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 32768
index 6 priority 0 llprio 3
groups: lo
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x6
inet 127.0.0.1 netmask 0xff000000
vr0: flags=8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST>
mtu 1500
lladdr 00:00:24:c9:58:4c
index 1 priority 0 llprio 3
groups: egress
media: Ethernet autoselect (100baseTX full-duplex)
status: active
inet 172.16.0.2 netmask 0xffffff00 broadcast 172.16.0.255
vr1: flags=8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST>
mtu 1500
lladdr 00:00:24:c9:58:4d
index 2 priority 0 llprio 3
media: Ethernet autoselect (100baseTX full-duplex)
status: active
vr2: flags=8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST>
mtu 1500
lladdr 00:00:24:c9:58:4e
index 3 priority 0 llprio 3
media: Ethernet autoselect (100baseTX full-duplex)
status: active
vr3: flags=8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST>
mtu 1500
lladdr 00:00:24:c9:58:4f
index 4 priority 0 llprio 3
media: Ethernet autoselect (100baseTX full-duplex)
status: active
enc0: flags=0<>
index 5 priority 0 llprio 3
groups: enc
status: active
ral0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
lladdr 00:12:0e:61:7f:b0
index 7 priority 4 llprio 3
groups: wlan
media: IEEE802.11 autoselect
status: no network
ieee80211: nwid ""
carp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
lladdr 00:00:5e:00:01:01
index 8 priority 15 llprio 3
carp: MASTER carpdev vr0 vhid 1 advbase 1 advskew 0 carppeer
172.16.0.3
groups: carp
status: master
inet 172.16.0.103 netmask 0xffffff00 broadcast 172.16.0.255
carp1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
lladdr 00:00:5e:00:01:02
index 9 priority 15 llprio 3
carp: MASTER carpdev vlan20 vhid 2 advbase 1 advskew 0 carppeer
192.168.100.3
groups: carp
status: master
inet 192.168.100.103 netmask 0xffffff00 broadcast
192.168.100.255
carp2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
lladdr 00:00:5e:00:01:03
index 10 priority 15 llprio 3
carp: MASTER carpdev vlan30 vhid 3 advbase 1 advskew 0 carppeer
192.168.0.3
groups: carp
status: master
inet 192.168.0.103 netmask 0xffffff00 broadcast 192.168.0.255
carp3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
lladdr 00:00:5e:00:01:04
index 11 priority 15 llprio 3
carp: MASTER carpdev vlan666 vhid 4 advbase 1 advskew 0 carppeer
10.66.66.3
groups: carp
status: master
inet 10.66.66.103 netmask 0xffffff00 broadcast 10.66.66.255
pflow0: flags=41<UP,RUNNING> mtu 1448
index 12 priority 0 llprio 3
pflow: sender: 192.168.100.2 receiver: 192.168.100.8:9995
version: 10
groups: pflow
pfsync0: flags=41<UP,RUNNING> mtu 1500
index 13 priority 0 llprio 3
pfsync: syncdev: vlan666 syncpeer: 10.66.66.3 maxupd: 128 defer:
off
groups: carp pfsync
tun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1500
index 14 priority 0 llprio 3
groups: tun
status: active
inet 10.6.6.1 --> 10.6.6.2 netmask 0xffffffff
vlan20: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu
1500
lladdr 00:00:24:c9:58:4d
index 15 priority 0 llprio 3
encap: vnetid 20 parent vr1
groups: vlan
media: Ethernet autoselect (100baseTX full-duplex)
status: active
inet 192.168.100.2 netmask 0xffffff00 broadcast 192.168.100.255
vlan30: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu
1500
lladdr 00:00:24:c9:58:4e
index 16 priority 0 llprio 3
encap: vnetid 30 parent vr2
groups: vlan
media: Ethernet autoselect (100baseTX full-duplex)
status: active
inet 192.168.0.2 netmask 0xffffff00 broadcast 192.168.0.255
vlan666: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu
1500
lladdr 00:00:24:c9:58:4f
index 17 priority 0 llprio 3
encap: vnetid 666 parent vr3
groups: vlan
media: Ethernet autoselect (100baseTX full-duplex)
status: active
inet 10.66.66.2 netmask 0xffffff00 broadcast 10.66.66.255
pflog0: flags=141<UP,RUNNING,PROMISC> mtu 33172
index 18 priority 0 llprio 3
groups: pflog
Do you have more forwarding paths?
I don't know, how would I check?
Do you use floating states?
pf.conf(5) says this is the default and I've not
changed it.
Does the problem go away with if-bound states?
No. I set this in pf.conf:
set state-policy if-bound
and tried again, and it still panicked.
Is there more stuff involved like gif(4) or bridge(4) or ....
Neither gif nor bridge, but vlan and carp.
Although I do not fully understand what is the root of the problem,
you can try this diff. Does it prevent the panic? Do you see the
log message I have added there? This would at least prove that my
theory is correct.
I am traveling (which is why I was trying to use my vpn), and
unfortunately my internet here is atrocious; just reading and writing
email is very hard. Once I get somewhere with a better connection
I'll get this tested and report back.
thanks,
.jh
bluhm
Index: net/pf.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/net/pf.c,v
retrieving revision 1.1061
diff -u -p -r1.1061 pf.c
--- net/pf.c 18 Feb 2018 21:45:30 -0000 1.1061
+++ net/pf.c 26 Feb 2018 00:27:57 -0000
@@ -1070,8 +1070,20 @@ pf_find_state(struct pfi_kif *kif, struc
pkt_sk = NULL;
}
- if (pkt_sk && pf_state_key_isvalid(pkt_sk->reverse))
- sk = pkt_sk->reverse;
+ if (pkt_sk) {
+ if (pf_state_key_isvalid(pkt_sk->reverse)) {
+ sk = pkt_sk->reverse;
+ } else if (pkt_sk->reverse != NULL) {
+ log(LOG_ERR,
+ "pf: state key reverse invalid. "
+ "pkt_sk=%p, pkt_sk->reverse=%p, "
+ "pkt_sk->reverse->reverse=%p\n",
+ pkt_sk, pkt_sk->reverse,
+ pkt_sk->reverse->reverse);
+ pf_mbuf_unlink_state_key(m);
+ pkt_sk = NULL;
+ }
+ }
if (pkt_sk == NULL) {
/* here we deal with local outbound packet */