Why did it take almost a year to find this? Or is this bug due to ioctl(2) becoming UNLOCKED on 2020/02/22?
> On Fri, Apr 10, 2020 at 01:22:32PM -0600, Theo de Raadt wrote: > > I would like an explanation of the root cause. What commit broke this? > > The root cause for these asserts is that bridge_ioctl() drops NET_LOCK(). > > > net/if_bridge.c revision 1.332 > date: 2019/05/12 19:53:22; author: mpi; state: Exp; lines: +92 -71; > commitid: sO9kfjHuGizxTwI1; > Switch the list of span interfaces and interfaces to SMR. > > This removes the KERNEL_LOCK() around the list iteration in bridge_enqueue(). > > Since the NET_LOCK() isn't protecting any data structure, release it early > in all the code paths coming from the Network Stack to prevent possible > deadlock situations with smr_barrier(). > > bridge_input() is still KERNEL_LOCK()ed as well as bridge_filterrule(). > > ok visa@ > > > > Visa Hankala <v...@openbsd.org> wrote: > > > > > On Fri, Apr 10, 2020 at 10:16:47AM -0400, David Hill wrote: > > > > On a April 9, 2020 cvs built machine: > > > > > > > > An splassert shows when I add or delete vlan0 on bridge0 > > > > > > > > # ifconfig bridge0 add vlan0 > > > > splassert: vlan_ioctl: want 2 have 0 > > > > Starting stack trace... > > > > vlan_ioctl(ffff800000936800,80206910,ffff80003316f318) at > > > > vlan_ioctl+0x65 > > > > ifpromisc(ffff800000936800,1) at ifpromisc+0xbb > > > > bridge_ioctl(ffff8000009da000,8060693c,ffff80003316f520) at > > > > bridge_ioctl+0x8c8 > > > > ifioctl(fffffd811e7e5e40,8060693c,ffff80003316f520,ffff800032fd49e8) at > > > > ifioctl+0xa03 > > > > soo_ioctl(fffffd81015768e8,8060693c,ffff80003316f520,ffff800032fd49e8) > > > > at > > > > soo_ioctl+0x171 > > > > sys_ioctl(ffff800032fd49e8,ffff80003316f630,ffff80003316f690) at > > > > sys_ioctl+0x2df > > > > syscall(ffff80003316f700) at syscall+0x389 > > > > Xsyscall() at Xsyscall+0x128 > > > > end of kernel > > > > end trace frame: 0x7f7ffffdcef0, count: 249 > > > > End of stack trace. > > > > > > > > # ifconfig bridge0 del vlan0 > > > > splassert: vlan_ioctl: want 2 have 0 > > > > Starting stack trace... > > > > vlan_ioctl(ffff800000936800,80206910,ffff80003316ee78) at > > > > vlan_ioctl+0x65 > > > > ifpromisc(ffff800000936800,0) at ifpromisc+0xbb > > > > bridge_ifremove(ffff8000009ed900) at bridge_ifremove+0xa4 > > > > bridge_ioctl(ffff8000009da000,8060693d,ffff80003316f0c0) at > > > > bridge_ioctl+0x91e > > > > ifioctl(fffffd811e7e5e40,8060693d,ffff80003316f0c0,ffff800032fd49e8) at > > > > ifioctl+0xa03 > > > > soo_ioctl(fffffd81015768e8,8060693d,ffff80003316f0c0,ffff800032fd49e8) > > > > at > > > > soo_ioctl+0x171 > > > > sys_ioctl(ffff800032fd49e8,ffff80003316f1d0,ffff80003316f230) at > > > > sys_ioctl+0x2df > > > > syscall(ffff80003316f2a0) at syscall+0x389 > > > > Xsyscall() at Xsyscall+0x128 > > > > end of kernel > > > > end trace frame: 0x7f7fffff6a20, count: 248 > > > > > > Acquiring NET_LOCK() when calling ifpromisc() seems to help. > > > This is similar to what bpf(4) does. > > > > > > It looks that besides bridge(4), there are other places in the kernel > > > where ifpromisc() is called without NET_LOCK(). > > > > > > Index: net/if_bridge.c > > > =================================================================== > > > RCS file: src/sys/net/if_bridge.c,v > > > retrieving revision 1.338 > > > diff -u -p -r1.338 if_bridge.c > > > --- net/if_bridge.c 6 Nov 2019 03:51:26 -0000 1.338 > > > +++ net/if_bridge.c 10 Apr 2020 19:05:05 -0000 > > > @@ -313,7 +313,9 @@ bridge_ioctl(struct ifnet *ifp, u_long c > > > break; > > > } > > > > > > + NET_LOCK(); > > > error = ifpromisc(ifs, 1); > > > + NET_UNLOCK(); > > > if (error != 0) { > > > free(bif, M_DEVBUF, sizeof(*bif)); > > > break; > > > @@ -558,7 +560,9 @@ bridge_ifremove(struct bridge_iflist *bi > > > } > > > > > > bif->ifp->if_bridgeidx = 0; > > > + NET_LOCK(); > > > error = ifpromisc(bif->ifp, 0); > > > + NET_UNLOCK(); > > > > > > bridge_rtdelete(sc, bif->ifp, 0); > > > bridge_flushrule(bif); > > >