Why did it take almost a year to find this?

Or is this bug due to ioctl(2) becoming UNLOCKED on 2020/02/22?

> On Fri, Apr 10, 2020 at 01:22:32PM -0600, Theo de Raadt wrote:
> > I would like an explanation of the root cause.  What commit broke this?
> 
> The root cause for these asserts is that bridge_ioctl() drops NET_LOCK().
> 
> 
> net/if_bridge.c revision 1.332
> date: 2019/05/12 19:53:22;  author: mpi;  state: Exp;  lines: +92 -71;  
> commitid: sO9kfjHuGizxTwI1;
> Switch the list of span interfaces and interfaces to SMR.
> 
> This removes the KERNEL_LOCK() around the list iteration in bridge_enqueue().
> 
> Since the NET_LOCK() isn't protecting any data structure, release it early
> in all the code paths coming from the Network Stack to prevent possible
> deadlock situations with smr_barrier().
> 
> bridge_input() is still KERNEL_LOCK()ed as well as bridge_filterrule().
> 
> ok visa@
> 
> 
> > Visa Hankala <v...@openbsd.org> wrote:
> > 
> > > On Fri, Apr 10, 2020 at 10:16:47AM -0400, David Hill wrote:
> > > > On a April 9, 2020 cvs built machine:
> > > > 
> > > > An splassert shows when I add or delete vlan0 on bridge0
> > > > 
> > > > # ifconfig bridge0 add vlan0
> > > > splassert: vlan_ioctl: want 2 have 0
> > > > Starting stack trace...
> > > > vlan_ioctl(ffff800000936800,80206910,ffff80003316f318) at 
> > > > vlan_ioctl+0x65
> > > > ifpromisc(ffff800000936800,1) at ifpromisc+0xbb
> > > > bridge_ioctl(ffff8000009da000,8060693c,ffff80003316f520) at
> > > > bridge_ioctl+0x8c8
> > > > ifioctl(fffffd811e7e5e40,8060693c,ffff80003316f520,ffff800032fd49e8) at
> > > > ifioctl+0xa03
> > > > soo_ioctl(fffffd81015768e8,8060693c,ffff80003316f520,ffff800032fd49e8) 
> > > > at
> > > > soo_ioctl+0x171
> > > > sys_ioctl(ffff800032fd49e8,ffff80003316f630,ffff80003316f690) at
> > > > sys_ioctl+0x2df
> > > > syscall(ffff80003316f700) at syscall+0x389
> > > > Xsyscall() at Xsyscall+0x128
> > > > end of kernel
> > > > end trace frame: 0x7f7ffffdcef0, count: 249
> > > > End of stack trace.
> > > > 
> > > > # ifconfig bridge0 del vlan0
> > > > splassert: vlan_ioctl: want 2 have 0
> > > > Starting stack trace...
> > > > vlan_ioctl(ffff800000936800,80206910,ffff80003316ee78) at 
> > > > vlan_ioctl+0x65
> > > > ifpromisc(ffff800000936800,0) at ifpromisc+0xbb
> > > > bridge_ifremove(ffff8000009ed900) at bridge_ifremove+0xa4
> > > > bridge_ioctl(ffff8000009da000,8060693d,ffff80003316f0c0) at
> > > > bridge_ioctl+0x91e
> > > > ifioctl(fffffd811e7e5e40,8060693d,ffff80003316f0c0,ffff800032fd49e8) at
> > > > ifioctl+0xa03
> > > > soo_ioctl(fffffd81015768e8,8060693d,ffff80003316f0c0,ffff800032fd49e8) 
> > > > at
> > > > soo_ioctl+0x171
> > > > sys_ioctl(ffff800032fd49e8,ffff80003316f1d0,ffff80003316f230) at
> > > > sys_ioctl+0x2df
> > > > syscall(ffff80003316f2a0) at syscall+0x389
> > > > Xsyscall() at Xsyscall+0x128
> > > > end of kernel
> > > > end trace frame: 0x7f7fffff6a20, count: 248
> > > 
> > > Acquiring NET_LOCK() when calling ifpromisc() seems to help.
> > > This is similar to what bpf(4) does.
> > > 
> > > It looks that besides bridge(4), there are other places in the kernel
> > > where ifpromisc() is called without NET_LOCK().
> > > 
> > > Index: net/if_bridge.c
> > > ===================================================================
> > > RCS file: src/sys/net/if_bridge.c,v
> > > retrieving revision 1.338
> > > diff -u -p -r1.338 if_bridge.c
> > > --- net/if_bridge.c       6 Nov 2019 03:51:26 -0000       1.338
> > > +++ net/if_bridge.c       10 Apr 2020 19:05:05 -0000
> > > @@ -313,7 +313,9 @@ bridge_ioctl(struct ifnet *ifp, u_long c
> > >                   break;
> > >           }
> > >  
> > > +         NET_LOCK();
> > >           error = ifpromisc(ifs, 1);
> > > +         NET_UNLOCK();
> > >           if (error != 0) {
> > >                   free(bif, M_DEVBUF, sizeof(*bif));
> > >                   break;
> > > @@ -558,7 +560,9 @@ bridge_ifremove(struct bridge_iflist *bi
> > >   }
> > >  
> > >   bif->ifp->if_bridgeidx = 0;
> > > + NET_LOCK();
> > >   error = ifpromisc(bif->ifp, 0);
> > > + NET_UNLOCK();
> > >  
> > >   bridge_rtdelete(sc, bif->ifp, 0);
> > >   bridge_flushrule(bif);
> > > 

Reply via email to