Re: assert "sc->sc_dev == NUM" failed in if_tun.c (2)

2022-02-24 Thread Visa Hankala
On Thu, Feb 24, 2022 at 07:39:54PM +1000, David Gwynne wrote:
> claudio and i came up with the following, which is to have tun_dev_open
> check the state of the vnode associated with the current open call
> after all the sleeping and potential tun_clone_destroy and
> tun_clone_create calls. if the vnode has been made bad/dead after
> all the sleeping, it returns with ENXIO.
> 
> this appears to fix the issue. i had a test program that would trigger
> the KASSERT in a matter of seconds and it's been running for an hour
> now.
> 
> here's the diff.

The diff looks reasonable.

OK visa@

I guess a similar issue can arise with some other detachable devices
as well. However, the problem is most acute with things like tun(4)
that can be created and destroyed programmatically on a whim at high
speed.

spec_open() could check for vnode revocation after completing d_open.
Of course, that would not fix the problem with tun. The check would
allow error reporting to happen earlier, but there would be no
fundamental shift in behaviour, I think.

> Index: if_tun.c
> ===
> RCS file: /cvs/src/sys/net/if_tun.c,v
> retrieving revision 1.234
> diff -u -p -r1.234 if_tun.c
> --- if_tun.c  16 Feb 2022 02:22:39 -  1.234
> +++ if_tun.c  24 Feb 2022 08:08:38 -
> @@ -374,10 +374,19 @@ tun_dev_open(dev_t dev, const struct if_
>   struct ifnet *ifp;
>   int error;
>   u_short stayup = 0;
> + struct vnode *vp;
>  
>   char name[IFNAMSIZ];
>   unsigned int rdomain;
>  
> + /*
> +  * Find the vnode associated with this open before we sleep
> +  * and let something else revoke it. Our caller has a reference
> +  * to it so we don't need to account for it.
> +  */
> + if (!vfinddev(dev, VCHR, &vp))
> + panic("%s vfinddev failed", __func__);
> +
>   snprintf(name, sizeof(name), "%s%u", ifc->ifc_name, minor(dev));
>   rdomain = rtable_l2(p->p_p->ps_rtableid);
>  
> @@ -412,6 +421,12 @@ tun_dev_open(dev_t dev, const struct if_
>   /* XXX if_clone_destroy if stayup? */
>   goto done;
>   }
> + }
> +
> + /* Has tun_clone_destroy torn the rug out under us? */
> + if (vp->v_type == VBAD) {
> + error = ENXIO;
> + goto done;
>   }
>  
>   if (sc->sc_dev != 0) {
> 



Re: assert "sc->sc_dev == NUM" failed in if_tun.c (2)

2022-02-24 Thread Claudio Jeker
On Thu, Feb 24, 2022 at 08:56:59PM +1000, David Gwynne wrote:
> On Thu, Feb 24, 2022 at 11:13:48AM +0100, Claudio Jeker wrote:
> > On Thu, Feb 24, 2022 at 07:39:54PM +1000, David Gwynne wrote:
> > > 
> > > here's the diff.
> > > 
> > > Index: if_tun.c
> > > ===
> > > RCS file: /cvs/src/sys/net/if_tun.c,v
> > > retrieving revision 1.234
> > > diff -u -p -r1.234 if_tun.c
> > > --- if_tun.c  16 Feb 2022 02:22:39 -  1.234
> > > +++ if_tun.c  24 Feb 2022 08:08:38 -
> > > @@ -374,10 +374,19 @@ tun_dev_open(dev_t dev, const struct if_
> > >   struct ifnet *ifp;
> > >   int error;
> > >   u_short stayup = 0;
> > > + struct vnode *vp;
> > >  
> > 
> > Why is there this empty line? It was there before, but I'm still wondering.
> 
> feng shui? laziness? i'll fix it later.
> 
> > >   char name[IFNAMSIZ];
> > >   unsigned int rdomain;
> > >  
> > > + /*
> > > +  * Find the vnode associated with this open before we sleep
> > > +  * and let something else revoke it. Our caller has a reference
> > > +  * to it so we don't need to account for it.
> > > +  */
> > > + if (!vfinddev(dev, VCHR, &vp))
> > > + panic("%s vfinddev failed", __func__);
> > > +
> > >   snprintf(name, sizeof(name), "%s%u", ifc->ifc_name, minor(dev));
> > >   rdomain = rtable_l2(p->p_p->ps_rtableid);
> > >  
> > > @@ -412,6 +421,12 @@ tun_dev_open(dev_t dev, const struct if_
> > >   /* XXX if_clone_destroy if stayup? */
> > >   goto done;
> > >   }
> > > + }
> > > +
> > > + /* Has tun_clone_destroy torn the rug out under us? */
> > > + if (vp->v_type == VBAD) {
> > > + error = ENXIO;
> > > + goto done;
> > >   }
> > >  
> > >   if (sc->sc_dev != 0) {
> > > 
> > 
> > OK claudio@
> > 
> > After sleeping on it, I think this is the cleanest and simplest way
> > around this problem. It is a bit ugly that tun needs to peek into the vnode.
> > 
> > Another option is to split the clone destroy from the softc / device node.
> > Remove the VOP_REVOKE and actually allow tun to be destroyed and recreated
> > while open and in that case the device remains open and only the network
> > bits are destroyed and later recreated. So in your example from above:
> > 
> > > > - ifconfig tun0 create
> > > > - open /dev/tun0 -> fd 3
> > > > - ifconfig tun0 destroy
> > > > - ifconfig tun0 create
> > > > - write to fd 3
> > 
> > The write would be perfectly fine since the destroy did not destroy this
> > connection (only close(2) would do that). Actually a call to open
> > /dev/tun0 after the 2nd create would fail because the device is still
> > open. Doing this seems a lot more complex.
> 
> we talked about this a lot around the time of src/sys/net/if_tun.c
> r1.210. there seemed to be more weight on the side of the argument for
> VOP_REVOKE than against, and i still think that's the case now. tun
> going away and coming back in between open and write could go either
> way, but what about these:
> 
> - open() /dev/tun0, tun0 is destroyed, write() to tun0
> 
> should the write error? if we run with "only close can destroy the
> connection" does this mean the write will create tun0 again? in which
> rdomain should it be?

Normally, yes, you would generate an error like EPIPE in such a case.
You may even need to send SIGPIPE, but that is one of those nasty details.
 
> - open() /dev/tun0, tun0 is destroyed, read() from tun0
> 
> same as above?

Ideally you would signal EOF with a return of 0 (again, similar to how pipes
behave).
 
> - begin blocking read of tun0, tun0 is destroyed, let's go shopping!
> 
> should the read wake up and return an error, or does it just block?

Again return 0.
 
> - poll on tun0, tun0 is destroyed
> 
> same as above?

poll should signal POLLHUP | POLLIN (iirc that's what pipes do).
 
> if we leave the /dev side of things operational when the interface goes
> away, then this would be inconsistent with something working with bpf on
> the same interfaces. wouldn't this be inconsistent with hotplug devices
> and their /dev things?

Maybe, but on the other hand that is how pipes, socketpairs and unix sockets
behave. As I said, going down this road is possible but much more complex
because all these edge cases need to be defined. Also, there may be bugs
triggered in userland apps by a tun(4) behaving like this.
 
> maybe destroy should be blocked if the /dev entry is open?
> 
> i dunno. it feels more natural that detaching or destroying an interface
> should push userland off it.

I think one can argue about the benefits of either mode. The big problem is
that, because of how if_clone locking works, you cannot block in the
destroy function or people will become very unhappy. You may even be able
to cause deadlocks.

As I said above, I think your diff is the best solution to fix this race, and
while implementing this without VOP_REVOKE is possible, it is too complex.
I'm perfectly fine with the current behaviour and think it actually fits
b

Re: assert "sc->sc_dev == NUM" failed in if_tun.c (2)

2022-02-24 Thread David Gwynne
On Thu, Feb 24, 2022 at 11:13:48AM +0100, Claudio Jeker wrote:
> On Thu, Feb 24, 2022 at 07:39:54PM +1000, David Gwynne wrote:
> > 
> > here's the diff.
> > 
> > Index: if_tun.c
> > ===
> > RCS file: /cvs/src/sys/net/if_tun.c,v
> > retrieving revision 1.234
> > diff -u -p -r1.234 if_tun.c
> > --- if_tun.c16 Feb 2022 02:22:39 -  1.234
> > +++ if_tun.c24 Feb 2022 08:08:38 -
> > @@ -374,10 +374,19 @@ tun_dev_open(dev_t dev, const struct if_
> > struct ifnet *ifp;
> > int error;
> > u_short stayup = 0;
> > +   struct vnode *vp;
> >  
> 
> Why is there this empty line? It was there before, but I'm still wondering.

feng shui? laziness? i'll fix it later.

> > char name[IFNAMSIZ];
> > unsigned int rdomain;
> >  
> > +   /*
> > +* Find the vnode associated with this open before we sleep
> > +* and let something else revoke it. Our caller has a reference
> > +* to it so we don't need to account for it.
> > +*/
> > +   if (!vfinddev(dev, VCHR, &vp))
> > +   panic("%s vfinddev failed", __func__);
> > +
> > snprintf(name, sizeof(name), "%s%u", ifc->ifc_name, minor(dev));
> > rdomain = rtable_l2(p->p_p->ps_rtableid);
> >  
> > @@ -412,6 +421,12 @@ tun_dev_open(dev_t dev, const struct if_
> > /* XXX if_clone_destroy if stayup? */
> > goto done;
> > }
> > +   }
> > +
> > +   /* Has tun_clone_destroy torn the rug out under us? */
> > +   if (vp->v_type == VBAD) {
> > +   error = ENXIO;
> > +   goto done;
> > }
> >  
> > if (sc->sc_dev != 0) {
> > 
> 
> OK claudio@
> 
> After sleeping on it, I think this is the cleanest and simplest way
> around this problem. It is a bit ugly that tun needs to peek into the vnode.
> 
> Another option is to split the clone destroy from the softc / device node.
> Remove the VOP_REVOKE and actually allow tun to be destroyed and recreated
> while open and in that case the device remains open and only the network
> bits are destroyed and later recreated. So in your example from above:
> 
> > > - ifconfig tun0 create
> > > - open /dev/tun0 -> fd 3
> > > - ifconfig tun0 destroy
> > > - ifconfig tun0 create
> > > - write to fd 3
> 
> The write would be perfectly fine since the destroy did not destroy this
> connection (only close(2) would do that). Actually a call to open
> /dev/tun0 after the 2nd create would fail because the device is still
> open. Doing this seems a lot more complex.

we talked about this a lot around the time of src/sys/net/if_tun.c
r1.210. there seemed to be more weight on the side of the argument for
VOP_REVOKE than against, and i still think that's the case now. tun
going away and coming back in between open and write could go either
way, but what about these:

- open() /dev/tun0, tun0 is destroyed, write() to tun0

should the write error? if we run with "only close can destroy the
connection" does this mean the write will create tun0 again? in which
rdomain should it be?

- open() /dev/tun0, tun0 is destroyed, read() from tun0

same as above?

- begin blocking read of tun0, tun0 is destroyed, let's go shopping!

should the read wake up and return an error, or does it just block?

- poll on tun0, tun0 is destroyed

same as above?

if we leave the /dev side of things operational when the interface goes
away, then this would be inconsistent with something working with bpf on
the same interfaces. wouldn't this be inconsistent with hotplug devices
and their /dev things?

maybe destroy should be blocked if the /dev entry is open?

i dunno. it feels more natural that detaching or destroying an interface
should push userland off it.



Re: assert "sc->sc_dev == NUM" failed in if_tun.c (2)

2022-02-24 Thread Claudio Jeker
On Thu, Feb 24, 2022 at 07:39:54PM +1000, David Gwynne wrote:
> On Mon, Feb 21, 2022 at 03:00:01PM +1000, David Gwynne wrote:
> > On Sun, Feb 20, 2022 at 10:30:22AM +1000, David Gwynne wrote:
> > > 
> > > 
> > > > On 20 Feb 2022, at 09:46, David Gwynne  wrote:
> > > > 
> > > > On Sat, Feb 19, 2022 at 02:58:08PM -0800, Greg Steuck wrote:
> > > >> There's no reproducer, but maybe this race is approachable without one?
> > > >> 
> > > >> dev = sc->sc_dev;
> > > >> if (dev) {
> > > >> struct vnode *vp;
> > > >> 
> > > >> if (vfinddev(dev, VCHR, &vp))
> > > >> VOP_REVOKE(vp, REVOKEALL);
> > > >> 
> > > >> KASSERT(sc->sc_dev == 0);
> > > >> }
> > > > 
> > > > this was my last run at it:
> > > > https://marc.info/?l=openbsd-tech&m=164489981621957&w=2
> > > > 
> > > > maybe i need another dvthing thread to push it a bit harder...
> > > 
> > > adding another dvthing thread or two made it blow up pretty quickly :(
> > 
> > so it is this section:
> > 
> > /* kick userland off the device */
> > dev = sc->sc_dev;
> > if (dev) {
> > struct vnode *vp;
> > 
> > if (vfinddev(dev, VCHR, &vp))
> > VOP_REVOKE(vp, REVOKEALL);
> > 
> > KASSERT(sc->sc_dev == 0);
> > }
> > 
> > my assumption was/is that VOP_REVOKE would call tunclose (or tapclose)
> > on the currently open tun (or tap) device, and swap the vfs backend out
> > behind the scenes with deadfs.
> > 
> > the context behind this is that there isn't really a strong binding between an
> > open /dev entry (eg, /dev/tun0) and an instance of an interface (eg,
> > tun0). all the device entrypoints (eg, tunopen, tunread, tunwrite,
> > etc) pass a dev_t, and that's used to look up an interface instance to
> > work with.
> > 
> > the problem with this is an interface could be destroyed and recreated
> > in between calls to the device entrypoints. ie, you could do the
> > following high level steps:
> > 
> > - ifconfig tun0 create
> > - open /dev/tun0 -> fd 3
> > - ifconfig tun0 destroy
> > - ifconfig tun0 create
> > - write to fd 3
> > 
> > and that would send a packet on the newly created tun0 because it had
> > the same minor number as the previous one.
> > 
> > there was a lot of consensus that this was Not The Best(tm), and that if
> > a tun interface was destroyed while the /dev entry was open, it should
> > act like the interface was detached and the open /dev entry should stop
> > working. this is what VOP_REVOKE helps with. or it is supposed to help
> > with.
> > 
> > when we create a tun interface, it's added to a global list of tun
> > interfaces. when a tun device is opened, we look for the interface on
> > that list (and create it if it doesn't exist), and then check to see if
> > it is already open by looking at sc_dev. if sc_dev is 0, it's not open
> > and tunopen can set sc_dev to claim ownership of it. if sc_dev is
> > non-zero, then the device is busy and open fails.
> > 
> > tunclose clears sc_dev to say ownership is given up.
> > 
> > tun destroy checks sc_dev, and if it is != 0 then it knows something has
> > it open and will call VOP_REVOKE. VOP_REVOKE is supposed to do what i
> > described above, which is call tunclose on the program's behalf and swap
> > the vfs ops out.
> > 
> > what i'm seeing is that sometimes VOP_REVOKE gets called, it happily
> > returns 0, but tunclose is not called. this means sc_dev is not cleared,
> > and then this KASSERT fires.
> > 
> > i've tried changing it to something like this in the destroy path:
> > 
> > -   KASSERT(sc->sc_dev == 0);
> > +   while (sc->sc_dev != 0)
> > +   tsleep_nsec(&sc->sc_dev, PWAIT, "tunclose", INFSLP);
> > 
> > with tunclose calling wakeup(&sc->sc_dev) when it's finished, but this ends
> > up getting stuck in the tsleep.
> > 
> > however, if i cut the KASSERT out and let destroy keep going, i do see
> > tunclose get called against the now non-existent interface. this would
> > be fine, but now we're back where we started. if someone recreates tun0
> > after it's been destroyed, tunclose will find the new interface and try
> > to close it.
> > 
> > i've tried to follow what VOP_REVOKE actually does and how it finds
> > tunclose to call it, but it's pretty twisty and i got tired.
> > 
> > i guess my question at this point is are my assumptions about how
> > VOP_REVOKE works valid? specifically, should it reliably be calling
> > tunclose? if not, what causes tunclose to be called in the future and
> > why can't i wait for it in tun_clone_destroy?
> 
> claudio figured it out. his clue was that multiple concurrent calls
> to tunopen (or tapopen) will share a vnode. because tunopen can sleep,
> multiple programs can be inside tunopen for the same tun interface at
> the same time, all with references against the same vnode.
> 
> at the same time as this another thread/program can call VOP_REVOKE
> via tun_clone_destroy (eg, ifconfig tun1 destroy does this).
> VOP_REVOKE marks a vnode as bad, which in turn means that subsequent
> open()s of a tun interface will get a brand new vnode.

Re: assert "sc->sc_dev == NUM" failed in if_tun.c (2)

2022-02-24 Thread David Gwynne
On Mon, Feb 21, 2022 at 03:00:01PM +1000, David Gwynne wrote:
> On Sun, Feb 20, 2022 at 10:30:22AM +1000, David Gwynne wrote:
> > 
> > 
> > > On 20 Feb 2022, at 09:46, David Gwynne  wrote:
> > > 
> > > On Sat, Feb 19, 2022 at 02:58:08PM -0800, Greg Steuck wrote:
> > >> There's no reproducer, but maybe this race is approachable without one?
> > >> 
> > >> dev = sc->sc_dev;
> > >> if (dev) {
> > >> struct vnode *vp;
> > >> 
> > >> if (vfinddev(dev, VCHR, &vp))
> > >> VOP_REVOKE(vp, REVOKEALL);
> > >> 
> > >> KASSERT(sc->sc_dev == 0);
> > >> }
> > > 
> > > this was my last run at it:
> > > https://marc.info/?l=openbsd-tech&m=164489981621957&w=2
> > > 
> > > maybe i need another dvthing thread to push it a bit harder...
> > 
> > adding another dvthing thread or two made it blow up pretty quickly :(
> 
> so it is this section:
> 
> /* kick userland off the device */
> dev = sc->sc_dev;
> if (dev) {
> struct vnode *vp;
> 
> if (vfinddev(dev, VCHR, &vp))
> VOP_REVOKE(vp, REVOKEALL);
> 
> KASSERT(sc->sc_dev == 0);
> }
> 
> my assumption was/is that VOP_REVOKE would call tunclose (or tapclose)
> on the currently open tun (or tap) device, and swap the vfs backend out
> behind the scenes with deadfs.
> 
> the context behind this is that there isn't really a strong binding between an
> open /dev entry (eg, /dev/tun0) and an instance of an interface (eg,
> tun0). all the device entrypoints (eg, tunopen, tunread, tunwrite,
> etc) pass a dev_t, and that's used to look up an interface instance to
> work with.
> 
> the problem with this is an interface could be destroyed and recreated
> in between calls to the device entrypoints. ie, you could do the
> following high level steps:
> 
> - ifconfig tun0 create
> - open /dev/tun0 -> fd 3
> - ifconfig tun0 destroy
> - ifconfig tun0 create
> - write to fd 3
> 
> and that would send a packet on the newly created tun0 because it had
> the same minor number as the previous one.
> 
> there was a lot of consensus that this was Not The Best(tm), and that if
> a tun interface was destroyed while the /dev entry was open, it should
> act like the interface was detached and the open /dev entry should stop
> working. this is what VOP_REVOKE helps with. or it is supposed to help
> with.
> 
> when we create a tun interface, it's added to a global list of tun
> interfaces. when a tun device is opened, we look for the interface on
> that list (and create it if it doesn't exist), and then check to see if
> it is already open by looking at sc_dev. if sc_dev is 0, it's not open
> and tunopen can set sc_dev to claim ownership of it. if sc_dev is
> non-zero, then the device is busy and open fails.
> 
> tunclose clears sc_dev to say ownership is given up.
> 
> tun destroy checks sc_dev, and if it is != 0 then it knows something has
> it open and will call VOP_REVOKE. VOP_REVOKE is supposed to do what i
> described above, which is call tunclose on the program's behalf and swap
> the vfs ops out.
> 
> what i'm seeing is that sometimes VOP_REVOKE gets called, it happily
> returns 0, but tunclose is not called. this means sc_dev is not cleared,
> and then this KASSERT fires.
> 
> i've tried changing it to something like this in the destroy path:
> 
> - KASSERT(sc->sc_dev == 0);
> + while (sc->sc_dev != 0)
> + tsleep_nsec(&sc->sc_dev, PWAIT, "tunclose", INFSLP);
> 
> with tunclose calling wakeup(&sc->sc_dev) when it's finished, but this ends
> up getting stuck in the tsleep.
> 
> however, if i cut the KASSERT out and let destroy keep going, i do see
> tunclose get called against the now non-existent interface. this would
> be fine, but now we're back where we started. if someone recreates tun0
> after it's been destroyed, tunclose will find the new interface and try
> to close it.
> 
> i've tried to follow what VOP_REVOKE actually does and how it finds
> tunclose to call it, but it's pretty twisty and i got tired.
> 
> i guess my question at this point is are my assumptions about how
> VOP_REVOKE works valid? specifically, should it reliably be calling
> tunclose? if not, what causes tunclose to be called in the future and
> why can't i wait for it in tun_clone_destroy?

claudio figured it out. his clue was that multiple concurrent calls
to tunopen (or tapopen) will share a vnode. because tunopen can sleep,
multiple programs can be inside tunopen for the same tun interface at
the same time, all with references against the same vnode.

at the same time as this another thread/program can call VOP_REVOKE
via tun_clone_destroy (eg, ifconfig tun1 destroy does this).
VOP_REVOKE marks a vnode as bad, which in turn means that subsequent
open()s of a tun interface will get a brand new vnode.

so multiple threads holding references to a vnode can be sleeping in
tun_dev_open on the interface cloner lock. one thread wins and takes
ownership of the tun interface, then another thread can d