> > > The receive worker walks gsm->dlci[] without gsm->mutex while a
> > > concurrent GSMIOC_SETCONF -> gsm_cleanup_mux() frees the DLCIs, so the
> > > control handlers can dereference a freed gsm_dlci. v1's NULL check only
> > > narrowed the window; v2 fixes the use-after-free itself.
> > >
> > > The fix pins each DLCI the dispatch dereferences with its existing
> > > tty_port reference (option 2), so the data path stays lock-free. See the
> > > patch 1 commit message for details, including why the late destructor
> > > uses cmpxchg() so it cannot wipe a re-created mux (Daniel's teardown
> > > concern).
> > >
> > > Changes since v1:
> > >  - Fix the UAF by reference-pinning instead of a NULL check in the
> > >    handlers; no gsm->mutex in the data path (Greg, Daniel).
> > >  - Pin every DLCI the dispatch touches, not just the addressed one:
> > >    MSC/RLS/PN operate on gsm->dlci[k] named in the payload.
> > >  - Add a base selftest (patch 2), as Greg asked.
> > >
> > > Verification (KASAN, panic_on_warn=1): the originally reported splat is
> > > the gsm_control_reply() / CMD_TEST path (see the Link in patch 1). A
> > > reproducer targeting the MSC handler crashes the unpatched kernel and
> > > survives 270 race rounds on v2. The selftest passes on both the clean
> > > and patched kernel (pass:3 fail:0 skip:0).
> > >
> > > Weiming Shi (2):
> > >   tty: n_gsm: fix use-after-free in gsm_queue() control frame dispatch
> > >   selftests: tty: add base regression test for n_gsm line discipline
> >
> > So now there is a race on device node level. We have the old virtual
> > gsmttys still waiting for the current pending operations to finish while
> > the reconfiguration of the ldisc triggers a recreation of these. How is
> > this handled?
> >
> A pending gsmtty operation stays bound to the old object. gsmtty_install()
> takes a dlci_get() and the gsmtty ops use tty->driver_data, not gsm->dlci[],
> so an open fd keeps its own reference and never reaches the recreated dlci
> (a fresh gsm_dlci_alloc()). On teardown gsm_dlci_release() does
> tty_vhangup(), which unblocks the pending ops, and sets DLCI_CLOSED, which
> every gsmtty op checks first.
> 
> This is the existing tty_port refcount, and the receive path here uses the
> same dlci_get/put, so I don't think the pin adds a device-node race. But
> you know this hardware better than I do; if there's a path I'm missing,
> point me at it.

Sounds right. Thank you for the explanation. My tty internals understanding
is rather shallow.

> > PATCH 1/2:
> >
> > > -     if (addr == 0 || addr >= NUM_DLCI || gsm->dlci[addr] == NULL)
> > > +     if (addr == 0)
> > >               return;
> > > -     dlci = gsm->dlci[addr];
> >
> > Control octets can be transmitted on non-control DLCIs in convergence layer
> > type 2. You do not guard against invalid DLCIs anymore. Same for parameter
> > negotiation and RLS.
> >
> 
> The checks aren't dropped, just folded into gsm_dlci_open_get(), which
> returns NULL for addr >= NUM_DLCI and for an empty slot. But hiding them
> in the lookup makes the diff read like the guard is gone, which was a bad
> call. For v4, maybe we should keep the addr checks at the call sites and
> et the helper only do the locked lookup and pin, so the guard stays
> visible where the DLCI is used? Does that match what you'd expect?

Let's revisit the code for gsm_control_modem for example:
        len = gsm_read_ea_val(&addr, data, cl);
        if (len < 1)
                return;

        addr >>= 1;
        /* Closed port, or invalid ? */
        if (addr == 0 || addr >= NUM_DLCI || gsm->dlci[addr] == NULL)
                return;
        dlci = gsm->dlci[addr];
Without the addr >= NUM_DLCI part you access the dlci array at a possible
out of bounds index if the received data was broken. Let us keep input data
validation at system boundaries. You also risk dereferencing a null
pointer if the requested DLCI has not been allocated, i.e. opened, yet.
Hence, both checks should be kept in place.

Best regards,
Daniel Starke

Reply via email to