I've been looking at code, and I saw something that scared me.
Lots of drivers call mac_unregister(), and *then* disable callbacks,
interrupts, etc.
This makes sense, because you don't want to tear all that stuff down if
the mac is going to refuse to detach because its busy.
But fact that mac_unregister *also* frees the mac handle creates a nasty
use-after-free race condition.
You can't close this race with a lock, because that would mean holding a
driver lock across mac_unregister(), which I think would be a big no-no,
and lead to deadlock.
The two solutions I see are:
1) separate mac_unregister() from from mac_handle_free() (or somesuch).
The act of unregistering should mark the the handle "unregistered", and
mac entry points should see that and refuse to do anything on a
mac_notify_xxx or other upcall from the driver with that handle.
2) mac drivers all need to tear-down their resources *before* calling
mac_unregister, and then re-establish them if mac_unregister succeeds.
This is a lot of code, and unfortunately it introduces new points of
failure.
I'm strongly in favor of #1. IMO, unregistration should be separate
from resource allocation. I'm actually not a fan of the way the
mac_register() and mac_alloc() work either. If I had my druthers, what
would happen is:
1) driver calls mac_alloc(), and gets a "mostly" opaque mac_handle.
A versioned callbacks vector can be supplied as part of mac_alloc().
The handle is marked "unregistered" so that the Mac layer will ignore
upcalls from the driver on it.
2) driver does other device specific initializations....
3) driver calls mac_register(). This clears the "unregistered" flag
on the handle, and sets things going.
4) sometime later, the driver calls mac_unregister() as part of
detach handling. If it fails, driver returns DDI_FAILURE back through
detach(9e). Otherwise, the framework marks the handle "unregistered".....
5) driver tears down its own resources, turning off interrupts,
cyclics, etc
6) driver calls mac_free() to finally deallocate the memory for the
mac structure.
What do other people think? Am I understanding the current race
condition(s) incorrectly? Assuming that I do not, how onerous would
people find making a change like the above? (Every NIC driver would
need to be touched, but the changes would be modest. I'd be willing to
do the work myself to fix it.)
Oh, I strongly feel that this change *must* be made (assuming that I'm
just not misunderstanding the race) before we open up the stability of
the Nemo layer.
- Garrett
_______________________________________________
networking-discuss mailing list
[email protected]