Re: [panic] Race in IEEE802.11 layer towards device drivers
"FreeBSD and all other open source projects are the Tower of Babel in computer era." --me So, join me @ git://dev.nasreddine.com/run.git (or http://dev.nasreddine.com/gitweb/?p=run.git;a=summary ) Just work on any of *_dev branches. - Original Message > From: Hans Petter Selasky > To: freebsd-current@freebsd.org > Cc: PseudoCylon ; Sam Leffler ; >freebsd-...@freebsd.org > Sent: Wed, July 21, 2010 10:00:26 AM > Subject: Re: [panic] Race in IEEE802.11 layer towards device drivers > > Hi, > > Please confirm that this patch is working for you: > > http://p4web.freebsd.org/@@181261?ac=10 > > --HPS > Confirmed it's properly been updated. AK ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: [panic] Race in IEEE802.11 layer towards device drivers
Hi, Please confirm that this patch is working for you: http://p4web.freebsd.org/@@181261?ac=10 --HPS ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: [panic] Race in IEEE802.11 layer towards device drivers
- Original Message > From: Hans Petter Selasky > To: PseudoCylon > Cc: freebsd-current@freebsd.org; Sam Leffler ; >freebsd-...@freebsd.org > Sent: Tue, July 20, 2010 4:46:34 AM > Subject: Re: [panic] Race in IEEE802.11 layer towards device drivers > > On Tuesday 20 July 2010 12:03:22 PseudoCylon wrote: > > - Original Message > > > > > From: Hans Petter Selasky > > > To: freebsd-current@freebsd.org > > > Cc: PseudoCylon ; Sam Leffler ; > > > > > >freebsd-...@freebsd.org > > > > > > Sent: Mon, July 19, 2010 1:17:04 PM > > > Subject: Re: [panic] Race in IEEE802.11 layer towards device drivers > > > > > > Hi AK, > > > > > > I've committed your patches to USB P4. I've made some additional > > > patches. > > > > > > Can you check and verify everything? > > > > > > http://p4web.freebsd.org/@@181189?ac=10 > > > > Hi > > > > If we change sc->cmdq_run = RUN_CMDQ_ABORT, > > > > -- begin excerpt -- > > > > > > @@ -4890,7 +4877,10 @@ run_stop(void *arg) > > ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE); > > > > sc->ratectl_run = RUN_RATECTL_OFF; > > -sc->cmdq_run = RUN_CMDQ_ABORT; > > + > > +RUN_CMDQ_LOCK(sc); > > +sc->cmdq_run = sc->cmdq_key_set = RUN_CMDQ_ABORT; > > +RUN_CMDQ_UNLOCK(sc); > > > > -- end excerpt -- > > > > > > we also need to change this, otherwise key will be cleared. > > Ok. > > Try to give the second mutex a different name, and see how many warnings go > away. > > --HPS > Giving different name makes all of "duplicate lock" warnings away. Here is the patch includes all changes -- begin patch -- diff --git a/dev/usb/wlan/if_run.c b/dev/usb/wlan/if_run.c index 017e4b0..da22077 100644 --- a/dev/usb/wlan/if_run.c +++ b/dev/usb/wlan/if_run.c @@ -549,7 +549,7 @@ run_attach(device_t self) mtx_init(&sc->sc_mtx, device_get_nameunit(sc->sc_dev), MTX_NETWORK_LOCK, MTX_DEF); mtx_init(&sc->sc_cmdq_mtx, device_get_nameunit(sc->sc_dev), -MTX_NETWORK_LOCK, MTX_DEF); +"command queue", MTX_DEF); iface_index = RT2860_IFACE_INDEX; @@ -4670,8 +4670,6 @@ run_init_locked(struct run_softc *sc) if(ic->ic_nrunning > 1) return; -run_stop(sc); - for (ntries = 0; ntries < 100; ntries++) { if (run_read(sc, RT2860_ASIC_VER_ID, &tmp) != 0) goto fail; -- end patch -- ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: [panic] Race in IEEE802.11 layer towards device drivers
On Tuesday 20 July 2010 12:03:22 PseudoCylon wrote: > - Original Message > > > From: Hans Petter Selasky > > To: freebsd-current@freebsd.org > > Cc: PseudoCylon ; Sam Leffler ; > > > >freebsd-...@freebsd.org > > > > Sent: Mon, July 19, 2010 1:17:04 PM > > Subject: Re: [panic] Race in IEEE802.11 layer towards device drivers > > > > Hi AK, > > > > I've committed your patches to USB P4. I've made some additional > > patches. > > > > Can you check and verify everything? > > > > http://p4web.freebsd.org/@@181189?ac=10 > > Hi > > If we change sc->cmdq_run = RUN_CMDQ_ABORT, > > -- begin excerpt -- > > > @@ -4890,7 +4877,10 @@ run_stop(void *arg) > ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE); > > sc->ratectl_run = RUN_RATECTL_OFF; > -sc->cmdq_run = RUN_CMDQ_ABORT; > + > +RUN_CMDQ_LOCK(sc); > +sc->cmdq_run = sc->cmdq_key_set = RUN_CMDQ_ABORT; > +RUN_CMDQ_UNLOCK(sc); > > -- end excerpt -- > > > we also need to change this, otherwise key will be cleared. Ok. Try to give the second mutex a different name, and see how many warnings go away. --HPS ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: [panic] Race in IEEE802.11 layer towards device drivers
- Original Message > From: Hans Petter Selasky > To: freebsd-current@freebsd.org > Cc: PseudoCylon ; Sam Leffler ; >freebsd-...@freebsd.org > Sent: Mon, July 19, 2010 1:17:04 PM > Subject: Re: [panic] Race in IEEE802.11 layer towards device drivers > > Hi AK, > > I've committed your patches to USB P4. I've made some additional patches. > > Can you check and verify everything? > > http://p4web.freebsd.org/@@181189?ac=10 > Hi If we change sc->cmdq_run = RUN_CMDQ_ABORT, -- begin excerpt -- @@ -4890,7 +4877,10 @@ run_stop(void *arg) ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE); sc->ratectl_run = RUN_RATECTL_OFF; -sc->cmdq_run = RUN_CMDQ_ABORT; + +RUN_CMDQ_LOCK(sc); +sc->cmdq_run = sc->cmdq_key_set = RUN_CMDQ_ABORT; +RUN_CMDQ_UNLOCK(sc); -- end excerpt -- we also need to change this, otherwise key will be cleared. -- begin patch -- diff --git a/dev/usb/wlan/if_run.c b/dev/usb/wlan/if_run.c index 017e4b0..f7abe17 100644 --- a/dev/usb/wlan/if_run.c +++ b/dev/usb/wlan/if_run.c @@ -4670,8 +4670,6 @@ run_init_locked(struct run_softc *sc) if(ic->ic_nrunning > 1) return; -run_stop(sc); - for (ntries = 0; ntries < 100; ntries++) { if (run_read(sc, RT2860_ASIC_VER_ID, &tmp) != 0) goto fail; -- end patch -- > Also please compile a kernel with WITNESS enabled to catch any LOR's, hence > we > > introduced another mutex. > The 2nd mutex did solve a deadlock, but doesn't solve the LOR. -- begin message -- lock order reversal: 1st 0xff8000a257d0 run0_node_lock (run0_node_lock) @ /usr/src/sys/net80211/ieee80211_node.c:1736 2nd 0xff8000a19348 run0 (network driver) @ /mnt/share/home/AK/FreeBSD/modules/usb/run/../../../../mnt/dev/usb/wlan/if_run.c:2212 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a _witness_debugger() at _witness_debugger+0x2e witness_checkorder() at witness_checkorder+0x81e _mtx_lock_flags() at _mtx_lock_flags+0x78 run_key_delete() at run_key_delete+0x45 _ieee80211_crypto_delkey() at _ieee80211_crypto_delkey+0x9e ieee80211_crypto_delkey() at ieee80211_crypto_delkey+0x28 ieee80211_node_delucastkey() at ieee80211_node_delucastkey+0x78 ieee80211_sta_leave() at ieee80211_sta_leave+0x16 ieee80211_node_leave() at ieee80211_node_leave+0x11d hostap_recv_mgmt() at hostap_recv_mgmt+0x33f hostap_input() at hostap_input+0xc09 run_rx_frame() at run_rx_frame+0x13f run_bulk_rx_callback() at run_bulk_rx_callback+0x3b7 usbd_callback_wrapper() at usbd_callback_wrapper+0x12b usb_command_wrapper() at usb_command_wrapper+0x76 usb_callback_proc() at usb_callback_proc+0x76 usb_process() at usb_process+0xbb fork_exit() at fork_exit+0x12a fork_trampoline() at fork_trampoline+0xe --- trap 0, rip = 0, rsp = 0xff8029b4ed30, rbp = 0 --- -- end message -- or -- begin message -- lock order reversal: 1st 0xff8000a257d0 run0_node_lock (run0_node_lock) @ /usr/src/sys/net80211/ieee80211_node.c:1736 2nd 0xff8000a19348 run0 (network driver) @ /mnt/share/home/AK/FreeBSD/modules/usb/run/../../../../mnt/dev/usb/wlan/if_run.c:2212 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a _witness_debugger() at _witness_debugger+0x2e witness_checkorder() at witness_checkorder+0x81e _mtx_lock_flags() at _mtx_lock_flags+0x78 run_key_delete() at run_key_delete+0x47 _ieee80211_crypto_delkey() at _ieee80211_crypto_delkey+0x9e ieee80211_crypto_delkey() at ieee80211_crypto_delkey+0x28 ieee80211_node_delucastkey() at ieee80211_node_delucastkey+0x78 ieee80211_sta_leave() at ieee80211_sta_leave+0x16 ieee80211_node_leave() at ieee80211_node_leave+0x11d ieee80211_node_timeout() at ieee80211_node_timeout+0x1d5 softclock() at softclock+0x2a0 intr_event_execute_handlers() at intr_event_execute_handlers+0x66 ithread_loop() at ithread_loop+0xb2 fork_exit() at fork_exit+0x12a fork_trampoline() at fork_trampoline+0xe --- trap 0, rip = 0, rsp = 0xff852d30, rbp = 0 --- -- end message -- There are new warning, "acquiring duplicate lock." For example, -- begin message -- acquiring duplicate lock of same type: "network driver" 1st run0 @ /usr/src/sys/dev/usb/usb_request.c:691 2nd run0 @ /mnt/share/home/AK/FreeBSD/modules/usb/run/../../../../mnt/dev/usb/wlan/if_run.c:4831 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a _witness_debugger() at _witness_debugger+0x2e witness_checkorder() at witness_checkorder+0x8ef _mtx_lock_flags() at _mtx_lock_flags+0x78 run_init_locked() at run_init_locked+0x753 run_ioctl() at run_ioctl+0xad taskqueue_run() at taskqueue_run+0x91 taskqueue_thread_loop() at taskqueue_thread_loop+0x3f fork_exit() at fork_exit+0x12a fork_trampoline() at fork_trampoline+0xe --- trap 0, rip = 0, rsp = 0xff803e5b2d30, rbp = 0 --- -- end message -- I don't know if it's worth patching or safe to patch (specially lock/unlock in
Re: [panic] Race in IEEE802.11 layer towards device drivers
Hi AK, I've committed your patches to USB P4. I've made some additional patches. Can you check and verify everything? http://p4web.freebsd.org/@@181189?ac=10 Also please compile a kernel with WITNESS enabled to catch any LOR's, hence we introduced another mutex. --HPS ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: [panic] Race in IEEE802.11 layer towards device drivers
[NB] Obviously, I didn't click "reply ALL" last time, so here are missing part > > > > -if(vap->iv_opmode == IEEE80211_M_HOSTAP){ > > > > > > > > -RUN_LOCK(sc); > > > > +if (vap->iv_opmode == IEEE80211_M_HOSTAP) > > > > > > > > sc->cmdq_key_set = RUN_CMDQ_GO; > > > > > > > > -RUN_UNLOCK(sc); > > > > -} > > > > > > > > > > > > Why are you removing these locks? > > > > It is simple assignment, it must be atomic. > > Not necessarily. If you don't put lock statements around it or use the > volatile keyword, the compiler can re-organize the execution order. I will > have a look at it. > > > > > > Another question: > > > i = RUN_CMDQ_GET(&sc->cmdq_store); > > > DPRINTF("cmdq_store=%d\n", i); > > > > > >sc->cmdq[i].func = run_update_beacon_cb; > > >sc->cmdq[i].arg0 = vap; > > > > > > Why is this code and similar places not enclosed with mutexes? > > > > First, I couldn't use a lock in key_delete() because of LoR. So, I use > > atomic instead. RUN_CMDQ_GET is atomic_fetch_add(). Whatever executes that > > code gets unique place (cmdq[i]) to write, so there shouldn't be any race. > > > > Then out of order execution happened. Specially, when key_set() overtakes > > key_delete(), encryption fails. So, all deferred processes are called back > > via run_cmdq_cb() to maintain the order. Because cmdq functions are first > > written for key_delete() where lock causes LoR. So, lock isn't needed. > > > > run_cmdq_cb() uses lock. But it is for calling callback functions locked. > > So that, functions just call another function locked, like > > ##_callback() > > { > >LOCK(); > >##_locked(); > >UNLOCK(); > > } > > won't be needed. > > If the run_cmdq_cb() is running at the same time which you are queuing > elements, then I note that you set .func before .arg0. The ##_callback() code > only checks if .func has been set. Actually the "i" increment should be after > that you filled out the data, and then you see that you cannot use atomic. I > think the most simple solution is to add another mutex, sc->sc_cmdq_mtx, which > protects the queue and it's associated data. Also, what do you do if the queue > wraps around? You should have a mechanism to prevent that, because you then > might start executing commands in random order? > Here is a patch (patch against P4 if_run.c rev 14 and if_runvar.h rev 8) -- begin patch -- diff --git a/dev/usb/wlan/if_run.c b/dev/usb/wlan/if_run.c index 8c96534..c988ad4 100644 --- a/dev/usb/wlan/if_run.c +++ b/dev/usb/wlan/if_run.c @@ -90,12 +90,6 @@ SYSCTL_INT(_hw_usb_run, OID_AUTO, debug, CTLFLAG_RW, &run_debug, 0, #define IEEE80211_HAS_ADDR4(wh) \ (((wh)->i_fc[1] & IEEE80211_FC1_DIR_MASK) == IEEE80211_FC1_DIR_DSTODS) -/* - * Because of LOR in run_key_delete(), use atomic instead. - * '& RUN_CMDQ_MASQ' is to loop cmdq[]. - */ -#define RUN_CMDQ_GET(c)(atomic_fetchadd_32((c), 1) & RUN_CMDQ_MASQ) - static const struct usb_device_id run_devs[] = { { USB_VP(USB_VENDOR_ABOCOM,USB_PRODUCT_ABOCOM_RT2770) }, { USB_VP(USB_VENDOR_ABOCOM,USB_PRODUCT_ABOCOM_RT2870) }, @@ -554,6 +548,8 @@ run_attach(device_t self) mtx_init(&sc->sc_mtx, device_get_nameunit(sc->sc_dev), MTX_NETWORK_LOCK, MTX_DEF); +mtx_init(&sc->sc_cmdq_mtx, device_get_nameunit(sc->sc_dev), +MTX_NETWORK_LOCK, MTX_DEF); iface_index = RT2860_IFACE_INDEX; @@ -737,6 +733,7 @@ run_detach(device_t self) } mtx_destroy(&sc->sc_mtx); +mtx_destroy(&sc->sc_cmdq_mtx); return (0); } @@ -830,9 +827,6 @@ run_vap_create(struct ieee80211com *ic, if(sc->rvp_cnt++ == 0) ic->ic_opmode = opmode; -if(opmode == IEEE80211_M_HOSTAP) -sc->cmdq_run = RUN_CMDQ_GO; - DPRINTF("rvp_id=%d bmap=%x rvp_cnt=%d\n", rvp->rvp_id, sc->rvp_bmap, sc->rvp_cnt); @@ -889,27 +883,31 @@ run_cmdq_cb(void *arg, int pending) struct run_softc *sc = arg; uint8_t i; -/* call cmdq[].func locked */ -RUN_LOCK(sc); -for(i = sc->cmdq_exec; sc->cmdq[i].func && pending; -i = sc->cmdq_exec, pending--){ +RUN_CMDQ_LOCK(sc); +for (i = sc->cmdq_exec; sc->cmdq[i].func; i = sc->cmdq_exec) { DPRINTFN(6, "cmdq_exec=%d pending=%d\n", i, pending); -if(sc->cmdq_run == RUN_CMDQ_GO){ +if (sc->cmdq_run == RUN_CMDQ_GO || +(sc->cmdq_key_set == RUN_CMDQ_GO && +sc->cmdq[i].func == run_key_set_cb)) { +RUN_CMDQ_UNLOCK(sc); +RUN_LOCK(sc); /* * If arg0 is NULL, callback func needs more * than one arg. So, pass ptr to cmdq struct. */ -if(sc->cmdq[i].arg0) +if (sc->cmdq[i].arg0) sc->cmdq[i].func(sc->cmdq[i].arg0); else sc->cmdq[i].func(&sc->cmdq[i]); +RUN_UNLOCK(sc); +RUN_CMDQ_LOCK(sc); } sc->cmdq[i].arg0 = NULL; sc->cmdq[i].func = NULL; sc->cmdq_exec++; sc->cmdq_exec &= RUN_CMDQ_MASQ; } -RUN_UNLOCK(sc); +RUN_CMDQ_UNLOCK(sc); } static void @@ -1771,6 +1769,19 @@ run_newstate(struct ieee80211vap *vap, enum ieee80211_state nstate, int arg) case IEEE80211_S_INIT: restart_ratectl = 1; +/* + * When hostapd has set a key, don't clear it. + * But, when the devic
Re: [panic] Race in IEEE802.11 layer towards device drivers
On Wednesday 14 July 2010 14:31:29 PseudoCylon wrote: > -if(vap->iv_opmode == IEEE80211_M_HOSTAP){ > -RUN_LOCK(sc); > +if (vap->iv_opmode == IEEE80211_M_HOSTAP) > sc->cmdq_key_set = RUN_CMDQ_GO; > -RUN_UNLOCK(sc); > -} > > Why are you removing these locks? Another question: i = RUN_CMDQ_GET(&sc->cmdq_store); DPRINTF("cmdq_store=%d\n", i); sc->cmdq[i].func = run_update_beacon_cb; sc->cmdq[i].arg0 = vap; Why is this code and similar places not enclosed with mutexes? --HPS ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: [panic] Race in IEEE802.11 layer towards device drivers
On Wednesday 14 July 2010 14:31:29 PseudoCylon wrote: > -if(vap->iv_opmode == IEEE80211_M_HOSTAP){ > -RUN_LOCK(sc); > +if (vap->iv_opmode == IEEE80211_M_HOSTAP) > sc->cmdq_key_set = RUN_CMDQ_GO; > -RUN_UNLOCK(sc); > -} > Why are you removing these locks? --HPS ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: [panic] Race in IEEE802.11 layer towards device drivers
- Original Message > From: Hans Petter Selasky > To: freebsd-current@freebsd.org > Cc: Andrew Thompson ; Sam Leffler ; >PseudoCylon ; freebsd-...@freebsd.org > Sent: Mon, July 12, 2010 2:01:11 PM > Subject: Re: [panic] Race in IEEE802.11 layer towards device drivers > > Hi Andrew, > > Your patch appears to be working. Can you fix this issue in the other WLAN > drivers aswell? Then send an e-mail to request testing? I had a go at it here: > > http://p4web.freebsd.org/@@180844?ac=10 > Can this patch be included into testing? patch to P4 USB rev. 14 if_run.c If not, at least please delete an extra semicolon at the end of line 3191 (must be an merge bug). I missed it since it wans't in diff/patch. AK summary of changes * fixes bugs in rev 209144 a shared key was written properly only on first time init, but not subsequent init. Make sure the key is written all the time. * stop checking 'pending' in run_cmdq_cb(). When loop 'pending' times, new tasks enqueued while looping won't be executed because 'pending' passed from taskqueue function won't be incremented. -- patch begin -- diff --git a/dev/usb/wlan/if_run.c b/dev/usb/wlan/if_run.c index 7a3952c..12b45ec 100644 --- a/dev/usb/wlan/if_run.c +++ b/dev/usb/wlan/if_run.c @@ -830,9 +830,6 @@ run_vap_create(struct ieee80211com *ic, if(sc->rvp_cnt++ == 0) ic->ic_opmode = opmode; -if(opmode == IEEE80211_M_HOSTAP) -sc->cmdq_run = RUN_CMDQ_GO; - DPRINTF("rvp_id=%d bmap=%x rvp_cnt=%d\n", rvp->rvp_id, sc->rvp_bmap, sc->rvp_cnt); @@ -891,15 +888,16 @@ run_cmdq_cb(void *arg, int pending) /* call cmdq[].func locked */ RUN_LOCK(sc); -for(i = sc->cmdq_exec; sc->cmdq[i].func && pending; -i = sc->cmdq_exec, pending--){ +for (i = sc->cmdq_exec; sc->cmdq[i].func; i = sc->cmdq_exec) { DPRINTFN(6, "cmdq_exec=%d pending=%d\n", i, pending); -if(sc->cmdq_run == RUN_CMDQ_GO){ +if (sc->cmdq_run == RUN_CMDQ_GO || +(sc->cmdq_key_set == RUN_CMDQ_GO && +sc->cmdq[i].func == run_key_set_cb)) { /* * If arg0 is NULL, callback func needs more * than one arg. So, pass ptr to cmdq struct. */ -if(sc->cmdq[i].arg0) +if (sc->cmdq[i].arg0) sc->cmdq[i].func(sc->cmdq[i].arg0); else sc->cmdq[i].func(&sc->cmdq[i]); @@ -1771,6 +1769,19 @@ run_newstate(struct ieee80211vap *vap, enum ieee80211_state nstate, int arg) case IEEE80211_S_INIT: restart_ratectl = 1; +/* + * When hostapd has set a key, don't clear it. + * But, when the device is being brought down, clear it. + */ +if (sc->cmdq_key_set != RUN_CMDQ_GO || +ostate == IEEE80211_S_RUN) { +/* clear shared key table */ +run_set_region_4(sc, +RT2860_SKEY(rvp->rvp_id, 0), 0, 4 * 32); +/* clear shared key mode */ +run_set_region_4(sc, RT2860_SKEY_MODE_0_7, 0, 4); +} + if (ostate != IEEE80211_S_RUN) break; @@ -2100,13 +2111,10 @@ run_key_set(struct ieee80211vap *vap, struct ieee80211_key *k, * To make sure key will be set when hostapd * calls iv_key_set() before if_init(). */ -if(vap->iv_opmode == IEEE80211_M_HOSTAP){ -RUN_LOCK(sc); +if (vap->iv_opmode == IEEE80211_M_HOSTAP) sc->cmdq_key_set = RUN_CMDQ_GO; -RUN_UNLOCK(sc); -} -return(1); +return (1); } /* @@ -3188,7 +3196,7 @@ run_sendprot(struct run_softc *sc, ackrate = ieee80211_ack_rate(ic->ic_rt, rate); isshort = (ic->ic_flags & IEEE80211_F_SHPREAMBLE) != 0; -dur = ieee80211_compute_duration(ic->ic_rt, pktlen, rate, isshort); +dur = ieee80211_compute_duration(ic->ic_rt, pktlen, rate, isshort) + ieee80211_ack_duration(ic->ic_rt, rate, isshort); wflags = RT2860_TX_FRAG; @@ -4693,14 +4701,6 @@ run_init_locked(struct run_softc *sc) /* clear WCID attribute table */ run_set_region_4(sc, RT2860_WCID_ATTR(0), 0, 8 * 32); -/* hostapd sets a key before init. So, don't clear it. */ -if(sc->cmdq_key_set != RUN_CMDQ_GO){ -/* clear shared key table */ -run_set_region_4(sc, RT2860_SKEY(0, 0), 0, 8 * 32); -/* clear shared key mode */ -run_set_region_4(sc, RT2860_SKEY_MODE_0_7, 0, 4); -} - run_read(sc, RT2860_US_CYC_CNT, &tmp); tmp = (tmp & ~0xff) | 0x1e; run_write(sc, RT2860_US_CYC_CNT, tmp); @@ -4807,7 +4807,7 @@ run_stop(void *arg) ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE); sc->ratectl_run = RUN_RATECTL_OFF; -sc->cmdq_run = sc->cmdq_key_set; +sc->cmdq_run = RUN_CMDQ_ABORT; RUN_UNLOCK(sc); -- end patch -- ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: [panic] Race in IEEE802.11 layer towards device drivers
On Tuesday 13 July 2010 03:54:08 PseudoCylon wrote: > Should the debugging code, usb_pause_mtx(), be left in the code for > testing? If the drivers don't panic to begin with, we won't know the > patch really fixed the issue. > No, I think it is safe to remove it. --HPS ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: [panic] Race in IEEE802.11 layer towards device drivers
- Original Message > From: Hans Petter Selasky > To: freebsd-current@freebsd.org > Cc: Andrew Thompson ; Sam Leffler ; >PseudoCylon ; freebsd-...@freebsd.org > Sent: Mon, July 12, 2010 2:01:11 PM > Subject: Re: [panic] Race in IEEE802.11 layer towards device drivers > > Hi Andrew, > > Your patch appears to be working. Can you fix this issue in the other WLAN > drivers aswell? Then send an e-mail to request testing? I had a go at it here: > > http://p4web.freebsd.org/@@180844?ac=10 > Since I wasn't able to reproduce the panic, I can only confirm the patched run(4) driver runs OK. I just want to patch hostap related fix before calling for this test. I'll ask the user if it's fixed. Should the debugging code, usb_pause_mtx(), be left in the code for testing? If the drivers don't panic to begin with, we won't know the patch really fixed the issue. AK > I found another panic issue: > > ifconfig wlan0 delete > ifconfig wlan0 destroy > > When not associate or associated. > > Backtrace (AMD64 - 9-current): > > node_free() + 0x2c > rum_tx_free() + 0x3b > which is called from the bulk tx callback > > Another thread is running an IOCTL -> rum_stop(), which causes the CANCELLED > event to be passed to USB. Can't we free any nodes at this point? > > --HPS > > > This turned out to be refcounting of the ieee80211_node struct which > > was causing this panic. vap->iv_bss can be freed at any time so all > > users of it need to bump the refcount to use it safely. > > > > This patch should fix the panic in the rum driver. > > http://people.freebsd.org/~thompsa/rum_node_refcnt.diff > > > > There are other places where it is still an issue such as the > > ieee80211_tx_mgt_timeout callout which havnt been addressed yet, and > > of course all other ieee80211 drivers. > > > ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: [panic] Race in IEEE802.11 layer towards device drivers
Hi Andrew, Your patch appears to be working. Can you fix this issue in the other WLAN drivers aswell? Then send an e-mail to request testing? I had a go at it here: http://p4web.freebsd.org/@@180844?ac=10 I found another panic issue: ifconfig wlan0 delete ifconfig wlan0 destroy When not associate or associated. Backtrace (AMD64 - 9-current): node_free() + 0x2c rum_tx_free() + 0x3b which is called from the bulk tx callback Another thread is running an IOCTL -> rum_stop(), which causes the CANCELLED event to be passed to USB. Can't we free any nodes at this point? --HPS > This turned out to be refcounting of the ieee80211_node struct which > was causing this panic. vap->iv_bss can be freed at any time so all > users of it need to bump the refcount to use it safely. > > This patch should fix the panic in the rum driver. > http://people.freebsd.org/~thompsa/rum_node_refcnt.diff > > There are other places where it is still an issue such as the > ieee80211_tx_mgt_timeout callout which havnt been addressed yet, and > of course all other ieee80211 drivers. > ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: [panic] Race in IEEE802.11 layer towards device drivers
On 12 Jul 2010, at 01:07, Andrew Thompson wrote: > On 8 July 2010 07:13, Hans Petter Selasky wrote: >> Hi, >> >> When supplying wpa_supplicant.conf with incorrect passwords, but a valid >> SSID, >> I have seen kernel panics several times when using USB based WLAN dongles. >> When only supplying a valid password, no panic has been seen. >> >> How to reproduce: >> >> 1) configure invalid password >> 2) wpa_cli: reconfigure >> 3) configure valid password >> 4) wpa_cli: reconfigure >> 5) goto 1 >> >> The USB commands which are executed inside the newstate callback usually take >> very little time, but still not as little time as PCI read/writes. I've >> forced >> slower operation in the newstate callback, and can reproduce warning >> printouts >> from the IEEE802.11 layer in FreeBSD. Try to apply the following patch to >> your >> USB code: >> >> http://p4web.freebsd.org/@@180604?ac=10 >> >> In my opinion the deferring of all states to a single task is wrong. There >> should be at least one task per possible state, and the queuing mechanism >> should follow the last-queued is last executed rule. This is not the case >> with >> the task-queue mechanism in the kernel. > > This turned out to be refcounting of the ieee80211_node struct which > was causing this panic. vap->iv_bss can be freed at any time so all > users of it need to bump the refcount to use it safely. > > This patch should fix the panic in the rum driver. > http://people.freebsd.org/~thompsa/rum_node_refcnt.diff > > There are other places where it is still an issue such as the > ieee80211_tx_mgt_timeout callout which havnt been addressed yet, and > of course all other ieee80211 drivers. Oh, this makes sense now. My previous attempt at help you made no sense... Regards, -- Rui Paulo ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: [panic] Race in IEEE802.11 layer towards device drivers
On Monday 12 July 2010 02:07:55 Andrew Thompson wrote: > This turned out to be refcounting of the ieee80211_node struct which > was causing this panic. vap->iv_bss can be freed at any time so all > users of it need to bump the refcount to use it safely. > > This patch should fix the panic in the rum driver. > http://people.freebsd.org/~thompsa/rum_node_refcnt.diff > > There are other places where it is still an issue such as the > ieee80211_tx_mgt_timeout callout which havnt been addressed yet, and > of course all other ieee80211 drivers. > I will give your patch a try later today. --HPS ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: [panic] Race in IEEE802.11 layer towards device drivers
On 8 July 2010 07:13, Hans Petter Selasky wrote: > Hi, > > When supplying wpa_supplicant.conf with incorrect passwords, but a valid SSID, > I have seen kernel panics several times when using USB based WLAN dongles. > When only supplying a valid password, no panic has been seen. > > How to reproduce: > > 1) configure invalid password > 2) wpa_cli: reconfigure > 3) configure valid password > 4) wpa_cli: reconfigure > 5) goto 1 > > The USB commands which are executed inside the newstate callback usually take > very little time, but still not as little time as PCI read/writes. I've forced > slower operation in the newstate callback, and can reproduce warning printouts > from the IEEE802.11 layer in FreeBSD. Try to apply the following patch to your > USB code: > > http://p4web.freebsd.org/@@180604?ac=10 > > In my opinion the deferring of all states to a single task is wrong. There > should be at least one task per possible state, and the queuing mechanism > should follow the last-queued is last executed rule. This is not the case with > the task-queue mechanism in the kernel. This turned out to be refcounting of the ieee80211_node struct which was causing this panic. vap->iv_bss can be freed at any time so all users of it need to bump the refcount to use it safely. This patch should fix the panic in the rum driver. http://people.freebsd.org/~thompsa/rum_node_refcnt.diff There are other places where it is still an issue such as the ieee80211_tx_mgt_timeout callout which havnt been addressed yet, and of course all other ieee80211 drivers. Andrew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: [panic] Race in IEEE802.11 layer towards device drivers
On Thursday 08 July 2010 05:12:37 Andrew Thompson wrote: > On 8 July 2010 07:13, Hans Petter Selasky wrote: > > Hi, > > > > When supplying wpa_supplicant.conf with incorrect passwords, but a valid > > SSID, I have seen kernel panics several times when using USB based WLAN > > dongles. When only supplying a valid password, no panic has been seen. > > > > How to reproduce: > > > > 1) configure invalid password > > 2) wpa_cli: reconfigure > > 3) configure valid password > > 4) wpa_cli: reconfigure > > 5) goto 1 > > > > The USB commands which are executed inside the newstate callback usually > > take very little time, but still not as little time as PCI read/writes. > > I've forced slower operation in the newstate callback, and can reproduce > > warning printouts from the IEEE802.11 layer in FreeBSD. Try to apply the > > following patch to your USB code: > > > > http://p4web.freebsd.org/@@180604?ac=10 > > > > In my opinion the deferring of all states to a single task is wrong. > > There should be at least one task per possible state, and the queuing > > mechanism should follow the last-queued is last executed rule. This is > > not the case with the task-queue mechanism in the kernel. > Hi, > You dont say why it should be this way, do you have an example of a > problem this fixes? Right. The problem the way I see it, is that during execution of the newstate callback a new IEE802.11 state event comes along and alters the state of variables which we might refer. If you look in this function, "ieee80211_new_state_locked()" you see that it is checking for: if (vap->iv_flags_ext & IEEE80211_FEXT_STATEWAIT) { But it still continues setting the new state, even though that flag is set. Another issue I see, is that the vap->iv_nstate and vap->iv_state variables are cached. Then vap->iv_newstate() is called, which can sleep. After that we are we are checking the cached version of the state variables, which possibly leads to making wrong decisions? See: ieee80211_newstate_cb(): IEEE80211_LOCK(ic); ... nstate = vap->iv_nstate; ... ostate = vap->iv_state; ... rc = vap->iv_newstate(vap, nstate, arg); iv_newstate can drop the IC lock, and so the state can change! ... If the state transitions are coming too fast, which is the case, the taskqueue mechanism used by the IEE802.11 layer is going to loose callbacks, and even risk out of order execution, for the newstate function. I'm using 9-current as of one week ago on this computer. The same problem has also been seen on 8-stable. And I'm using 2-core CPU. > I think the single state thread is correct. The whole thing works on > state transitions, you dont just set a state. Try my patch, and see that warnings that are printed when you leave wpa_supplicant configured with a wrong password. You should be able to reproduce, just run a test for some time and it will panic. > > > Description of panics. I didn't have core dump enabled on this box, so > > please bear over with the following hand-written notes: > > > > 1) A vap->iv_bss == NULL, inside ratectl task in RUM driver. > > > > 2) A memcpy() fails inside the iee80211...newstate_cb() > > > > 3) This and similar printouts are seen: > > > > wlan0: ieee80211_new_state_locked: pending AUTH -> ASSOC transition lost > > Can you see if you can get a core dump, or at least a DDB trace and > the output from `show vap ` That might take some time. I'm very busy nowadays building a new house. I will check e-mails later today. Try to reproduce using the steps I've given. Maybe also try to insert some DELAY() around newstate execution, hence this problem does not occur N/N times. Another suggestion. Add some debug code to WLAN: while (to--) { set random state(); if (random() & 1) pause("WDLY", 1); } In rum/run/... newstate function, add: if (random() & 1) pause("WDLY", 1); // OR DELAY(125 * (random() & 7)) Aswell at the beginning and end. --HPS ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: [panic] Race in IEEE802.11 layer towards device drivers
On 8 July 2010 07:13, Hans Petter Selasky wrote: > Hi, > > When supplying wpa_supplicant.conf with incorrect passwords, but a valid SSID, > I have seen kernel panics several times when using USB based WLAN dongles. > When only supplying a valid password, no panic has been seen. > > How to reproduce: > > 1) configure invalid password > 2) wpa_cli: reconfigure > 3) configure valid password > 4) wpa_cli: reconfigure > 5) goto 1 > > The USB commands which are executed inside the newstate callback usually take > very little time, but still not as little time as PCI read/writes. I've forced > slower operation in the newstate callback, and can reproduce warning printouts > from the IEEE802.11 layer in FreeBSD. Try to apply the following patch to your > USB code: > > http://p4web.freebsd.org/@@180604?ac=10 > > In my opinion the deferring of all states to a single task is wrong. There > should be at least one task per possible state, and the queuing mechanism > should follow the last-queued is last executed rule. This is not the case with > the task-queue mechanism in the kernel. You dont say why it should be this way, do you have an example of a problem this fixes? I think the single state thread is correct. The whole thing works on state transitions, you dont just set a state. > > Description of panics. I didn't have core dump enabled on this box, so please > bear over with the following hand-written notes: > > 1) A vap->iv_bss == NULL, inside ratectl task in RUM driver. > > 2) A memcpy() fails inside the iee80211...newstate_cb() > > 3) This and similar printouts are seen: > > wlan0: ieee80211_new_state_locked: pending AUTH -> ASSOC transition lost Can you see if you can get a core dump, or at least a DDB trace and the output from `show vap ` Andrew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
[panic] Race in IEEE802.11 layer towards device drivers
Hi, When supplying wpa_supplicant.conf with incorrect passwords, but a valid SSID, I have seen kernel panics several times when using USB based WLAN dongles. When only supplying a valid password, no panic has been seen. How to reproduce: 1) configure invalid password 2) wpa_cli: reconfigure 3) configure valid password 4) wpa_cli: reconfigure 5) goto 1 The USB commands which are executed inside the newstate callback usually take very little time, but still not as little time as PCI read/writes. I've forced slower operation in the newstate callback, and can reproduce warning printouts from the IEEE802.11 layer in FreeBSD. Try to apply the following patch to your USB code: http://p4web.freebsd.org/@@180604?ac=10 In my opinion the deferring of all states to a single task is wrong. There should be at least one task per possible state, and the queuing mechanism should follow the last-queued is last executed rule. This is not the case with the task-queue mechanism in the kernel. See the USB code's task-queue replacement which I think the IEEE802.11 stack in FreeBSD could take advantage of. src/sys/dev/usb/usb_process.c Description of panics. I didn't have core dump enabled on this box, so please bear over with the following hand-written notes: 1) A vap->iv_bss == NULL, inside ratectl task in RUM driver. 2) A memcpy() fails inside the iee80211...newstate_cb() 3) This and similar printouts are seen: wlan0: ieee80211_new_state_locked: pending AUTH -> ASSOC transition lost --HPS ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"