Re: [panic] Race in IEEE802.11 layer towards device drivers

2010-07-22 Thread PseudoCylon

 
"FreeBSD and all other open source projects are the Tower of Babel in computer 
era." --me
So, join me @ git://dev.nasreddine.com/run.git
(or http://dev.nasreddine.com/gitweb/?p=run.git;a=summary )
Just work on any of *_dev branches.



- Original Message 
> From: Hans Petter Selasky 
> To: freebsd-current@freebsd.org
> Cc: PseudoCylon ; Sam Leffler ; 
>freebsd-...@freebsd.org
> Sent: Wed, July 21, 2010 10:00:26 AM
> Subject: Re: [panic] Race in IEEE802.11 layer towards device drivers
> 
> Hi,
> 
> Please confirm that this patch is working for you:
> 
> http://p4web.freebsd.org/@@181261?ac=10
> 
> --HPS
> 

Confirmed it's properly been updated.

AK



___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: [panic] Race in IEEE802.11 layer towards device drivers

2010-07-21 Thread Hans Petter Selasky
Hi,

Please confirm that this patch is working for you:

http://p4web.freebsd.org/@@181261?ac=10

--HPS
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: [panic] Race in IEEE802.11 layer towards device drivers

2010-07-21 Thread PseudoCylon
- Original Message 
> From: Hans Petter Selasky 
> To: PseudoCylon 
> Cc: freebsd-current@freebsd.org; Sam Leffler ; 
>freebsd-...@freebsd.org
> Sent: Tue, July 20, 2010 4:46:34 AM
> Subject: Re: [panic] Race in IEEE802.11 layer towards device drivers
> 
> On Tuesday 20 July 2010 12:03:22 PseudoCylon wrote:
> > - Original  Message 
> > 
> > > From: Hans Petter Selasky 
> > > To: freebsd-current@freebsd.org
> >  > Cc: PseudoCylon ; Sam  Leffler 
;
> > >
> >  >freebsd-...@freebsd.org
> >  >
> > > Sent: Mon, July 19, 2010 1:17:04 PM
> > > Subject: Re:  [panic] Race in IEEE802.11 layer towards device drivers
> > > 
> >  > Hi AK,
> > > 
> > > I've committed your patches to USB P4.  I've made some additional 
> > > patches.
> > > 
> > > Can  you check and verify everything?
> > > 
> > > http://p4web.freebsd.org/@@181189?ac=10
> > 
> > Hi
> > 
> > If we change sc->cmdq_run = RUN_CMDQ_ABORT,
> > 
> > --  begin excerpt --
> > 
> > 
> > @@ -4890,7 +4877,10 @@ run_stop(void  *arg)
> >  ifp->if_drv_flags &= ~(IFF_DRV_RUNNING |  IFF_DRV_OACTIVE);
> > 
> >  sc->ratectl_run =  RUN_RATECTL_OFF;
> > -sc->cmdq_run = RUN_CMDQ_ABORT;
> > +
> >  +RUN_CMDQ_LOCK(sc);
> > +sc->cmdq_run = sc->cmdq_key_set =  RUN_CMDQ_ABORT;
> > +RUN_CMDQ_UNLOCK(sc);
> > 
> > -- end excerpt  --
> > 
> > 
> > we also need to change this, otherwise key will be  cleared.
> 
> Ok.
> 
> Try to give the second mutex a different name, and  see how many warnings go 
> away.
> 
> --HPS
> 

Giving different name makes all of "duplicate lock" warnings away.

Here is the patch includes all changes

-- begin patch --

diff --git a/dev/usb/wlan/if_run.c b/dev/usb/wlan/if_run.c
index 017e4b0..da22077 100644
--- a/dev/usb/wlan/if_run.c
+++ b/dev/usb/wlan/if_run.c
@@ -549,7 +549,7 @@ run_attach(device_t self)
 mtx_init(&sc->sc_mtx, device_get_nameunit(sc->sc_dev),
 MTX_NETWORK_LOCK, MTX_DEF);
 mtx_init(&sc->sc_cmdq_mtx, device_get_nameunit(sc->sc_dev),
-MTX_NETWORK_LOCK, MTX_DEF);
+"command queue", MTX_DEF);
 
 iface_index = RT2860_IFACE_INDEX;
 
@@ -4670,8 +4670,6 @@ run_init_locked(struct run_softc *sc)
 if(ic->ic_nrunning > 1)
 return;
 
-run_stop(sc);
-
 for (ntries = 0; ntries < 100; ntries++) {
 if (run_read(sc, RT2860_ASIC_VER_ID, &tmp) != 0)
 goto fail;

-- end patch --


___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: [panic] Race in IEEE802.11 layer towards device drivers

2010-07-20 Thread Hans Petter Selasky
On Tuesday 20 July 2010 12:03:22 PseudoCylon wrote:
> - Original Message 
> 
> > From: Hans Petter Selasky 
> > To: freebsd-current@freebsd.org
> > Cc: PseudoCylon ; Sam Leffler ;
> >
> >freebsd-...@freebsd.org
> >
> > Sent: Mon, July 19, 2010 1:17:04 PM
> > Subject: Re: [panic] Race in IEEE802.11 layer towards device drivers
> > 
> > Hi AK,
> > 
> > I've committed your patches to USB P4. I've made some additional 
> > patches.
> > 
> > Can you check and verify everything?
> > 
> > http://p4web.freebsd.org/@@181189?ac=10
> 
> Hi
> 
> If we change sc->cmdq_run = RUN_CMDQ_ABORT,
> 
> -- begin excerpt --
> 
> 
> @@ -4890,7 +4877,10 @@ run_stop(void *arg)
>  ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE);
> 
>  sc->ratectl_run = RUN_RATECTL_OFF;
> -sc->cmdq_run = RUN_CMDQ_ABORT;
> +
> +RUN_CMDQ_LOCK(sc);
> +sc->cmdq_run = sc->cmdq_key_set = RUN_CMDQ_ABORT;
> +RUN_CMDQ_UNLOCK(sc);
> 
> -- end excerpt --
> 
> 
> we also need to change this, otherwise key will be cleared.

Ok.

Try to give the second mutex a different name, and see how many warnings go 
away.

--HPS
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: [panic] Race in IEEE802.11 layer towards device drivers

2010-07-20 Thread PseudoCylon
- Original Message 
> From: Hans Petter Selasky 
> To: freebsd-current@freebsd.org
> Cc: PseudoCylon ; Sam Leffler ; 
>freebsd-...@freebsd.org
> Sent: Mon, July 19, 2010 1:17:04 PM
> Subject: Re: [panic] Race in IEEE802.11 layer towards device drivers
> 
> Hi AK,
> 
> I've committed your patches to USB P4. I've made some additional  patches.
> 
> Can you check and verify everything?
> 
> http://p4web.freebsd.org/@@181189?ac=10
> 

Hi

If we change sc->cmdq_run = RUN_CMDQ_ABORT,

-- begin excerpt --


@@ -4890,7 +4877,10 @@ run_stop(void *arg)
 ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE);
 
 sc->ratectl_run = RUN_RATECTL_OFF;
-sc->cmdq_run = RUN_CMDQ_ABORT;
+
+RUN_CMDQ_LOCK(sc);
+sc->cmdq_run = sc->cmdq_key_set = RUN_CMDQ_ABORT;
+RUN_CMDQ_UNLOCK(sc);

-- end excerpt --


we also need to change this, otherwise key will be cleared.


-- begin patch --

diff --git a/dev/usb/wlan/if_run.c b/dev/usb/wlan/if_run.c
index 017e4b0..f7abe17 100644
--- a/dev/usb/wlan/if_run.c
+++ b/dev/usb/wlan/if_run.c
@@ -4670,8 +4670,6 @@ run_init_locked(struct run_softc *sc)
 if(ic->ic_nrunning > 1)
 return;
 
-run_stop(sc);
-
 for (ntries = 0; ntries < 100; ntries++) {
 if (run_read(sc, RT2860_ASIC_VER_ID, &tmp) != 0)
 goto fail;

-- end patch --


> Also please compile a kernel  with WITNESS enabled to catch any LOR's, hence 
> we 
>
> introduced another  mutex.
> 


The 2nd mutex did solve a deadlock, but doesn't solve the LOR.


-- begin message --

lock order reversal:
 1st 0xff8000a257d0 run0_node_lock (run0_node_lock) @ 
/usr/src/sys/net80211/ieee80211_node.c:1736
 2nd 0xff8000a19348 run0 (network driver) @ 
/mnt/share/home/AK/FreeBSD/modules/usb/run/../../../../mnt/dev/usb/wlan/if_run.c:2212

KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
_witness_debugger() at _witness_debugger+0x2e
witness_checkorder() at witness_checkorder+0x81e
_mtx_lock_flags() at _mtx_lock_flags+0x78
run_key_delete() at run_key_delete+0x45
_ieee80211_crypto_delkey() at _ieee80211_crypto_delkey+0x9e
ieee80211_crypto_delkey() at ieee80211_crypto_delkey+0x28
ieee80211_node_delucastkey() at ieee80211_node_delucastkey+0x78
ieee80211_sta_leave() at ieee80211_sta_leave+0x16
ieee80211_node_leave() at ieee80211_node_leave+0x11d
hostap_recv_mgmt() at hostap_recv_mgmt+0x33f
hostap_input() at hostap_input+0xc09
run_rx_frame() at run_rx_frame+0x13f
run_bulk_rx_callback() at run_bulk_rx_callback+0x3b7
usbd_callback_wrapper() at usbd_callback_wrapper+0x12b
usb_command_wrapper() at usb_command_wrapper+0x76
usb_callback_proc() at usb_callback_proc+0x76
usb_process() at usb_process+0xbb
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff8029b4ed30, rbp = 0 ---

-- end message --

or

-- begin message --

lock order reversal:
 1st 0xff8000a257d0 run0_node_lock (run0_node_lock) @ 
/usr/src/sys/net80211/ieee80211_node.c:1736
 2nd 0xff8000a19348 run0 (network driver) @ 
/mnt/share/home/AK/FreeBSD/modules/usb/run/../../../../mnt/dev/usb/wlan/if_run.c:2212

KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
_witness_debugger() at _witness_debugger+0x2e
witness_checkorder() at witness_checkorder+0x81e
_mtx_lock_flags() at _mtx_lock_flags+0x78
run_key_delete() at run_key_delete+0x47
_ieee80211_crypto_delkey() at _ieee80211_crypto_delkey+0x9e
ieee80211_crypto_delkey() at ieee80211_crypto_delkey+0x28
ieee80211_node_delucastkey() at ieee80211_node_delucastkey+0x78
ieee80211_sta_leave() at ieee80211_sta_leave+0x16
ieee80211_node_leave() at ieee80211_node_leave+0x11d
ieee80211_node_timeout() at ieee80211_node_timeout+0x1d5
softclock() at softclock+0x2a0
intr_event_execute_handlers() at intr_event_execute_handlers+0x66
ithread_loop() at ithread_loop+0xb2
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff852d30, rbp = 0 ---

-- end message --


There are new warning, "acquiring duplicate lock." For example,


-- begin message --

acquiring duplicate lock of same type: "network driver"
 1st run0 @ /usr/src/sys/dev/usb/usb_request.c:691
 2nd run0 @ 
/mnt/share/home/AK/FreeBSD/modules/usb/run/../../../../mnt/dev/usb/wlan/if_run.c:4831

KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
_witness_debugger() at _witness_debugger+0x2e
witness_checkorder() at witness_checkorder+0x8ef
_mtx_lock_flags() at _mtx_lock_flags+0x78
run_init_locked() at run_init_locked+0x753
run_ioctl() at run_ioctl+0xad
taskqueue_run() at taskqueue_run+0x91
taskqueue_thread_loop() at taskqueue_thread_loop+0x3f
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff803e5b2d30, rbp = 0 ---

-- end message --


I don't know if it's worth patching or safe to patch (specially lock/unlock in 

Re: [panic] Race in IEEE802.11 layer towards device drivers

2010-07-19 Thread Hans Petter Selasky
Hi AK,

I've committed your patches to USB P4. I've made some additional patches.

Can you check and verify everything?

http://p4web.freebsd.org/@@181189?ac=10

Also please compile a kernel with WITNESS enabled to catch any LOR's, hence we 
introduced another mutex.

--HPS
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: [panic] Race in IEEE802.11 layer towards device drivers

2010-07-18 Thread PseudoCylon
[NB] Obviously, I didn't click "reply ALL" last time, so here are missing part

> > > >  -if(vap->iv_opmode == IEEE80211_M_HOSTAP){
> > > > 
> > > > -RUN_LOCK(sc);
> > > > +if  (vap->iv_opmode == IEEE80211_M_HOSTAP)
> > > > 
> > > >  sc->cmdq_key_set =  RUN_CMDQ_GO;
> > > > 
> > > > -RUN_UNLOCK(sc);
> > > > -}
> > > > 
> > > > 
> > > > Why  are you removing these locks?
> > 
> > It is simple assignment, it must be atomic.
> 
> Not necessarily. If you don't put lock statements around it or use the 
> volatile keyword, the compiler can re-organize the execution order. I will 
> have a look at it.
> 
> > 
> > > Another question:
> > >  i = RUN_CMDQ_GET(&sc->cmdq_store);
> > >  DPRINTF("cmdq_store=%d\n", i);
> > >
> > >sc->cmdq[i].func =  run_update_beacon_cb;
> > >sc->cmdq[i].arg0 =  vap;
> > > 
> > > Why is this code and similar places not enclosed with  mutexes?
> > 
> > First, I couldn't use a lock in key_delete() because of LoR. So, I use
> > atomic instead. RUN_CMDQ_GET is atomic_fetch_add(). Whatever executes that
> > code gets unique place (cmdq[i]) to write, so there shouldn't be any race.
> > 
> > Then out of order execution happened. Specially, when key_set() overtakes
> > key_delete(), encryption fails. So, all deferred processes are called back
> > via run_cmdq_cb() to maintain the order. Because cmdq functions are first
> > written for key_delete() where lock causes LoR. So, lock isn't needed.
> > 
> > run_cmdq_cb() uses lock. But it is for calling callback functions locked.
> > So that, functions just call another function locked, like
> > ##_callback()
> > {
> >LOCK();
> >##_locked();
> >UNLOCK();
> > }
> > won't be needed.
> 
> If the run_cmdq_cb() is running at the same time which you are queuing 
> elements, then I note that you set .func before .arg0. The ##_callback() code 
> only checks if .func has been set. Actually the "i" increment should be after 
> that you filled out the data, and then you see that you cannot use atomic. I 
> think the most simple solution is to add another mutex, sc->sc_cmdq_mtx, 
which 
> protects the queue and it's associated data. Also, what do you do if the 
queue 
> wraps around? You should have a mechanism to prevent that, because you then 
> might start executing commands in random order?
> 

Here is a patch (patch against P4 if_run.c rev 14 and if_runvar.h rev 8)

-- begin patch --


diff --git a/dev/usb/wlan/if_run.c b/dev/usb/wlan/if_run.c
index 8c96534..c988ad4 100644
--- a/dev/usb/wlan/if_run.c
+++ b/dev/usb/wlan/if_run.c
@@ -90,12 +90,6 @@ SYSCTL_INT(_hw_usb_run, OID_AUTO, debug, CTLFLAG_RW, 
&run_debug, 0,
 #define IEEE80211_HAS_ADDR4(wh) \
 (((wh)->i_fc[1] & IEEE80211_FC1_DIR_MASK) == IEEE80211_FC1_DIR_DSTODS)
 
-/*
- * Because of LOR in run_key_delete(), use atomic instead.
- * '& RUN_CMDQ_MASQ' is to loop cmdq[].
- */
-#define RUN_CMDQ_GET(c)(atomic_fetchadd_32((c), 1) & RUN_CMDQ_MASQ)
-
 static const struct usb_device_id run_devs[] = {
 { USB_VP(USB_VENDOR_ABOCOM,USB_PRODUCT_ABOCOM_RT2770) },
 { USB_VP(USB_VENDOR_ABOCOM,USB_PRODUCT_ABOCOM_RT2870) },
@@ -554,6 +548,8 @@ run_attach(device_t self)
 
 mtx_init(&sc->sc_mtx, device_get_nameunit(sc->sc_dev),
 MTX_NETWORK_LOCK, MTX_DEF);
+mtx_init(&sc->sc_cmdq_mtx, device_get_nameunit(sc->sc_dev),
+MTX_NETWORK_LOCK, MTX_DEF);
 
 iface_index = RT2860_IFACE_INDEX;
 
@@ -737,6 +733,7 @@ run_detach(device_t self)
 }
 
 mtx_destroy(&sc->sc_mtx);
+mtx_destroy(&sc->sc_cmdq_mtx);
 
 return (0);
 }
@@ -830,9 +827,6 @@ run_vap_create(struct ieee80211com *ic,
 if(sc->rvp_cnt++ == 0)
 ic->ic_opmode = opmode;
 
-if(opmode == IEEE80211_M_HOSTAP)
-sc->cmdq_run = RUN_CMDQ_GO;
-
 DPRINTF("rvp_id=%d bmap=%x rvp_cnt=%d\n",
 rvp->rvp_id, sc->rvp_bmap, sc->rvp_cnt);
 
@@ -889,27 +883,31 @@ run_cmdq_cb(void *arg, int pending)
 struct run_softc *sc = arg;
 uint8_t i;
 
-/* call cmdq[].func locked */
-RUN_LOCK(sc);
-for(i = sc->cmdq_exec; sc->cmdq[i].func && pending;
-i = sc->cmdq_exec, pending--){
+RUN_CMDQ_LOCK(sc);
+for (i = sc->cmdq_exec; sc->cmdq[i].func; i = sc->cmdq_exec) {
 DPRINTFN(6, "cmdq_exec=%d pending=%d\n", i, pending);
-if(sc->cmdq_run == RUN_CMDQ_GO){
+if (sc->cmdq_run == RUN_CMDQ_GO ||
+(sc->cmdq_key_set == RUN_CMDQ_GO &&
+sc->cmdq[i].func == run_key_set_cb)) {
+RUN_CMDQ_UNLOCK(sc);
+RUN_LOCK(sc);
 /*
  * If arg0 is NULL, callback func needs more
  * than one arg. So, pass ptr to cmdq struct.
  */
-if(sc->cmdq[i].arg0)
+if (sc->cmdq[i].arg0)
 sc->cmdq[i].func(sc->cmdq[i].arg0);
 else
 sc->cmdq[i].func(&sc->cmdq[i]);
+RUN_UNLOCK(sc);
+RUN_CMDQ_LOCK(sc);
 }
 sc->cmdq[i].arg0 = NULL;
 sc->cmdq[i].func = NULL;
 sc->cmdq_exec++;
 sc->cmdq_exec &= RUN_CMDQ_MASQ;
 }
-RUN_UNLOCK(sc);
+RUN_CMDQ_UNLOCK(sc);
 }
 
 static void
@@ -1771,6 +1769,19 @@ run_newstate(struct ieee80211vap *vap, enum 
ieee80211_state nstate, int arg)
 case IEEE80211_S_INIT:
 restart_ratectl = 1;
 
+/*
+ * When hostapd has set a key, don't clear it.
+ * But, when the devic

Re: [panic] Race in IEEE802.11 layer towards device drivers

2010-07-14 Thread Hans Petter Selasky
On Wednesday 14 July 2010 14:31:29 PseudoCylon wrote:
> -if(vap->iv_opmode == IEEE80211_M_HOSTAP){
> -RUN_LOCK(sc);
> +if (vap->iv_opmode == IEEE80211_M_HOSTAP)
>  sc->cmdq_key_set = RUN_CMDQ_GO;
> -RUN_UNLOCK(sc);
> -}
>  

> Why are you removing these locks?

Another question:

i = RUN_CMDQ_GET(&sc->cmdq_store);
DPRINTF("cmdq_store=%d\n", i);
sc->cmdq[i].func = run_update_beacon_cb;
sc->cmdq[i].arg0 = vap;

Why is this code and similar places not enclosed with mutexes?

--HPS
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: [panic] Race in IEEE802.11 layer towards device drivers

2010-07-14 Thread Hans Petter Selasky
On Wednesday 14 July 2010 14:31:29 PseudoCylon wrote:
> -if(vap->iv_opmode == IEEE80211_M_HOSTAP){
> -RUN_LOCK(sc);
> +if (vap->iv_opmode == IEEE80211_M_HOSTAP)
>  sc->cmdq_key_set = RUN_CMDQ_GO;
> -RUN_UNLOCK(sc);
> -}
>  

Why are you removing these locks?

--HPS
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: [panic] Race in IEEE802.11 layer towards device drivers

2010-07-14 Thread PseudoCylon
- Original Message 
> From: Hans Petter Selasky 
> To: freebsd-current@freebsd.org
> Cc: Andrew Thompson ; Sam Leffler ; 
>PseudoCylon ; freebsd-...@freebsd.org
> Sent: Mon, July 12, 2010 2:01:11 PM
> Subject: Re: [panic] Race in IEEE802.11 layer towards device drivers
> 
> Hi Andrew,
> 
> Your patch appears to be working. Can you fix this issue in  the other WLAN 
> drivers aswell? Then send an e-mail to request testing? I had  a go at it 
here:
> 
> http://p4web.freebsd.org/@@180844?ac=10
> 

Can this patch be included into testing? patch to P4 USB rev. 14 if_run.c

If not, at least please delete an extra semicolon at the end of line 3191 (must 
be an merge bug). I missed it since it wans't in diff/patch.


AK


summary of changes
* fixes bugs in rev 209144
   a shared key was written properly only on first time init, but not 
subsequent 
init. Make sure the key is written all the time.
* stop checking 'pending' in run_cmdq_cb().
   When loop 'pending' times, new tasks enqueued while looping won't be 
executed 
because 'pending' passed from taskqueue function won't be incremented.

-- patch begin --

diff --git a/dev/usb/wlan/if_run.c b/dev/usb/wlan/if_run.c
index 7a3952c..12b45ec 100644
--- a/dev/usb/wlan/if_run.c
+++ b/dev/usb/wlan/if_run.c
@@ -830,9 +830,6 @@ run_vap_create(struct ieee80211com *ic,
 if(sc->rvp_cnt++ == 0)
 ic->ic_opmode = opmode;
 
-if(opmode == IEEE80211_M_HOSTAP)
-sc->cmdq_run = RUN_CMDQ_GO;
-
 DPRINTF("rvp_id=%d bmap=%x rvp_cnt=%d\n",
 rvp->rvp_id, sc->rvp_bmap, sc->rvp_cnt);
 
@@ -891,15 +888,16 @@ run_cmdq_cb(void *arg, int pending)
 
 /* call cmdq[].func locked */
 RUN_LOCK(sc);
-for(i = sc->cmdq_exec; sc->cmdq[i].func && pending;
-i = sc->cmdq_exec, pending--){
+for (i = sc->cmdq_exec; sc->cmdq[i].func; i = sc->cmdq_exec) {
 DPRINTFN(6, "cmdq_exec=%d pending=%d\n", i, pending);
-if(sc->cmdq_run == RUN_CMDQ_GO){
+if (sc->cmdq_run == RUN_CMDQ_GO ||
+(sc->cmdq_key_set == RUN_CMDQ_GO &&
+sc->cmdq[i].func == run_key_set_cb)) {
 /*
  * If arg0 is NULL, callback func needs more
  * than one arg. So, pass ptr to cmdq struct.
  */
-if(sc->cmdq[i].arg0)
+if (sc->cmdq[i].arg0)
 sc->cmdq[i].func(sc->cmdq[i].arg0);
 else
 sc->cmdq[i].func(&sc->cmdq[i]);
@@ -1771,6 +1769,19 @@ run_newstate(struct ieee80211vap *vap, enum 
ieee80211_state nstate, int arg)
 case IEEE80211_S_INIT:
 restart_ratectl = 1;
 
+/*
+ * When hostapd has set a key, don't clear it.
+ * But, when the device is being brought down, clear it.
+ */
+if (sc->cmdq_key_set != RUN_CMDQ_GO ||
+ostate == IEEE80211_S_RUN) {
+/* clear shared key table */
+run_set_region_4(sc,
+RT2860_SKEY(rvp->rvp_id, 0), 0, 4 * 32);
+/* clear shared key mode */
+run_set_region_4(sc, RT2860_SKEY_MODE_0_7, 0, 4);
+}
+
 if (ostate != IEEE80211_S_RUN)
 break;
 
@@ -2100,13 +2111,10 @@ run_key_set(struct ieee80211vap *vap, struct 
ieee80211_key *k,
  * To make sure key will be set when hostapd
  * calls iv_key_set() before if_init().
  */
-if(vap->iv_opmode == IEEE80211_M_HOSTAP){
-RUN_LOCK(sc);
+if (vap->iv_opmode == IEEE80211_M_HOSTAP)
 sc->cmdq_key_set = RUN_CMDQ_GO;
-RUN_UNLOCK(sc);
-}
 
-return(1);
+return (1);
 }
 
 /*
@@ -3188,7 +3196,7 @@ run_sendprot(struct run_softc *sc,
 ackrate = ieee80211_ack_rate(ic->ic_rt, rate);
 
 isshort = (ic->ic_flags & IEEE80211_F_SHPREAMBLE) != 0;
-dur = ieee80211_compute_duration(ic->ic_rt, pktlen, rate, isshort);
+dur = ieee80211_compute_duration(ic->ic_rt, pktlen, rate, isshort)
 + ieee80211_ack_duration(ic->ic_rt, rate, isshort);
 wflags = RT2860_TX_FRAG;
 
@@ -4693,14 +4701,6 @@ run_init_locked(struct run_softc *sc)
 /* clear WCID attribute table */
 run_set_region_4(sc, RT2860_WCID_ATTR(0), 0, 8 * 32);
 
-/* hostapd sets a key before init. So, don't clear it. */
-if(sc->cmdq_key_set != RUN_CMDQ_GO){
-/* clear shared key table */
-run_set_region_4(sc, RT2860_SKEY(0, 0), 0, 8 * 32);
-/* clear shared key mode */
-run_set_region_4(sc, RT2860_SKEY_MODE_0_7, 0, 4);
-}
-
 run_read(sc, RT2860_US_CYC_CNT, &tmp);
 tmp = (tmp & ~0xff) | 0x1e;
 run_write(sc, RT2860_US_CYC_CNT, tmp);
@@ -4807,7 +4807,7 @@ run_stop(void *arg)
 ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE);
 
 sc->ratectl_run = RUN_RATECTL_OFF;
-sc->cmdq_run = sc->cmdq_key_set;
+sc->cmdq_run = RUN_CMDQ_ABORT;
 
 RUN_UNLOCK(sc);
 
-- end patch --


___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: [panic] Race in IEEE802.11 layer towards device drivers

2010-07-12 Thread Hans Petter Selasky
On Tuesday 13 July 2010 03:54:08 PseudoCylon wrote:
> Should the debugging code, usb_pause_mtx(), be left in the code for
>  testing? If  the drivers don't panic to begin with, we won't know the
>  patch really fixed the issue.
> 

No, I think it is safe to remove it.

--HPS
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: [panic] Race in IEEE802.11 layer towards device drivers

2010-07-12 Thread PseudoCylon
- Original Message 
> From: Hans Petter Selasky 
> To: freebsd-current@freebsd.org
> Cc: Andrew Thompson ; Sam Leffler ; 
>PseudoCylon ; freebsd-...@freebsd.org
> Sent: Mon, July 12, 2010 2:01:11 PM
> Subject: Re: [panic] Race in IEEE802.11 layer towards device drivers
> 
> Hi Andrew,
> 
> Your patch appears to be working. Can you fix this issue in  the other WLAN 
> drivers aswell? Then send an e-mail to request testing? I had  a go at it 
here:
> 
> http://p4web.freebsd.org/@@180844?ac=10
> 

Since I wasn't able to reproduce the panic, I can only confirm the patched 
run(4) driver runs OK.

I just want to patch hostap related fix before calling for this test. I'll ask 
the user if it's fixed.

Should the debugging code, usb_pause_mtx(), be left in the code for testing? If 
the drivers don't panic to begin with, we won't know the patch really fixed the 
issue.


AK

> I found  another panic issue:
> 
> ifconfig wlan0 delete
> ifconfig wlan0  destroy
> 
> When not associate or associated.
> 
> Backtrace (AMD64 -  9-current):
> 
> node_free() + 0x2c
> rum_tx_free() + 0x3b
> which is called  from the bulk tx callback
> 
> Another thread is running an IOCTL ->  rum_stop(), which causes the CANCELLED 
> event to be passed to USB. Can't we  free any nodes at this point?
> 
> --HPS
> 
> > This turned out to be  refcounting of the ieee80211_node struct which
> > was causing this panic.  vap->iv_bss can be freed at any time so all
> > users of it need to bump  the refcount to use it safely.
> > 
> > This patch should fix the panic  in the rum driver.
> > http://people.freebsd.org/~thompsa/rum_node_refcnt.diff
> > 
> >  There are other places where it is still an issue such as the
> >  ieee80211_tx_mgt_timeout callout which havnt been addressed yet, and
> > of  course all other ieee80211 drivers.
> > 
> 


___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: [panic] Race in IEEE802.11 layer towards device drivers

2010-07-12 Thread Hans Petter Selasky
Hi Andrew,

Your patch appears to be working. Can you fix this issue in the other WLAN 
drivers aswell? Then send an e-mail to request testing? I had a go at it here:

http://p4web.freebsd.org/@@180844?ac=10

I found another panic issue:

ifconfig wlan0 delete
ifconfig wlan0 destroy

When not associate or associated.

Backtrace (AMD64 - 9-current):

node_free() + 0x2c
rum_tx_free() + 0x3b
which is called from the bulk tx callback

Another thread is running an IOCTL -> rum_stop(), which causes the CANCELLED 
event to be passed to USB. Can't we free any nodes at this point?

--HPS
 
> This turned out to be refcounting of the ieee80211_node struct which
> was causing this panic. vap->iv_bss can be freed at any time so all
> users of it need to bump the refcount to use it safely.
> 
> This patch should fix the panic in the rum driver.
> http://people.freebsd.org/~thompsa/rum_node_refcnt.diff
> 
> There are other places where it is still an issue such as the
> ieee80211_tx_mgt_timeout callout which havnt been addressed yet, and
> of course all other ieee80211 drivers.
> 
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: [panic] Race in IEEE802.11 layer towards device drivers

2010-07-12 Thread Rui Paulo

On 12 Jul 2010, at 01:07, Andrew Thompson wrote:

> On 8 July 2010 07:13, Hans Petter Selasky  wrote:
>> Hi,
>> 
>> When supplying wpa_supplicant.conf with incorrect passwords, but a valid 
>> SSID,
>> I have seen kernel panics several times when using USB based WLAN dongles.
>> When only supplying a valid password, no panic has been seen.
>> 
>> How to reproduce:
>> 
>> 1) configure invalid password
>> 2) wpa_cli: reconfigure
>> 3) configure valid password
>> 4) wpa_cli: reconfigure
>> 5) goto 1
>> 
>> The USB commands which are executed inside the newstate callback usually take
>> very little time, but still not as little time as PCI read/writes. I've 
>> forced
>> slower operation in the newstate callback, and can reproduce warning 
>> printouts
>> from the IEEE802.11 layer in FreeBSD. Try to apply the following patch to 
>> your
>> USB code:
>> 
>> http://p4web.freebsd.org/@@180604?ac=10
>> 
>> In my opinion the deferring of all states to a single task is wrong. There
>> should be at least one task per possible state, and the queuing mechanism
>> should follow the last-queued is last executed rule. This is not the case 
>> with
>> the task-queue mechanism in the kernel.
> 
> This turned out to be refcounting of the ieee80211_node struct which
> was causing this panic. vap->iv_bss can be freed at any time so all
> users of it need to bump the refcount to use it safely.
> 
> This patch should fix the panic in the rum driver.
> http://people.freebsd.org/~thompsa/rum_node_refcnt.diff
> 
> There are other places where it is still an issue such as the
> ieee80211_tx_mgt_timeout callout which havnt been addressed yet, and
> of course all other ieee80211 drivers.

Oh, this makes sense now. My previous attempt at help you made no sense...

Regards,
--
Rui Paulo


___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: [panic] Race in IEEE802.11 layer towards device drivers

2010-07-11 Thread Hans Petter Selasky
On Monday 12 July 2010 02:07:55 Andrew Thompson wrote:
> This turned out to be refcounting of the ieee80211_node struct which
> was causing this panic. vap->iv_bss can be freed at any time so all
> users of it need to bump the refcount to use it safely.
> 
> This patch should fix the panic in the rum driver.
> http://people.freebsd.org/~thompsa/rum_node_refcnt.diff
> 
> There are other places where it is still an issue such as the
> ieee80211_tx_mgt_timeout callout which havnt been addressed yet, and
> of course all other ieee80211 drivers.
> 

I will give your patch a try later today.

--HPS
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: [panic] Race in IEEE802.11 layer towards device drivers

2010-07-11 Thread Andrew Thompson
On 8 July 2010 07:13, Hans Petter Selasky  wrote:
> Hi,
>
> When supplying wpa_supplicant.conf with incorrect passwords, but a valid SSID,
> I have seen kernel panics several times when using USB based WLAN dongles.
> When only supplying a valid password, no panic has been seen.
>
> How to reproduce:
>
> 1) configure invalid password
> 2) wpa_cli: reconfigure
> 3) configure valid password
> 4) wpa_cli: reconfigure
> 5) goto 1
>
> The USB commands which are executed inside the newstate callback usually take
> very little time, but still not as little time as PCI read/writes. I've forced
> slower operation in the newstate callback, and can reproduce warning printouts
> from the IEEE802.11 layer in FreeBSD. Try to apply the following patch to your
> USB code:
>
> http://p4web.freebsd.org/@@180604?ac=10
>
> In my opinion the deferring of all states to a single task is wrong. There
> should be at least one task per possible state, and the queuing mechanism
> should follow the last-queued is last executed rule. This is not the case with
> the task-queue mechanism in the kernel.

This turned out to be refcounting of the ieee80211_node struct which
was causing this panic. vap->iv_bss can be freed at any time so all
users of it need to bump the refcount to use it safely.

This patch should fix the panic in the rum driver.
http://people.freebsd.org/~thompsa/rum_node_refcnt.diff

There are other places where it is still an issue such as the
ieee80211_tx_mgt_timeout callout which havnt been addressed yet, and
of course all other ieee80211 drivers.


Andrew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: [panic] Race in IEEE802.11 layer towards device drivers

2010-07-07 Thread Hans Petter Selasky
On Thursday 08 July 2010 05:12:37 Andrew Thompson wrote:
> On 8 July 2010 07:13, Hans Petter Selasky  wrote:
> > Hi,
> >
> > When supplying wpa_supplicant.conf with incorrect passwords, but a valid
> > SSID, I have seen kernel panics several times when using USB based WLAN
> > dongles. When only supplying a valid password, no panic has been seen.
> >
> > How to reproduce:
> >
> > 1) configure invalid password
> > 2) wpa_cli: reconfigure
> > 3) configure valid password
> > 4) wpa_cli: reconfigure
> > 5) goto 1
> >
> > The USB commands which are executed inside the newstate callback usually
> > take very little time, but still not as little time as PCI read/writes.
> > I've forced slower operation in the newstate callback, and can reproduce
> > warning printouts from the IEEE802.11 layer in FreeBSD. Try to apply the
> > following patch to your USB code:
> >
> > http://p4web.freebsd.org/@@180604?ac=10
> >
> > In my opinion the deferring of all states to a single task is wrong.
> > There should be at least one task per possible state, and the queuing
> > mechanism should follow the last-queued is last executed rule. This is
> > not the case with the task-queue mechanism in the kernel.
> 

Hi,

> You dont say why it should be this way, do you have an example of a
> problem this fixes?

Right.

The problem the way I see it, is that during execution of the newstate 
callback a new IEE802.11 state event comes along and alters the state of 
variables which we might refer.

If you look in this function, "ieee80211_new_state_locked()" you see that it 
is checking for:

if (vap->iv_flags_ext & IEEE80211_FEXT_STATEWAIT) {

But it still continues setting the new state, even though that flag is set. 

Another issue I see, is that the vap->iv_nstate and vap->iv_state variables 
are cached. Then vap->iv_newstate() is called, which can sleep. After that we 
are we are checking the cached version of the state variables, which possibly 
leads to making wrong decisions?

See:

ieee80211_newstate_cb():

IEEE80211_LOCK(ic);
...
nstate = vap->iv_nstate;
...
ostate = vap->iv_state;

...

rc = vap->iv_newstate(vap, nstate, arg);

iv_newstate can drop the IC lock, and so the state can change!

...

If the state transitions are coming too fast, which is the case, the taskqueue 
mechanism used by the IEE802.11 layer is going to loose callbacks, and even 
risk out of order execution, for the newstate function.

I'm using 9-current as of one week ago on this computer. The same problem has 
also been seen on 8-stable. And I'm using 2-core CPU.

> I think the single state thread is correct. The whole thing works on
> state transitions, you dont just set a state.

Try my patch, and see that warnings that are printed when you leave 
wpa_supplicant configured with a wrong password.

You should be able to reproduce, just run a test for some time and it will 
panic.

> 
> > Description of panics. I didn't have core dump enabled on this box, so
> > please bear over with the following hand-written notes:
> >
> > 1) A vap->iv_bss == NULL, inside ratectl task in RUM driver.
> >
> > 2) A memcpy() fails inside the iee80211...newstate_cb()
> >
> > 3) This and similar printouts are seen:
> >
> > wlan0: ieee80211_new_state_locked: pending AUTH -> ASSOC transition lost
> 
> Can you see if you can get a core dump, or at least a DDB trace and
> the output from `show vap `

That might take some time. I'm very busy nowadays building a new house. I will 
check e-mails later today. Try to reproduce using the steps I've given. Maybe 
also try to insert some DELAY() around newstate execution, hence this problem 
does not occur N/N times.

Another suggestion. Add some debug code to WLAN:

while (to--) {
set random state();
if (random() & 1)
pause("WDLY", 1);
}

In rum/run/... newstate function, add:

if (random() & 1)
pause("WDLY", 1); // OR DELAY(125 * (random() & 7))

Aswell at the beginning and end.

--HPS
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: [panic] Race in IEEE802.11 layer towards device drivers

2010-07-07 Thread Andrew Thompson
On 8 July 2010 07:13, Hans Petter Selasky  wrote:
> Hi,
>
> When supplying wpa_supplicant.conf with incorrect passwords, but a valid SSID,
> I have seen kernel panics several times when using USB based WLAN dongles.
> When only supplying a valid password, no panic has been seen.
>
> How to reproduce:
>
> 1) configure invalid password
> 2) wpa_cli: reconfigure
> 3) configure valid password
> 4) wpa_cli: reconfigure
> 5) goto 1
>
> The USB commands which are executed inside the newstate callback usually take
> very little time, but still not as little time as PCI read/writes. I've forced
> slower operation in the newstate callback, and can reproduce warning printouts
> from the IEEE802.11 layer in FreeBSD. Try to apply the following patch to your
> USB code:
>
> http://p4web.freebsd.org/@@180604?ac=10
>
> In my opinion the deferring of all states to a single task is wrong. There
> should be at least one task per possible state, and the queuing mechanism
> should follow the last-queued is last executed rule. This is not the case with
> the task-queue mechanism in the kernel.

You dont say why it should be this way, do you have an example of a
problem this fixes?

I think the single state thread is correct. The whole thing works on
state transitions, you dont just set a state.

>
> Description of panics. I didn't have core dump enabled on this box, so please
> bear over with the following hand-written notes:
>
> 1) A vap->iv_bss == NULL, inside ratectl task in RUM driver.
>
> 2) A memcpy() fails inside the iee80211...newstate_cb()
>
> 3) This and similar printouts are seen:
>
> wlan0: ieee80211_new_state_locked: pending AUTH -> ASSOC transition lost

Can you see if you can get a core dump, or at least a DDB trace and
the output from `show vap `


Andrew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


[panic] Race in IEEE802.11 layer towards device drivers

2010-07-07 Thread Hans Petter Selasky
Hi,

When supplying wpa_supplicant.conf with incorrect passwords, but a valid SSID, 
I have seen kernel panics several times when using USB based WLAN dongles. 
When only supplying a valid password, no panic has been seen.

How to reproduce:

1) configure invalid password
2) wpa_cli: reconfigure
3) configure valid password
4) wpa_cli: reconfigure
5) goto 1

The USB commands which are executed inside the newstate callback usually take 
very little time, but still not as little time as PCI read/writes. I've forced 
slower operation in the newstate callback, and can reproduce warning printouts 
from the IEEE802.11 layer in FreeBSD. Try to apply the following patch to your 
USB code:

http://p4web.freebsd.org/@@180604?ac=10

In my opinion the deferring of all states to a single task is wrong. There 
should be at least one task per possible state, and the queuing mechanism 
should follow the last-queued is last executed rule. This is not the case with 
the task-queue mechanism in the kernel. See the USB code's task-queue 
replacement which I think the IEEE802.11 stack in FreeBSD could take advantage 
of.

src/sys/dev/usb/usb_process.c

Description of panics. I didn't have core dump enabled on this box, so please 
bear over with the following hand-written notes:

1) A vap->iv_bss == NULL, inside ratectl task in RUM driver.

2) A memcpy() fails inside the iee80211...newstate_cb()

3) This and similar printouts are seen:

wlan0: ieee80211_new_state_locked: pending AUTH -> ASSOC transition lost


--HPS
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"