Re: panic: ieee80211_set_link_state() calls rtm_80211info() from timeout context

2021-12-05 Thread Vitaliy Makkoveev
> On 5 Dec 2021, at 20:05, Scott Cheloha  wrote:
> 
> Suppose the ic_bgscan_timeout timeout is running at the moment we're
> running ieee80211_ifdetach().  Ignore the kernel lock for the moment,
> think about the future.
> 
> If we delete the task before we delete the timeout and the timeout
> then adds the task back onto the task queue, what happens?
> 
> My guess is you need to ensure the timeout is no longer running
> *before* you delete the task.  Can you do timeout_del_barrier()
> here?  See the attached patch.

This timeout_del_barrier(9) doesn’t make sense. You also need to
wait ieee80211_rtm_80211info_task() to be accomplished and
taskq_del_barrier(9) should be used instead of task_del(9).

I doubt this code will be the same when unlocking started.

> 
> $ dmesg | grep iwm0 | tail -n2
> iwm0 at pci1 dev 0 function 0 "Intel Dual Band Wireless-AC 8265" rev 0x78, msi
> iwm0: hw rev 0x230, fw ver 36.ca7b901d.0, address 98:3b:8f:ef:6b:ef
> 
> Unsure how `route monitor` exercises this path, but I've left it
> running, too.

You have at least one PF_ROUTE socket. Otherwise route_input()
performs drain run without any solock() call.

> 
> Index: ieee80211.c
> ===
> RCS file: /cvs/src/sys/net80211/ieee80211.c,v
> retrieving revision 1.85
> diff -u -p -r1.85 ieee80211.c
> --- ieee80211.c   11 Oct 2021 09:01:06 -  1.85
> +++ ieee80211.c   5 Dec 2021 17:01:51 -
> @@ -193,6 +193,7 @@ ieee80211_ifattach(struct ifnet *ifp)
>   if_addgroup(ifp, "wlan");
>   ifp->if_priority = IF_WIRELESS_DEFAULT_PRIORITY;
> 
> + task_set(>ic_rtm_80211info_task, ieee80211_rtm_80211info_task, ic);
>   ieee80211_set_link_state(ic, LINK_STATE_DOWN);
> 
>   timeout_set(>ic_bgscan_timeout, ieee80211_bgscan_timeout, ifp);
> @@ -203,7 +204,8 @@ ieee80211_ifdetach(struct ifnet *ifp)
> {
>   struct ieee80211com *ic = (void *)ifp;
> 
> - timeout_del(>ic_bgscan_timeout);
> + timeout_del_barrier(>ic_bgscan_timeout);
> + task_del(systq, >ic_rtm_80211info_task);
> 
>   /*
>* Undo pseudo-driver changes. Pseudo-driver detach hooks could




Re: panic: ieee80211_set_link_state() calls rtm_80211info() from timeout context

2021-12-05 Thread Stefan Sperling
On Sun, Dec 05, 2021 at 11:05:32AM -0600, Scott Cheloha wrote:
> > diff 0b61c8235787960f0010ef627ea5b2c6309a81f0 
> > de98c050ea709bdb8e26be40ab0cc82ef9afed80
> > blob - 7bb68194dd78417b06c59f81d1ebbff4165203d8
> > blob + 5b9a969258074fde29e21a33ac035cf170ec3b03
> > --- sys/net80211/ieee80211.c
> > +++ sys/net80211/ieee80211.c
> > @@ -193,6 +193,7 @@ ieee80211_ifattach(struct ifnet *ifp)
> > if_addgroup(ifp, "wlan");
> > ifp->if_priority = IF_WIRELESS_DEFAULT_PRIORITY;
> >  
> > +   task_set(>ic_rtm_80211info_task, ieee80211_rtm_80211info_task, ic);
> > ieee80211_set_link_state(ic, LINK_STATE_DOWN);
> >  
> > timeout_set(>ic_bgscan_timeout, ieee80211_bgscan_timeout, ifp);
> > @@ -203,6 +204,7 @@ ieee80211_ifdetach(struct ifnet *ifp)
> >  {
> > struct ieee80211com *ic = (void *)ifp;
> >  
> > +   task_del(systq, >ic_rtm_80211info_task);
> > timeout_del(>ic_bgscan_timeout);
> 
> Suppose the ic_bgscan_timeout timeout is running at the moment we're
> running ieee80211_ifdetach().  Ignore the kernel lock for the moment,
> think about the future.

There are many more places in the wireless stack that do such things.
But I am not interested in doing MP work on wireless myself. That is
simply asking too much on top of everything else I do.

> If we delete the task before we delete the timeout and the timeout
> then adds the task back onto the task queue, what happens?
> 
> My guess is you need to ensure the timeout is no longer running
> *before* you delete the task.  Can you do timeout_del_barrier()
> here?  See the attached patch.

Yes, sure we can. There are dozens of other timeouts in the wireless
stack and drivers, so your patch is a small step on a very long road.

> > /*
> > blob - 447a2676bfb250b7f917206549679d6ae68de1f6
> > blob + 7e10fc1336067542c13d5607602e658ce2b3926b
> > --- sys/net80211/ieee80211_proto.c
> > +++ sys/net80211/ieee80211_proto.c
> > @@ -1288,6 +1288,31 @@ justcleanup:
> >  }
> >  
> >  void
> > +ieee80211_rtm_80211info_task(void *arg)
> > +{
> > +   struct ieee80211com *ic = arg;
> > +   struct ifnet *ifp = >ic_if;
> > +   struct if_ieee80211_data ifie;
> > +   int s = splnet();
> > +
> > +   if (LINK_STATE_IS_UP(ifp->if_link_state)) {
> 
> Does this mean userspace can "miss" state transitions if the task runs
> and the state has already changed back to not-up?
> 
> I don't know whether this would matter in practice, but it would be a
> behavior change.

If this task doesn't find link state up for some reason then it is
running in an unexpected situation. What else should it do?

I think it makes sense for a scheduled task to check that its precondition
still applies. We have had several bugs in iwm(4) where tasks did their
thing even though their reason for being scheduled no longer applied.
In that case we ended up sending firmware commands that appeared our
of order from the device's point of view. I don't know if this really
matters here, it will depend on what userland expects.
 
> Unsure how `route monitor` exercises this path, but I've left it
> running, too.

That was just to see whether the routing message is still being generated.



Re: panic: ieee80211_set_link_state() calls rtm_80211info() from timeout context

2021-12-05 Thread Scott Cheloha
On Sat, Dec 04, 2021 at 08:40:42PM +0100, Stefan Sperling wrote:
> On Sat, Dec 04, 2021 at 09:32:40PM +0300, Vitaliy Makkoveev wrote:
> > I think rtm_80211info() could follow the if_link_state_change()
> > way and use task for that.
> 
> Indeed. I did not realize that if_link_state_change() schedules a task.
> 
> This means ieee80211_set_link_state() is already deferring some of its
> work to a task.

Probably should have mentioned this in my original mail.

> The patch below defers sending the 80211info message to a task as well.
> 
> I am keeping things simple for now and use systq (kernel-locked) instead
> of copying the argument-passing approach used by if_link_state_change().

All timeouts run under the kernel lock, so this is appropriate.

> Tested on iwm(4) 8265 with 'route monitor'.
> 
> ok?

I like the direction.  I have a few concerns noted below.

> diff 0b61c8235787960f0010ef627ea5b2c6309a81f0 
> de98c050ea709bdb8e26be40ab0cc82ef9afed80
> blob - 7bb68194dd78417b06c59f81d1ebbff4165203d8
> blob + 5b9a969258074fde29e21a33ac035cf170ec3b03
> --- sys/net80211/ieee80211.c
> +++ sys/net80211/ieee80211.c
> @@ -193,6 +193,7 @@ ieee80211_ifattach(struct ifnet *ifp)
>   if_addgroup(ifp, "wlan");
>   ifp->if_priority = IF_WIRELESS_DEFAULT_PRIORITY;
>  
> + task_set(>ic_rtm_80211info_task, ieee80211_rtm_80211info_task, ic);
>   ieee80211_set_link_state(ic, LINK_STATE_DOWN);
>  
>   timeout_set(>ic_bgscan_timeout, ieee80211_bgscan_timeout, ifp);
> @@ -203,6 +204,7 @@ ieee80211_ifdetach(struct ifnet *ifp)
>  {
>   struct ieee80211com *ic = (void *)ifp;
>  
> + task_del(systq, >ic_rtm_80211info_task);
>   timeout_del(>ic_bgscan_timeout);

Suppose the ic_bgscan_timeout timeout is running at the moment we're
running ieee80211_ifdetach().  Ignore the kernel lock for the moment,
think about the future.

If we delete the task before we delete the timeout and the timeout
then adds the task back onto the task queue, what happens?

My guess is you need to ensure the timeout is no longer running
*before* you delete the task.  Can you do timeout_del_barrier()
here?  See the attached patch.

>  
>   /*
> blob - 447a2676bfb250b7f917206549679d6ae68de1f6
> blob + 7e10fc1336067542c13d5607602e658ce2b3926b
> --- sys/net80211/ieee80211_proto.c
> +++ sys/net80211/ieee80211_proto.c
> @@ -1288,6 +1288,31 @@ justcleanup:
>  }
>  
>  void
> +ieee80211_rtm_80211info_task(void *arg)
> +{
> + struct ieee80211com *ic = arg;
> + struct ifnet *ifp = >ic_if;
> + struct if_ieee80211_data ifie;
> + int s = splnet();
> +
> + if (LINK_STATE_IS_UP(ifp->if_link_state)) {

Does this mean userspace can "miss" state transitions if the task runs
and the state has already changed back to not-up?

I don't know whether this would matter in practice, but it would be a
behavior change.

--

The rest of the patch looks good, running with it now on this:

$ dmesg | grep iwm0 | tail -n2
iwm0 at pci1 dev 0 function 0 "Intel Dual Band Wireless-AC 8265" rev 0x78, msi
iwm0: hw rev 0x230, fw ver 36.ca7b901d.0, address 98:3b:8f:ef:6b:ef

Unsure how `route monitor` exercises this path, but I've left it
running, too.

Index: ieee80211.c
===
RCS file: /cvs/src/sys/net80211/ieee80211.c,v
retrieving revision 1.85
diff -u -p -r1.85 ieee80211.c
--- ieee80211.c 11 Oct 2021 09:01:06 -  1.85
+++ ieee80211.c 5 Dec 2021 17:01:51 -
@@ -193,6 +193,7 @@ ieee80211_ifattach(struct ifnet *ifp)
if_addgroup(ifp, "wlan");
ifp->if_priority = IF_WIRELESS_DEFAULT_PRIORITY;
 
+   task_set(>ic_rtm_80211info_task, ieee80211_rtm_80211info_task, ic);
ieee80211_set_link_state(ic, LINK_STATE_DOWN);
 
timeout_set(>ic_bgscan_timeout, ieee80211_bgscan_timeout, ifp);
@@ -203,7 +204,8 @@ ieee80211_ifdetach(struct ifnet *ifp)
 {
struct ieee80211com *ic = (void *)ifp;
 
-   timeout_del(>ic_bgscan_timeout);
+   timeout_del_barrier(>ic_bgscan_timeout);
+   task_del(systq, >ic_rtm_80211info_task);
 
/*
 * Undo pseudo-driver changes. Pseudo-driver detach hooks could



Re: panic: ieee80211_set_link_state() calls rtm_80211info() from timeout context

2021-12-05 Thread Florian Obser


reads OK 

On 2021-12-04 20:40 +01, Stefan Sperling  wrote:
> On Sat, Dec 04, 2021 at 09:32:40PM +0300, Vitaliy Makkoveev wrote:
>> I think rtm_80211info() could follow the if_link_state_change()
>> way and use task for that.
>
> Indeed. I did not realize that if_link_state_change() schedules a task.
>
> This means ieee80211_set_link_state() is already deferring some of its
> work to a task. The patch below defers sending the 80211info message
> to a task as well.
>
> I am keeping things simple for now and use systq (kernel-locked) instead
> of copying the argument-passing approach used by if_link_state_change().
>
> Tested on iwm(4) 8265 with 'route monitor'.
>
> ok?
>
> diff 0b61c8235787960f0010ef627ea5b2c6309a81f0 
> de98c050ea709bdb8e26be40ab0cc82ef9afed80
> blob - 7bb68194dd78417b06c59f81d1ebbff4165203d8
> blob + 5b9a969258074fde29e21a33ac035cf170ec3b03
> --- sys/net80211/ieee80211.c
> +++ sys/net80211/ieee80211.c
> @@ -193,6 +193,7 @@ ieee80211_ifattach(struct ifnet *ifp)
>   if_addgroup(ifp, "wlan");
>   ifp->if_priority = IF_WIRELESS_DEFAULT_PRIORITY;
>  
> + task_set(>ic_rtm_80211info_task, ieee80211_rtm_80211info_task, ic);
>   ieee80211_set_link_state(ic, LINK_STATE_DOWN);
>  
>   timeout_set(>ic_bgscan_timeout, ieee80211_bgscan_timeout, ifp);
> @@ -203,6 +204,7 @@ ieee80211_ifdetach(struct ifnet *ifp)
>  {
>   struct ieee80211com *ic = (void *)ifp;
>  
> + task_del(systq, >ic_rtm_80211info_task);
>   timeout_del(>ic_bgscan_timeout);
>  
>   /*
> blob - 447a2676bfb250b7f917206549679d6ae68de1f6
> blob + 7e10fc1336067542c13d5607602e658ce2b3926b
> --- sys/net80211/ieee80211_proto.c
> +++ sys/net80211/ieee80211_proto.c
> @@ -1288,6 +1288,31 @@ justcleanup:
>  }
>  
>  void
> +ieee80211_rtm_80211info_task(void *arg)
> +{
> + struct ieee80211com *ic = arg;
> + struct ifnet *ifp = >ic_if;
> + struct if_ieee80211_data ifie;
> + int s = splnet();
> +
> + if (LINK_STATE_IS_UP(ifp->if_link_state)) {
> + memset(, 0, sizeof(ifie));
> + ifie.ifie_nwid_len = ic->ic_bss->ni_esslen;
> + memcpy(ifie.ifie_nwid, ic->ic_bss->ni_essid,
> + sizeof(ifie.ifie_nwid));
> + memcpy(ifie.ifie_addr, ic->ic_bss->ni_bssid,
> + sizeof(ifie.ifie_addr));
> + ifie.ifie_channel = ieee80211_chan2ieee(ic,
> + ic->ic_bss->ni_chan);
> + ifie.ifie_flags = ic->ic_flags;
> + ifie.ifie_xflags = ic->ic_xflags;
> + rtm_80211info(>ic_if, );
> + }
> +
> + splx(s);
> +}
> +
> +void
>  ieee80211_set_link_state(struct ieee80211com *ic, int nstate)
>  {
>   struct ifnet *ifp = >ic_if;
> @@ -1307,20 +1332,8 @@ ieee80211_set_link_state(struct ieee80211com *ic, int 
>   }
>   if (nstate != ifp->if_link_state) {
>   ifp->if_link_state = nstate;
> - if (LINK_STATE_IS_UP(nstate)) {
> - struct if_ieee80211_data ifie;
> - memset(, 0, sizeof(ifie));
> - ifie.ifie_nwid_len = ic->ic_bss->ni_esslen;
> - memcpy(ifie.ifie_nwid, ic->ic_bss->ni_essid,
> - sizeof(ifie.ifie_nwid));
> - memcpy(ifie.ifie_addr, ic->ic_bss->ni_bssid,
> - sizeof(ifie.ifie_addr));
> - ifie.ifie_channel = ieee80211_chan2ieee(ic,
> - ic->ic_bss->ni_chan);
> - ifie.ifie_flags = ic->ic_flags;
> - ifie.ifie_xflags = ic->ic_xflags;
> - rtm_80211info(>ic_if, );
> - }
> + if (LINK_STATE_IS_UP(nstate))
> + task_add(systq, >ic_rtm_80211info_task);
>   if_link_state_change(ifp);
>   }
>  }
> blob - 7208e5dc0be1983a31d7f74142e87581bea95d13
> blob + ce6d1ad91f391cb16c7f0bbfa79210401d5dc7eb
> --- sys/net80211/ieee80211_proto.h
> +++ sys/net80211/ieee80211_proto.h
> @@ -63,6 +63,7 @@ extern  void ieee80211_proto_detach(struct ifnet *);
>  struct ieee80211_node;
>  struct ieee80211_rxinfo;
>  struct ieee80211_rsnparams;
> +extern   void ieee80211_rtm_80211info_task(void *);
>  extern   void ieee80211_set_link_state(struct ieee80211com *, int);
>  extern   u_int ieee80211_get_hdrlen(const struct ieee80211_frame *);
>  extern   int ieee80211_classify(struct ieee80211com *, struct mbuf *);
> blob - dd17ed76031db17bd86cd75a5c1eec659dbd3f30
> blob + 641419331ed3f65fbc9272c055948438b28f1025
> --- sys/net80211/ieee80211_var.h
> +++ sys/net80211/ieee80211_var.h
> @@ -306,6 +306,7 @@ struct ieee80211com {
>   struct timeout  ic_inact_timeout; /* node inactivity timeout */
>   struct timeout  ic_node_cache_timeout;
>  #endif
> + struct task ic_rtm_80211info_task;
>   int ic_des_esslen;
>   u_int8_tic_des_essid[IEEE80211_NWID_LEN];
>   struct ieee80211_channel 

Re: panic: ieee80211_set_link_state() calls rtm_80211info() from timeout context

2021-12-04 Thread Vitaliy Makkoveev
> On 4 Dec 2021, at 22:40, Stefan Sperling  wrote:
> 
> On Sat, Dec 04, 2021 at 09:32:40PM +0300, Vitaliy Makkoveev wrote:
>> I think rtm_80211info() could follow the if_link_state_change()
>> way and use task for that.
> 
> Indeed. I did not realize that if_link_state_change() schedules a task.
> 
> This means ieee80211_set_link_state() is already deferring some of its
> work to a task. The patch below defers sending the 80211info message
> to a task as well.
> 
> I am keeping things simple for now and use systq (kernel-locked) instead
> of copying the argument-passing approach used by if_link_state_change().
> 
> Tested on iwm(4) 8265 with 'route monitor'.
> 
> ok?
> 

Reads ok by mvs@.

> diff 0b61c8235787960f0010ef627ea5b2c6309a81f0 
> de98c050ea709bdb8e26be40ab0cc82ef9afed80
> blob - 7bb68194dd78417b06c59f81d1ebbff4165203d8
> blob + 5b9a969258074fde29e21a33ac035cf170ec3b03
> --- sys/net80211/ieee80211.c
> +++ sys/net80211/ieee80211.c
> @@ -193,6 +193,7 @@ ieee80211_ifattach(struct ifnet *ifp)
>   if_addgroup(ifp, "wlan");
>   ifp->if_priority = IF_WIRELESS_DEFAULT_PRIORITY;
> 
> + task_set(>ic_rtm_80211info_task, ieee80211_rtm_80211info_task, ic);
>   ieee80211_set_link_state(ic, LINK_STATE_DOWN);
> 
>   timeout_set(>ic_bgscan_timeout, ieee80211_bgscan_timeout, ifp);
> @@ -203,6 +204,7 @@ ieee80211_ifdetach(struct ifnet *ifp)
> {
>   struct ieee80211com *ic = (void *)ifp;
> 
> + task_del(systq, >ic_rtm_80211info_task);
>   timeout_del(>ic_bgscan_timeout);
> 
>   /*
> blob - 447a2676bfb250b7f917206549679d6ae68de1f6
> blob + 7e10fc1336067542c13d5607602e658ce2b3926b
> --- sys/net80211/ieee80211_proto.c
> +++ sys/net80211/ieee80211_proto.c
> @@ -1288,6 +1288,31 @@ justcleanup:
> }
> 
> void
> +ieee80211_rtm_80211info_task(void *arg)
> +{
> + struct ieee80211com *ic = arg;
> + struct ifnet *ifp = >ic_if;
> + struct if_ieee80211_data ifie;
> + int s = splnet();
> +
> + if (LINK_STATE_IS_UP(ifp->if_link_state)) {
> + memset(, 0, sizeof(ifie));
> + ifie.ifie_nwid_len = ic->ic_bss->ni_esslen;
> + memcpy(ifie.ifie_nwid, ic->ic_bss->ni_essid,
> + sizeof(ifie.ifie_nwid));
> + memcpy(ifie.ifie_addr, ic->ic_bss->ni_bssid,
> + sizeof(ifie.ifie_addr));
> + ifie.ifie_channel = ieee80211_chan2ieee(ic,
> + ic->ic_bss->ni_chan);
> + ifie.ifie_flags = ic->ic_flags;
> + ifie.ifie_xflags = ic->ic_xflags;
> + rtm_80211info(>ic_if, );
> + }
> +
> + splx(s);
> +}
> +
> +void
> ieee80211_set_link_state(struct ieee80211com *ic, int nstate)
> {
>   struct ifnet *ifp = >ic_if;
> @@ -1307,20 +1332,8 @@ ieee80211_set_link_state(struct ieee80211com *ic, int 
>   }
>   if (nstate != ifp->if_link_state) {
>   ifp->if_link_state = nstate;
> - if (LINK_STATE_IS_UP(nstate)) {
> - struct if_ieee80211_data ifie;
> - memset(, 0, sizeof(ifie));
> - ifie.ifie_nwid_len = ic->ic_bss->ni_esslen;
> - memcpy(ifie.ifie_nwid, ic->ic_bss->ni_essid,
> - sizeof(ifie.ifie_nwid));
> - memcpy(ifie.ifie_addr, ic->ic_bss->ni_bssid,
> - sizeof(ifie.ifie_addr));
> - ifie.ifie_channel = ieee80211_chan2ieee(ic,
> - ic->ic_bss->ni_chan);
> - ifie.ifie_flags = ic->ic_flags;
> - ifie.ifie_xflags = ic->ic_xflags;
> - rtm_80211info(>ic_if, );
> - }
> + if (LINK_STATE_IS_UP(nstate))
> + task_add(systq, >ic_rtm_80211info_task);
>   if_link_state_change(ifp);
>   }
> }
> blob - 7208e5dc0be1983a31d7f74142e87581bea95d13
> blob + ce6d1ad91f391cb16c7f0bbfa79210401d5dc7eb
> --- sys/net80211/ieee80211_proto.h
> +++ sys/net80211/ieee80211_proto.h
> @@ -63,6 +63,7 @@ extern  void ieee80211_proto_detach(struct ifnet *);
> struct ieee80211_node;
> struct ieee80211_rxinfo;
> struct ieee80211_rsnparams;
> +extern   void ieee80211_rtm_80211info_task(void *);
> externvoid ieee80211_set_link_state(struct ieee80211com *, int);
> externu_int ieee80211_get_hdrlen(const struct ieee80211_frame *);
> externint ieee80211_classify(struct ieee80211com *, struct mbuf *);
> blob - dd17ed76031db17bd86cd75a5c1eec659dbd3f30
> blob + 641419331ed3f65fbc9272c055948438b28f1025
> --- sys/net80211/ieee80211_var.h
> +++ sys/net80211/ieee80211_var.h
> @@ -306,6 +306,7 @@ struct ieee80211com {
>   struct timeout  ic_inact_timeout; /* node inactivity timeout */
>   struct timeout  ic_node_cache_timeout;
> #endif
> + struct task ic_rtm_80211info_task;
>   int ic_des_esslen;
>   u_int8_tic_des_essid[IEEE80211_NWID_LEN];
>   struct 

Re: panic: ieee80211_set_link_state() calls rtm_80211info() from timeout context

2021-12-04 Thread Stefan Sperling
On Sat, Dec 04, 2021 at 09:32:40PM +0300, Vitaliy Makkoveev wrote:
> I think rtm_80211info() could follow the if_link_state_change()
> way and use task for that.

Indeed. I did not realize that if_link_state_change() schedules a task.

This means ieee80211_set_link_state() is already deferring some of its
work to a task. The patch below defers sending the 80211info message
to a task as well.

I am keeping things simple for now and use systq (kernel-locked) instead
of copying the argument-passing approach used by if_link_state_change().

Tested on iwm(4) 8265 with 'route monitor'.

ok?

diff 0b61c8235787960f0010ef627ea5b2c6309a81f0 
de98c050ea709bdb8e26be40ab0cc82ef9afed80
blob - 7bb68194dd78417b06c59f81d1ebbff4165203d8
blob + 5b9a969258074fde29e21a33ac035cf170ec3b03
--- sys/net80211/ieee80211.c
+++ sys/net80211/ieee80211.c
@@ -193,6 +193,7 @@ ieee80211_ifattach(struct ifnet *ifp)
if_addgroup(ifp, "wlan");
ifp->if_priority = IF_WIRELESS_DEFAULT_PRIORITY;
 
+   task_set(>ic_rtm_80211info_task, ieee80211_rtm_80211info_task, ic);
ieee80211_set_link_state(ic, LINK_STATE_DOWN);
 
timeout_set(>ic_bgscan_timeout, ieee80211_bgscan_timeout, ifp);
@@ -203,6 +204,7 @@ ieee80211_ifdetach(struct ifnet *ifp)
 {
struct ieee80211com *ic = (void *)ifp;
 
+   task_del(systq, >ic_rtm_80211info_task);
timeout_del(>ic_bgscan_timeout);
 
/*
blob - 447a2676bfb250b7f917206549679d6ae68de1f6
blob + 7e10fc1336067542c13d5607602e658ce2b3926b
--- sys/net80211/ieee80211_proto.c
+++ sys/net80211/ieee80211_proto.c
@@ -1288,6 +1288,31 @@ justcleanup:
 }
 
 void
+ieee80211_rtm_80211info_task(void *arg)
+{
+   struct ieee80211com *ic = arg;
+   struct ifnet *ifp = >ic_if;
+   struct if_ieee80211_data ifie;
+   int s = splnet();
+
+   if (LINK_STATE_IS_UP(ifp->if_link_state)) {
+   memset(, 0, sizeof(ifie));
+   ifie.ifie_nwid_len = ic->ic_bss->ni_esslen;
+   memcpy(ifie.ifie_nwid, ic->ic_bss->ni_essid,
+   sizeof(ifie.ifie_nwid));
+   memcpy(ifie.ifie_addr, ic->ic_bss->ni_bssid,
+   sizeof(ifie.ifie_addr));
+   ifie.ifie_channel = ieee80211_chan2ieee(ic,
+   ic->ic_bss->ni_chan);
+   ifie.ifie_flags = ic->ic_flags;
+   ifie.ifie_xflags = ic->ic_xflags;
+   rtm_80211info(>ic_if, );
+   }
+
+   splx(s);
+}
+
+void
 ieee80211_set_link_state(struct ieee80211com *ic, int nstate)
 {
struct ifnet *ifp = >ic_if;
@@ -1307,20 +1332,8 @@ ieee80211_set_link_state(struct ieee80211com *ic, int 
}
if (nstate != ifp->if_link_state) {
ifp->if_link_state = nstate;
-   if (LINK_STATE_IS_UP(nstate)) {
-   struct if_ieee80211_data ifie;
-   memset(, 0, sizeof(ifie));
-   ifie.ifie_nwid_len = ic->ic_bss->ni_esslen;
-   memcpy(ifie.ifie_nwid, ic->ic_bss->ni_essid,
-   sizeof(ifie.ifie_nwid));
-   memcpy(ifie.ifie_addr, ic->ic_bss->ni_bssid,
-   sizeof(ifie.ifie_addr));
-   ifie.ifie_channel = ieee80211_chan2ieee(ic,
-   ic->ic_bss->ni_chan);
-   ifie.ifie_flags = ic->ic_flags;
-   ifie.ifie_xflags = ic->ic_xflags;
-   rtm_80211info(>ic_if, );
-   }
+   if (LINK_STATE_IS_UP(nstate))
+   task_add(systq, >ic_rtm_80211info_task);
if_link_state_change(ifp);
}
 }
blob - 7208e5dc0be1983a31d7f74142e87581bea95d13
blob + ce6d1ad91f391cb16c7f0bbfa79210401d5dc7eb
--- sys/net80211/ieee80211_proto.h
+++ sys/net80211/ieee80211_proto.h
@@ -63,6 +63,7 @@ externvoid ieee80211_proto_detach(struct ifnet *);
 struct ieee80211_node;
 struct ieee80211_rxinfo;
 struct ieee80211_rsnparams;
+extern void ieee80211_rtm_80211info_task(void *);
 extern void ieee80211_set_link_state(struct ieee80211com *, int);
 extern u_int ieee80211_get_hdrlen(const struct ieee80211_frame *);
 extern int ieee80211_classify(struct ieee80211com *, struct mbuf *);
blob - dd17ed76031db17bd86cd75a5c1eec659dbd3f30
blob + 641419331ed3f65fbc9272c055948438b28f1025
--- sys/net80211/ieee80211_var.h
+++ sys/net80211/ieee80211_var.h
@@ -306,6 +306,7 @@ struct ieee80211com {
struct timeout  ic_inact_timeout; /* node inactivity timeout */
struct timeout  ic_node_cache_timeout;
 #endif
+   struct task ic_rtm_80211info_task;
int ic_des_esslen;
u_int8_tic_des_essid[IEEE80211_NWID_LEN];
struct ieee80211_channel *ic_des_chan;  /* desired channel */




Re: panic: ieee80211_set_link_state() calls rtm_80211info() from timeout context

2021-12-04 Thread Stefan Sperling
On Sat, Dec 04, 2021 at 07:19:23PM +0100, Stefan Sperling wrote:
> For this particular case, yes.
> But that won't solve ieee80211_set_link_state() being called from
> interrupt context, would it?

Below is a breakdown of all callers by context.
I grepped for direct callers. My search did not see indirect callers
such as the one from timeout context which we already know about.

I am worried about callers in interrupt context.
It will be hard to "fix" some of these. I would prefer if the routing
message layer could be made to work from interrupt context if possible.

Is there a function we could call to figure out which context we are
running in? If so, ieee80211_set_link_state() could skip sending the
routing message as a workaround. This means some drivers would lose
link state messages unless relevant driver code is migrated to a task.
Which could be done over time on demand, on a per-driver basis.

autoconf context (no userland yet so doesn't matter):
ieee80211_ifattach()

interrupt context (bad):
bwfm_newstate()
pgt_newstate()
ipw_newstate()
iwi_newstate()
iwn_newstate()
wpi_newstate()
atu_newstate()
ieee80211_recv_4way_msg3()
ieee80211_recv_rsn_group_msg1()
ieee80211_recv_wpa_group_msg1()
ieee80211_newstate() <<-- ALL DRIVERS go through here

task context (ok):
iwm_scan()
iwx_scan()
iwx_add_sta_key()
athn_usb_set_key_cb()
otus_set_key_cb()
rsu_set_key_cb()
run_set_key_cb()
urtwn_set_key_cb()



Re: panic: ieee80211_set_link_state() calls rtm_80211info() from timeout context

2021-12-04 Thread Vitaliy Makkoveev
> On 4 Dec 2021, at 21:19, Stefan Sperling  wrote:
> 
> On Sat, Dec 04, 2021 at 07:06:35PM +0100, Tobias Heider wrote:
>> On Sat, Dec 04, 2021 at 06:50:54PM +0100, Stefan Sperling wrote:
>>> On Sat, Dec 04, 2021 at 10:37:53AM -0600, Scott Cheloha wrote:
 Hit a witness panic during boot yesterday.  Can't repro, have never
 seen it before.  The photo is a mess (ask if you want it) but the
 backtrace is:
 
 panic
 witness_checkorder
 rw_enter_write
 solock
 route input
 ieee80211_set_link_state
 ieee80211_recv_4way_msg3
 ieee80211_eapol_key_input
 ieee80211_decap
 ieee80211_input_ba_flush
 ieee80211_input_ba_gap_timeout
 timeout_run
 softclock_process_tick_timeout
 sotclock
 
 We're not allowed to take sleeping locks during the execution of a
 timeout, hence the witness panic.
>>> 
>>> I was under the impression that timeouts and tasks are equivalent in this
>>> respect. Do timeouts not use a process context which can use rw locks?
>>> Was this never the case? Or did this change recently?
>>> 
>>> We could schedule a task from the BA gap timeout handler, there is
>>> no problem with doing that.
>>> 
>>> However, there are many callers of ieee80211_set_link_state(), including
>>> drivers. And in particular the WPA handshake will usually be triggered
>>> from interrupt context as frames are received, and call into this function.
>>> 
>>> If ieee80211_set_link_state() now requires a context which can sleep
>>> we should really be checking all its callers for similar issues...
>>> 
>>> Or we stop using a routing message and invent another mechanism that
>>> will work within the current requirements of the wireless stack?
>>> 
>> 
>> I think timeout_set_proc() might be what you are looking for.
> 
> For this particular case, yes.
> But that won't solve ieee80211_set_link_state() being called from
> interrupt context, would it?
> 

I think rtm_80211info() could follow the if_link_state_change()
way and use task for that.



Re: panic: ieee80211_set_link_state() calls rtm_80211info() from timeout context

2021-12-04 Thread Stefan Sperling
On Sat, Dec 04, 2021 at 07:06:35PM +0100, Tobias Heider wrote:
> On Sat, Dec 04, 2021 at 06:50:54PM +0100, Stefan Sperling wrote:
> > On Sat, Dec 04, 2021 at 10:37:53AM -0600, Scott Cheloha wrote:
> > > Hit a witness panic during boot yesterday.  Can't repro, have never
> > > seen it before.  The photo is a mess (ask if you want it) but the
> > > backtrace is:
> > > 
> > > panic
> > > witness_checkorder
> > > rw_enter_write
> > > solock
> > > route input
> > > ieee80211_set_link_state
> > > ieee80211_recv_4way_msg3
> > > ieee80211_eapol_key_input
> > > ieee80211_decap
> > > ieee80211_input_ba_flush
> > > ieee80211_input_ba_gap_timeout
> > > timeout_run
> > > softclock_process_tick_timeout
> > > sotclock
> > > 
> > > We're not allowed to take sleeping locks during the execution of a
> > > timeout, hence the witness panic.
> > 
> > I was under the impression that timeouts and tasks are equivalent in this
> > respect. Do timeouts not use a process context which can use rw locks?
> > Was this never the case? Or did this change recently?
> > 
> > We could schedule a task from the BA gap timeout handler, there is
> > no problem with doing that.
> > 
> > However, there are many callers of ieee80211_set_link_state(), including
> > drivers. And in particular the WPA handshake will usually be triggered
> > from interrupt context as frames are received, and call into this function.
> > 
> > If ieee80211_set_link_state() now requires a context which can sleep
> > we should really be checking all its callers for similar issues...
> > 
> > Or we stop using a routing message and invent another mechanism that
> > will work within the current requirements of the wireless stack?
> > 
> 
> I think timeout_set_proc() might be what you are looking for.

For this particular case, yes.
But that won't solve ieee80211_set_link_state() being called from
interrupt context, would it?



Re: panic: ieee80211_set_link_state() calls rtm_80211info() from timeout context

2021-12-04 Thread Tobias Heider
On Sat, Dec 04, 2021 at 06:50:54PM +0100, Stefan Sperling wrote:
> On Sat, Dec 04, 2021 at 10:37:53AM -0600, Scott Cheloha wrote:
> > Hit a witness panic during boot yesterday.  Can't repro, have never
> > seen it before.  The photo is a mess (ask if you want it) but the
> > backtrace is:
> > 
> > panic
> > witness_checkorder
> > rw_enter_write
> > solock
> > route input
> > ieee80211_set_link_state
> > ieee80211_recv_4way_msg3
> > ieee80211_eapol_key_input
> > ieee80211_decap
> > ieee80211_input_ba_flush
> > ieee80211_input_ba_gap_timeout
> > timeout_run
> > softclock_process_tick_timeout
> > sotclock
> > 
> > We're not allowed to take sleeping locks during the execution of a
> > timeout, hence the witness panic.
> 
> I was under the impression that timeouts and tasks are equivalent in this
> respect. Do timeouts not use a process context which can use rw locks?
> Was this never the case? Or did this change recently?
> 
> We could schedule a task from the BA gap timeout handler, there is
> no problem with doing that.
> 
> However, there are many callers of ieee80211_set_link_state(), including
> drivers. And in particular the WPA handshake will usually be triggered
> from interrupt context as frames are received, and call into this function.
> 
> If ieee80211_set_link_state() now requires a context which can sleep
> we should really be checking all its callers for similar issues...
> 
> Or we stop using a routing message and invent another mechanism that
> will work within the current requirements of the wireless stack?
> 

I think timeout_set_proc() might be what you are looking for.



Re: panic: ieee80211_set_link_state() calls rtm_80211info() from timeout context

2021-12-04 Thread Stefan Sperling
On Sat, Dec 04, 2021 at 10:37:53AM -0600, Scott Cheloha wrote:
> Hit a witness panic during boot yesterday.  Can't repro, have never
> seen it before.  The photo is a mess (ask if you want it) but the
> backtrace is:
> 
> panic
> witness_checkorder
> rw_enter_write
> solock
> route input
> ieee80211_set_link_state
> ieee80211_recv_4way_msg3
> ieee80211_eapol_key_input
> ieee80211_decap
> ieee80211_input_ba_flush
> ieee80211_input_ba_gap_timeout
> timeout_run
> softclock_process_tick_timeout
> sotclock
> 
> We're not allowed to take sleeping locks during the execution of a
> timeout, hence the witness panic.

I was under the impression that timeouts and tasks are equivalent in this
respect. Do timeouts not use a process context which can use rw locks?
Was this never the case? Or did this change recently?

We could schedule a task from the BA gap timeout handler, there is
no problem with doing that.

However, there are many callers of ieee80211_set_link_state(), including
drivers. And in particular the WPA handshake will usually be triggered
from interrupt context as frames are received, and call into this function.

If ieee80211_set_link_state() now requires a context which can sleep
we should really be checking all its callers for similar issues...

Or we stop using a routing message and invent another mechanism that
will work within the current requirements of the wireless stack?



panic: ieee80211_set_link_state() calls rtm_80211info() from timeout context

2021-12-04 Thread Scott Cheloha
Hit a witness panic during boot yesterday.  Can't repro, have never
seen it before.  The photo is a mess (ask if you want it) but the
backtrace is:

panic
witness_checkorder
rw_enter_write
solock
route input
ieee80211_set_link_state
ieee80211_recv_4way_msg3
ieee80211_eapol_key_input
ieee80211_decap
ieee80211_input_ba_flush
ieee80211_input_ba_gap_timeout
timeout_run
softclock_process_tick_timeout
sotclock

We're not allowed to take sleeping locks during the execution of a
timeout, hence the witness panic.

The backtrace omits that ieee80211_set_link_state() calls
rtm_80211info(), which calls route_input().

The guilty call is in ieee80211_proto.c, line 1327:

  1295  void
  1296  ieee80211_set_link_state(struct ieee80211com *ic, int nstate)
  1297  {
  1298  struct ifnet *ifp = >ic_if;
  1299  
  1300  switch (ic->ic_opmode) {

/* ... */

  1312  }
  1313  if (nstate != ifp->if_link_state) {
  1314  ifp->if_link_state = nstate;
  1315  if (LINK_STATE_IS_UP(nstate)) {
  1316  struct if_ieee80211_data ifie;
  1317  memset(, 0, sizeof(ifie));
  1318  ifie.ifie_nwid_len = ic->ic_bss->ni_esslen;
  1319  memcpy(ifie.ifie_nwid, ic->ic_bss->ni_essid,
  1320  sizeof(ifie.ifie_nwid));
  1321  memcpy(ifie.ifie_addr, ic->ic_bss->ni_bssid,
  1322  sizeof(ifie.ifie_addr));
  1323  ifie.ifie_channel = ieee80211_chan2ieee(ic,
  1324  ic->ic_bss->ni_chan);
  1325  ifie.ifie_flags = ic->ic_flags;
  1326  ifie.ifie_xflags = ic->ic_xflags;
  1327  rtm_80211info(>ic_if, );
  1328  }
  1329  if_link_state_change(ifp);
  1330  }
  1331  }

So if the link state is changing from not-up to up we send a routing
info message to userspace.

Typical solution would be to postpone the rtm_80211info() call to a
task, but I'm not familiar with this stack so maybe we can't do that?