On Thu, Sep 29, 2016 at 12:09:43PM +0300, Lauri Tirkkonen wrote:
> On Thu, Sep 29 2016 11:04:09 +0200, Stefan Sperling wrote:
> > On Thu, Sep 29, 2016 at 07:18:09AM +0000, Lauri Tirkkonen wrote:
> > > >Synopsis:        panic in ieee80211_node_leave_11g: bogus long slot station count 0
> > > >Category:        kernel
> > > >Environment:
> > >   System      : OpenBSD 6.0
> > >   Details     : OpenBSD 6.0-current (GENERIC.MP) #2463: Sat Sep 17 09:52:10 MDT 2016
> > >                    [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > 
> > >   Architecture: OpenBSD.amd64
> > >   Machine     : amd64
> > > >Description:
> > > 
> > > I have had three panics with similar stacks with this kernel on my router
> > > (Soekris net6501), a few days apart (Sep 23, 26 and 28). I see that there
> > > have been further changes to sys/net80211, so I will try to reproduce with
> > > a more recent snapshot. But since I don't have a good way to actually
> > > repro this, and the diffs don't seem to me to be directly related, I'm
> > > reporting this now.
> > 
> > Please show your /etc/hostname.athn0 file.
> 
> nwid airman chan 2 wpaciphers ccmp wpaprotos wpa2 wpagroupcipher ccmp
> wpakey <omitted>
> media autoselect mode 11g mediaopt hostap
> chan 12
> up
> inet 192.168.33.1/24
> inet6 up

OK, you're not doing anything insane.

These panics are probably fallout from this commit:

-----
CVSROOT:        /cvs
Module name:    src
Changes by:     [email protected]    2016/05/18 02:15:28

Modified files:
        sys/net80211   : ieee80211_input.c ieee80211_node.c 
                         ieee80211_proto.c 

Log message:
In hostap mode, don't re-use association IDs (AIDs) of nodes which are
still lingering in the node cache. This could cause an AID to be assigned
twice, once to a newly associated node and once to a different node in
COLLECT cache state (i.e. marked for future eviction from the node cache).

Drivers (e.g. rt2860) may use AIDs to keep track of nodes in firmware
tables and get confused when AIDs aren't unique across the node cache.
The symptom observed with rt2860 was nodes stuck at 1 Mbps Tx rate since
the duplicate AID made the driver perform Tx rate (AMRR) accounting on
the wrong node object.

To find out if a node is associated we now check the node's cache state,
rather than comparing the node's AID against zero. An AID is assigned when
a node associates and it lasts until the node is eventually purged from the
node cache (previously, the AID was made available for re-use when the node
was placed in COLLECT state). There is no need to be stingy with AIDs since
the number of possible AIDs exceeds the maximum number of nodes in the cache.

Problem found by Nathanael Rensen.
Fix written by Nathanael and myself. Tested by Nathanael.
Committing now to get this change tested across as many drivers as possible.
-----

You've found another code path where a check against AID zero is
used to determine whether a node is in associated state. Tsk tsk.

Does this fix it?

Index: ieee80211_node.c
===================================================================
RCS file: /cvs/src/sys/net80211/ieee80211_node.c,v
retrieving revision 1.105
diff -u -p -r1.105 ieee80211_node.c
--- ieee80211_node.c    15 Sep 2016 03:32:48 -0000      1.105
+++ ieee80211_node.c    29 Sep 2016 09:20:53 -0000
@@ -1678,11 +1678,14 @@ ieee80211_node_leave(struct ieee80211com
 {
        if (ic->ic_opmode != IEEE80211_M_HOSTAP)
                panic("not in ap mode, mode %u", ic->ic_opmode);
+
+       if (ni->ni_state == IEEE80211_STA_COLLECT)
+               return;
        /*
         * If node wasn't previously associated all we need to do is
         * reclaim the reference.
         */
-       if (ni->ni_associd == 0) {
+       if (ni->ni_associd == 0 || ni->ni_state < IEEE80211_STA_ASSOC) {
                ieee80211_node_newstate(ni, IEEE80211_STA_COLLECT);
                return;
        }
