Is anyone willing to assist with this?
Kind regards,
Michel Stam
--- Begin Message ---
Dear list,
For the past two weeks I've been trying to track down an annoying bug on a mesh
of 2 units with AuthSAE enabled, but I cannot find what causes it.
It happens when I run an iperf test between the units. At or around the time
the SAE key lifetime expires, a rekey occurs, after which traffic between the
units stops. Yet Wireshark/tcpdump, observing through a monitor interface, still
seem to show incoming traffic.
Sometimes packets start arriving again about one key lifetime later. That does
not make for a very stable mesh, though: normally the rekey lifetime is 3600
seconds, which means the link is effectively down for an hour.
When the problem occurs, I've observed the "iw dev mesh0 station dump" command
reporting a rapidly increasing "rx drop misc" counter
(NL80211_STA_INFO_RX_DROP_MISC). I can't be certain it is related; the value
also increases on a working link, although it seems to do so more slowly.
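To put a number on "rapidly increasing", the counter can be sampled twice and
turned into a per-station rate, on both a failing and a working link. Below is
a minimal Python sketch, assuming the station dump prints one
"Station <MAC> (on mesh0)" header per peer followed by a tab-indented
"rx drop misc:" line; parse_rx_drop_misc and drop_rates are hypothetical
helpers, not part of iw.

```python
import re
from typing import Dict, Optional

# Assumed header line per peer, e.g. "Station 02:00:00:00:01:00 (on mesh0)".
STATION_RE = re.compile(r"^Station ([0-9a-fA-F:]{17})")
# Assumed counter line, e.g. "\trx drop misc:\t42".
DROP_RE = re.compile(r"^\s*rx drop misc:\s*(\d+)")


def parse_rx_drop_misc(dump: str) -> Dict[str, int]:
    """Map each station MAC (lower-case) to its "rx drop misc" counter."""
    counters: Dict[str, int] = {}
    current: Optional[str] = None
    for line in dump.splitlines():
        m = STATION_RE.match(line)
        if m:
            current = m.group(1).lower()
            continue
        m = DROP_RE.match(line)
        if m and current is not None:
            counters[current] = int(m.group(1))
    return counters


def drop_rates(before: Dict[str, int], after: Dict[str, int],
               interval_s: float) -> Dict[str, float]:
    """Counter increase per second for stations present in both samples."""
    return {
        mac: (after[mac] - before[mac]) / interval_s
        for mac in after
        if mac in before
    }
```

The two samples could be captured a few seconds apart with
subprocess.run(["iw", "dev", "mesh0", "station", "dump"], capture_output=True,
text=True).stdout, so the drop rate during a stall can be compared directly
with the rate on a healthy link.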
If I leave the link idle (no iperf test, just some pings), the problem does not
seem to occur, which makes me believe it is a race of some sort.
Looking at the debug traces from meshd-nl80211, I can find no fault. I also
inspected the key material passed down to the ath9k driver (using printk()s in
the kernel driver), but even reading back the registers does not indicate a
fault to me: I read back what was written.
Both units use an ath9k Atheros card; one is an AzureWave AR5B95, the other is
a Compex WLE200N2-23. I have also observed the problem on Compex WLE350NX
cards, so I am guessing this is not hardware-related.
I set up both units with this configuration:
meshd.txt (https://github.com/cozybit/authsae/files/330064/meshd.txt). I'm using
the latest AuthSAE code from git.
I'm running kernel 4.4.11, but I've seen the same problem with 3.10.49.
The compat-wireless 2016-01-10 driver set used by OpenWRT seems to have the
same problem with the old 3.10.34 kernel I run on that system.
The iperf setup (version 2.0.5) is:
* One system running iperf -s -u -p 6969 -i 5
* One system running iperf -c SERVER -u -p 6969 -i 5 -t 86400 -b 100M
I create the mesh interfaces with:
* iw phy phy0 interface add mesh0 type mp
* ifconfig mesh0 IP MASK up
* meshd-nl80211 -c meshd.txt -i mesh0
Right now the key lifetime is set to 60 seconds to reproduce the problem, but I
have seen the same problem on a link with a key lifetime of 3600 seconds; that
link then dies when the 3600 seconds are up.
Loading ath9k with nohwcrypt=1 solves the problem, but costs more CPU cycles.
I've now made a patch that calls ath9k_queue_reset every time a key is set.
This seems to stop the link from dying on me, at the cost of a lot of
authentication traffic. It is a very heavy-handed approach, and I'm fairly
certain it is not going to work in a production environment. The ugly hack is
here:
https://github.com/cozybit/authsae/files/347910/ath9k-install_key-buckshot.diff.txt
This issue has also been posted as: https://github.com/cozybit/authsae/issues/42
Someone on the AuthSAE GitHub page mentioned that this is apparently a
long-standing issue with the driver, which was reported before at
https://www.mail-archive.com/ath9k-devel%40lists.ath9k.org/msg13595.html.
Is anyone able to help me or give me a couple of pointers?
Regards,
Michel Stam
_______________________________________________
ath9k-devel mailing list
ath9k-devel@lists.ath9k.org
https://lists.ath9k.org/mailman/listinfo/ath9k-devel
--- End Message ---