Is anyone willing to assist with this?

Kind regards,

Michel Stam

--- Begin Message ---
Dear list,

I've been trying to tackle an annoying bug I've been having on a mesh of 2 
units with AuthSAE enabled for the past two weeks, but I cannot seem to find 
what causes it.

It happens when I run an iperf test between the units. At or around the time 
the SAE lifetime expires, a rekey occurs, after which traffic between the units 
stops. Wireshark/tcpdump do seem to indicate incoming traffic when observing 
through a monitor interface.
Sometimes packets arrive again about a key lifetime later. This does not make 
for a very stable mesh, though, as the rekey lifetime is normally 3600 seconds 
(which means the link is effectively down for an hour).

When the problem occurs, I've observed the "iw dev mesh0 station dump" command 
returning quickly increasing counters for "rx drop misc" 
(NL80211_STA_INFO_RX_DROP_MISC). I can't be certain it is related, because the 
value also increases on a working link, although more slowly.
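
To compare how fast the counter grows on a working and a failing link, 
something like the following could be used. It is only a minimal sketch: the 
function names are hypothetical, and it assumes the station dump prints a 
"Station <MAC> (on <interface>)" header per peer followed by an indented 
"rx drop misc:" line. The text would have to be captured from 
"iw dev mesh0 station dump" separately, a few seconds apart.

```python
import re

# Assumed output layout: one "Station <mac> (on <if>)" header per peer,
# followed by indented "name: value" lines such as "rx drop misc: 123".
STATION_RE = re.compile(r"^Station\s+([0-9a-fA-F:]{17})\b")
DROP_RE = re.compile(r"^\s*rx drop misc:\s*(\d+)")


def parse_rx_drop_misc(dump_text):
    """Return {peer MAC: rx drop misc counter} from station dump text."""
    counters = {}
    current = None
    for line in dump_text.splitlines():
        m = STATION_RE.match(line)
        if m:
            current = m.group(1).lower()
            continue
        m = DROP_RE.match(line)
        if m and current is not None:
            counters[current] = int(m.group(1))
    return counters


def drop_rate(before, after, seconds):
    """Per-peer counter increase per second between two samples."""
    return {
        mac: (after[mac] - before[mac]) / seconds
        for mac in after
        if mac in before
    }
```

Two dumps taken a known number of seconds apart go through 
parse_rx_drop_misc(), and drop_rate() then gives a per-peer figure that can be 
compared before and after the rekey.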

If I leave the link idle (no iperf test, just some pings), then this problem 
does not seem to occur. This makes me believe it is a race of sorts.

Looking at the debug traces from meshd-nl80211, I can find no fault. I also 
looked at the key material sent down to the ath9k driver (printk's in the 
kernel driver), but even reading back those registers does not indicate a 
fault: the values read back match what was written.

Both units use an ath9k Atheros card; one is an AzureWave AR5B95, the other is 
a Compex WLE200N2-23. I have also observed the problem on Compex WLE350NX 
cards, so I am guessing this is not hardware related.

I set up both units with this configuration: 
meshd.txt<https://github.com/cozybit/authsae/files/330064/meshd.txt>. I'm using 
the latest GIT from AuthSAE.

The kernel I use is 4.4.11, but I've seen the same problem with 3.10.49.
The compat-wireless 2016-01-10 driver set used by OpenWRT seems to have the 
same problem with the old 3.10.34 kernel I run on that system.

The iperf setup is (using 2.0.5):

  *   One system running iperf -s -u -p 6969 -i 5
  *   One system running iperf -c -u -p 6969 -i 5 -t 86400 -b 100M

I create the mesh interfaces by:

  *   iw phy phy0 interface add mesh0 type mp
  *   ifconfig mesh0 IP MASK up
  *   meshd-nl80211 -c meshd.txt -i mesh0

Right now the key lifetime is at 60 seconds for problem reproduction, but I 
have seen the same problem on a link with a key lifetime of 3600 seconds; the 
link then dies at that time.
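
To check whether a stall really lines up with the key lifetime, a small helper 
along these lines could be run over timestamped receive counters. It is a 
sketch only: the function names, the sample format (seconds since start, 
received bytes) and the thresholds are assumptions, and how the samples are 
collected is left open.

```python
def first_stall(samples, min_flat=3):
    """Return the time at which the byte counter stops increasing.

    samples is a time-ordered list of (seconds_since_start, rx_bytes).
    A stall is min_flat consecutive intervals without any increase.
    Returns None if no such run is found.
    """
    flat = 0
    start = None
    for (t0, b0), (_t1, b1) in zip(samples, samples[1:]):
        if b1 == b0:
            if flat == 0:
                start = t0
            flat += 1
            if flat >= min_flat:
                return start
        else:
            flat = 0
    return None


def near_rekey(stall_time, key_lifetime, tolerance):
    """True if stall_time is within tolerance seconds of a non-zero
    multiple of key_lifetime."""
    if stall_time < key_lifetime - tolerance:
        return False
    offset = stall_time % key_lifetime
    return min(offset, key_lifetime - offset) <= tolerance
```

With the 60-second lifetime used for reproduction, a stall time reported by 
first_stall() can be passed to near_rekey(stall, 60, 5) to see whether it falls 
within five seconds of a rekey.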

Loading ath9k with nohwcrypt=1 solves the problem, but costs more CPU cycles.

Now I've made a patch which calls ath9k_queue_reset every time the key is set. 
This seems to get rid of the link dying on me, at the cost of a lot of 
authentication traffic. This is a very heavy-handed approach, and I'm fairly 
certain this is not gonna work in a production environment. See here for the 
ugly hack: 
https://github.com/cozybit/authsae/files/347910/ath9k-install_key-buckshot.diff.txt.

This issue has also been posted as: https://github.com/cozybit/authsae/issues/42

Someone on the AuthSAE GitHub page mentioned that this is apparently a 
long-standing issue with the driver, which was reported before as 
https://www.mail-archive.com/ath9k-devel%40lists.ath9k.org/msg13595.html.

Is anyone able to assist me or give me a couple of pointers?

Regards,

Michel Stam

_______________________________________________
ath9k-devel mailing list
ath9k-devel@lists.ath9k.org
https://lists.ath9k.org/mailman/listinfo/ath9k-devel

--- End Message ---