POMdev wrote:
> ...
> This "toxic beacon" theory fits with my experiences investigating
> the radio's failure. It explains how the radio, even when solidly
> connected to a compatible access point, fails occasionally when exposed
> to much weaker signals from neighbor's newer 2019-2020+ systems. I
> suspect that the very frequent scanning performed by the radio is
> picking up these signals, and, once in a while, crashing something. The
> effect is that the radio loses receive sensitivity, misses packets from
> the access point, and eventually disconnects. ... And it happens
> "rarely" considering the radio is so frequently scanning, which was
> evidently increased in a patch to the driver source code, perhaps to
> improve performance when moving the radio between rooms or access
> points. ...

The research and writing for Tuesday's post led to a thought: what if the radio could drastically reduce its scanning for other access points, so that the so-called rare fault became much less likely to occur? The driver source code was evidently patched to increase the scan frequency. Might there be a way to decrease it using the software we already have?
The answer may be yes. The tersely documented command

  /lib/atheros/wmiconfig --scan --scanctrlflags 0 0 0 0 0 0

controls scanning under some circumstances. Trying it on 5 radios produced a dramatic reduction in lengthy gaps and resets, which is very encouraging. A function to do this was added to an experimental wlanpoke script and deployed on 7 radios. It is executed by the Quick and Full reset functions, and only when they are triggered by a failure.

This command is poorly understood. Also, there is likely a very good reason the unmodified software scans as frequently as it does: inhibiting scanning may cause problems with portable and mobile use. For this reason, the scan reduction is done only after the connection has been lost. For fixed operation with interfering signals, this seems helpful. When moving out of range of one access point into another area, however, it might delay or prevent connection to another access point. If a connection cannot be made, a reboot will be required to restore the radio's original scanning behavior.

There were mixed results after a day. The radio causing the most complaints has had only 1 reset, 20 4-second gaps, and 3 wpa_cli script restarts. Three radios had no resets and a similar number of 4-second gaps.

The south perimeter radio, wired to Ethernet, was switched over to a strong WLAN (-62). It was reverted to Ethernet within a few hours (Qr:35 Fr:10 Wr:6 Wc:2). To continue investigating this location, the radio in the most shielded underground position, which typically has no resets, was set up next to this unit and is now being similarly hammered. The second-story radio above the "most complaints" unit is also suffering.

The latest quick reset method (power cycle) is not very effective. Although it helps the radio reconnect, it evidently does not repair the damage caused by the interference, and the reconnection mostly lasts only a few seconds or so. It is no longer the default.

After 3 days, the improvement is quite good.
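The scan reduction step described above (run the wmiconfig command only when a Quick or Full reset was triggered by a failure) might look roughly like the shell function below. This is a hypothetical sketch, not the actual wlanpoke code. It assumes the script runs under a POSIX shell. The names reduce_scan, SCAN_REDUCED, DRY_RUN and WMICONFIG, and the "failure" trigger argument, are all made up for illustration. The once-per-boot guard is also an assumption, based on the note that only a reboot restores the original scanning. The wmiconfig arguments are copied verbatim from the post, and no meaning is claimed for the six flag values.

```shell
#!/bin/sh
# Hypothetical sketch only; not the real wlanpoke implementation.

# Path to the Atheros utility quoted in the post (overridable, an assumption).
WMICONFIG="${WMICONFIG:-/lib/atheros/wmiconfig}"
# Hypothetical switch: 1 = print the command instead of running it.
DRY_RUN="${DRY_RUN:-0}"
# Hypothetical guard: scanning is reduced at most once per boot, since
# only a reboot restores the radio's original scanning behavior.
SCAN_REDUCED=0

# reduce_scan TRIGGER
# Called from a (hypothetical) Quick or Full reset routine. Acts only
# when TRIGGER is "failure", i.e. the connection has been lost.
reduce_scan() {
    trigger="$1"

    # Manual or scheduled resets leave scanning untouched, preserving
    # normal portable/mobile roaming behavior.
    if [ "$trigger" != "failure" ]; then
        return 0
    fi

    # Already applied since boot: nothing more to do.
    if [ "$SCAN_REDUCED" -eq 1 ]; then
        return 0
    fi

    if [ "$DRY_RUN" -eq 1 ]; then
        echo "$WMICONFIG --scan --scanctrlflags 0 0 0 0 0 0"
    else
        # Arguments copied verbatim from the post; their meaning is undocumented.
        "$WMICONFIG" --scan --scanctrlflags 0 0 0 0 0 0 || return 1
    fi

    SCAN_REDUCED=1
}
```

A reset routine would call "reduce_scan failure" only on its failure path, so a user-initiated reset never changes scanning. Calling the function directly, rather than inside $( ), matters here because a command substitution runs in a subshell and the SCAN_REDUCED flag would not persist.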
The worst location was instrumented with a wireless telnet-to-serial ESP32 module so that the serial console would be available during the frequent outages. However, in this location the wireless serial module also fails repeatedly, a new phenomenon. The interference is evidently fierce! But all the other radios have more or less calmed down and are working normally, an encouraging result.

A new experimental version, 0.8.7.1a, has been uploaded to the wlanpoke GitHub Development branch (https://github.com/PomDev2/wlanpoke/tree/development). So far it has been tested only in fixed locations. This version requires testing and possibly refinement to avoid degrading the radio's various other use cases.

By the way, this version also supports environments containing access points whose names include apostrophes and other special characters, which previously caused the script to fail. Also, a debug version of the frequently failing wpa_cli program could make it easier to troubleshoot and fix that point of failure.

------------------------------------------------------------------------
POMdev's Profile: http://forums.slimdevices.com/member.php?userid=70558
View this thread: http://forums.slimdevices.com/showthread.php?t=111663

_______________________________________________
Radio mailing list
Radio@lists.slimdevices.com
http://lists.slimdevices.com/mailman/listinfo/radio