POMdev wrote:
> ... This "toxic beacon" theory fits with my experiences investigating
> the radio's failure. It explains how the radio, even when solidly
> connected to a compatible access point, fails occasionally when exposed
> to much weaker signals from neighbor's newer 2019-2020+ systems. I
> suspect that the very frequent scanning performed by the radio is
> picking up these signals, and, once in a while, crashing something. The
> effect is that the radio loses receive sensitivity, misses packets from
> the access point, and eventually disconnects. ... And it happens
> "rarely" considering the radio is so frequently scanning, which was
> evidently increased in a patch to the driver source code, perhaps to
> improve performance when moving the radio between rooms or access
> points. ...
> 
The research and writing for Tuesday's post led to a thought: What if
the system could drastically reduce scanning for other access points, so
that the so-called rare fault would be much less likely to occur? The
driver source code was evidently patched to increase the scan frequency.
Might there be a way to decrease it using the software we already have?

The answer may be yes. The tersely documented command
/lib/atheros/wmiconfig --scan --scanctrlflags 0 0 0 0 0 0 controls
scanning under some circumstances. Trying this on 5 radios dramatically
reduced the number of lengthy gaps and resets, which is very
encouraging. A function that runs this command was added to an
experimental wlanpoke script and deployed on 7 radios. It is executed by
the Quick and Full reset functions, but only when they are triggered by
a failure.

This command is poorly understood, and there is likely a very good
reason the unmodified software scans as frequently as it does.
Inhibiting scanning may cause problems with portable and mobile use,
which is why the scan reduction is applied only after the connection has
been lost. For fixed operation with interfering signals, it seems
helpful. When the radio is moved out of range of one access point into
the area of another, reduced scanning might delay or prevent the
connection to the new access point. If no connection can be made, a
reboot is required to restore the radio's original scanning behavior.
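A minimal sketch of that gating logic, assuming a POSIX shell like the one the wlanpoke script runs under. The names `reduce_scanning`, `SCAN_REDUCED` and `WMICONFIG` are hypothetical and are not taken from wlanpoke. The wmiconfig invocation is copied exactly from the command quoted above, and nothing else about that tool is assumed. The idea is to run the command only when a reset was triggered by a failure, and, as a design assumption, only once per boot, since the change persists until a reboot.

```shell
#!/bin/sh
# Hypothetical sketch, not the actual wlanpoke implementation.

# Path to the vendor tool as given in the post. Allowing an override
# (an assumption) lets the function be exercised without Atheros tools.
WMICONFIG="${WMICONFIG:-/lib/atheros/wmiconfig}"

# 1 once scanning has been reduced. The change lasts until reboot, so
# this sketch applies it at most once (assumed design choice).
SCAN_REDUCED=0

# reduce_scanning REASON
# REASON is "failure" when a Quick or Full reset was triggered by a lost
# connection; any other value (e.g. a manual reset) leaves scanning alone.
# Returns 0 if scanning is (now or already) reduced, 1 otherwise.
reduce_scanning() {
    if [ "$1" != "failure" ]; then
        return 1
    fi
    if [ "$SCAN_REDUCED" -eq 1 ]; then
        return 0
    fi
    # Same command and flags as quoted in the post.
    "$WMICONFIG" --scan --scanctrlflags 0 0 0 0 0 0 || return 1
    SCAN_REDUCED=1
    return 0
}
```

A Quick or Full reset handler in this hypothetical layout would call `reduce_scanning failure` before trying to reconnect, and would skip the call for resets started for any other reason, which keeps normal scanning intact for portable use until a real failure occurs.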

There were mixed results after a day. The radio causing the most
complaints has had only 1 reset, 20 4-second gaps, and 3 wpa_cli script
restarts. Three other radios had no resets and a similar number of
4-second gaps. The south perimeter radio, normally wired to Ethernet,
was switched over to a strong WLAN signal (-62 dBm). It was reverted to
Ethernet within a few hours, after accumulating Qr:35 Fr:10 Wr:6 Wc:2.
To continue investigating this location, the radio from the most
shielded underground position, which typically has no resets, was set
up next to this unit and is now being similarly hammered. The
second-story radio above the "most complaints" unit is also suffering.
The latest quick reset method (power cycle) is not very effective. It
evidently does not repair the damage caused by the interference:
although it helps the radio reconnect, the connection usually lasts
only a few seconds or so. It is therefore no longer the default.

After 3 days the improvement is quite good. The worst location was
instrumented with a wireless telnet-to-serial ESP32 module so that the
serial console would be available during the frequent outages. However,
in this location the wireless serial module also fails repeatedly, a new
phenomenon. The interference is evidently fierce! All the other radios,
though, have more or less calmed down and are working normally, an
encouraging result.

A new experimental version, 0.8.7.1a, has been uploaded to the
'wlanpoke GitHub Development branch'
(https://github.com/PomDev2/wlanpoke/tree/development). So far it has
been tested only in fixed locations. This version needs further testing,
and possibly refinement, to avoid degrading the radio in its other use
cases.

By the way, this version also supports environments containing access
points whose names include apostrophes and other special characters,
which previously caused the script to fail. Also, a debug version of
the frequently failing wpa_cli program could make it easier to
troubleshoot and fix that point of failure.


------------------------------------------------------------------------
POMdev's Profile: http://forums.slimdevices.com/member.php?userid=70558
View this thread: http://forums.slimdevices.com/showthread.php?t=111663

_______________________________________________
Radio mailing list
Radio@lists.slimdevices.com
http://lists.slimdevices.com/mailman/listinfo/radio
