jc77 wrote: 
> 
> Any suggested changes to script to make keep this one from going
> completely offline every 12 hours or so? [Back1] ...
> This Radio needs to be restarted after 2-3 days  [Kitchen] ...
> These two stay online for about a week before needing to be restarted:
> [Floater &  Back2] ...
> 
Wow, this is a lot. Thanks for the report. This is the first report of a
radio requiring a reboot to reconnect since a quick-reset-related bug
caused the script to drop out last month, and you have 4 of them! There
is something out of the ordinary happening in your environment.
Here, the only outages have been from hardware (power supply connector)
failures.

To troubleshoot this, use the ::-d /etc/log/:: option to save the log files
where they will not be lost after a reboot, and examine the files as
soon as you notice the radio off. Generally, the script will keep trying
to reconnect forever, but it limits each trial to a specified amount of
time before trying again. If it tries again too soon, a possible
recovery is aborted, a new cycle begins, and the radio does not recover.
This was a problem with weak signals and heavy interference in earlier
versions, where the retry time was fixed at a relatively short time.
Later versions, like the one you are using, have an adaptive retry time
with an upper limit. You might be hitting that upper limit. On the other
hand, something else may be happening (e.g., the script may be exiting),
so looking at the logs after a reboot is important. You can zip them and
upload them into a new GitHub issue.
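
For bundling the saved logs before uploading, something like the following might do. It is only a sketch: the helper name archive_logs and the output path are made up for illustration, and it assumes the logs were written to /etc/log/ with the -d option and that the radio's tar supports gzip (not verified here).

```shell
# Hypothetical helper: bundle the saved log files into one archive.
# Assumptions: logs live in /etc/log (from the -d option) and the
# available tar supports -z; adjust if your build differs.
archive_logs() {
    log_dir="${1:-/etc/log}"
    out_file="${2:-/tmp/wlanpoke-logs.tar.gz}"

    # Refuse to run if the log directory is missing.
    if [ ! -d "$log_dir" ]; then
        echo "no log directory: $log_dir" >&2
        return 1
    fi

    # Archive the directory contents, then print the archive path.
    tar -czf "$out_file" -C "$log_dir" . || return 1
    echo "$out_file"
}
```

Calling archive_logs with no arguments would write /tmp/wlanpoke-logs.tar.gz, which you could then copy off the radio and attach to the issue.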

None of your radios have any Options showing. The 'slow' web server is
recommended for monitoring. Change the launch line in /etc/init.d/rcS.local to:
::    /etc/wlanpoke/wlanpoke.sh -W slow -d /etc/log/ &:: 
and check out http://Back1:8080 (use Back1's IP address if needed).
Also, ncat is handy. If you are not using it, disable TCP logging with
the ::-x:: option.

Of your radios, only "Floater" is showing extended full reset recovery
times (possibly from going out of range of the access point) and it
stays on for a week. All the others recover right away.
You can allow longer recovery times by editing the script variables
FRWaitSecsMax and FRWaitStepPct (there is no command line option yet),
increasing one or both.
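
To give a feel for how a stepped wait with an upper limit behaves, here is a rough sketch. It is not the actual wlanpoke code: only the two variable names come from the script, while the values, the next_wait function, and the growth formula are assumptions made up for illustration.

```shell
# Illustrative only: NOT the real wlanpoke logic.
# The variable names are borrowed from the script; the values and the
# formula below are assumptions.
FRWaitSecsMax=60    # hypothetical upper limit on the wait, in seconds
FRWaitStepPct=50    # hypothetical growth per failed attempt, in percent

# Print the next wait given the current one: grow by FRWaitStepPct
# percent (at least 1 second), then clamp to FRWaitSecsMax.
next_wait() {
    cur="$1"
    step=$(( cur * FRWaitStepPct / 100 ))
    if [ "$step" -lt 1 ]; then
        step=1
    fi
    next=$(( cur + step ))
    if [ "$next" -gt "$FRWaitSecsMax" ]; then
        next="$FRWaitSecsMax"
    fi
    echo "$next"
}
```

With these made-up values, a wait starting at 10 seconds would step through 15, 22, 33, and 49, then hold at 60. Raising either variable lets the wait grow further before it is capped, which is the effect you want if recoveries are being cut short.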

Your 'Back1' radio reports a signal level of -70 dBm. It may be just out
of range of the AP given your environment. "Floater" had a very high
signal level at the time of measurement, but perhaps that unit had also
been in the hinterlands for a while, which would explain the repeated
failure of the full resets to work, as Floater recovered when brought
into a strong signal area. (Does Back1 recover without rebooting when
moved into a high signal area?)

Back1 is failing the most often, so it makes sense to concentrate on
this unit first, and perhaps Floater as well, as it exhibits the
interesting increasing recovery time steps, which Back1 might also be
experiencing, although its logs are lost.

It would also be useful for you to track the radios as a whole to look
for time-of-day phenomena. The excellent free Wireless Network Watcher
and NetworkConnectLog from NirSoft on your Windows PC will capture
the status of the devices on your LAN. This would give you an important
"last seen" time to correlate with your log files. And you might try the
latest software in the development branch to see if a quick reset is
effective in your environment, and to improve the Reset report. Still,
your 0.8.4.1 should work as is.


------------------------------------------------------------------------
POMdev's Profile: http://forums.slimdevices.com/member.php?userid=70558
View this thread: http://forums.slimdevices.com/showthread.php?t=111663
