Hi Oleksij,

On Thu, 2018-03-08 at 15:16 +0100, Oleksij Rempel wrote:
> > Also, it should be documented explicitly, that this will cause barebox
> > to keep triggering the watchdog, even when it drops to the shell after
> > a boot error. This makes it unsuitable for unattended use.
> 
> I would prefer to use controlled reboot over uncontrolled watchdog reset.
> For example it would be better to have boot and fail strategy. In case
> of network boot, it would be better to retry download in some time and
> not cause watchdog reset. If retry count exceeded then something should
> be done. It can be power off, reboot, fall back to CLI.

In my experience, the watchdog is used as a last resort to handle any
*unanticipated* problems. So, by definition, there isn't any code to
handle these problems. The way to deal with them is to trigger the
watchdog only when the boot process has made actual progress towards a
running system. For example:
- once barebox probes the watchdog driver
- from the shell init scripts
- after loading the kernel, just before jumping to the kernel

This way, nothing can leave barebox waiting at the prompt indefinitely:
an idle or hung system will always be restarted by the watchdog.
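
As a rough illustration of the idea, here is a small Python simulation.
It is not barebox code: the class, the stage names and the 60-second
timeout are all made up for this sketch. The watchdog is triggered only
when a stage completes, so a stall at any point ends in a reset:

```python
class SimulatedWatchdog:
    """Toy model of a hardware watchdog driven by a simulated clock."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.deadline = None  # not armed until the first trigger

    def trigger(self, now):
        # Re-arm: the next trigger must arrive before now + timeout.
        self.deadline = now + self.timeout_s

    def expired(self, now):
        return self.deadline is not None and now >= self.deadline


def simulate_boot(stages, timeout_s=60):
    """Run boot stages in order, triggering the watchdog only after a
    stage has completed.

    `stages` is a list of (name, duration_in_seconds) pairs. A duration
    of None models a hang, for example sitting at the shell prompt
    after a boot error.

    Returns the name of the stage during which the watchdog fired, or
    None if every stage finished in time.
    """
    wd = SimulatedWatchdog(timeout_s)
    now = 0
    wd.trigger(now)  # milestone: watchdog driver probed
    for name, duration in stages:
        if duration is None or wd.expired(now + duration):
            return name  # no progress before the deadline: reset
        now += duration
        wd.trigger(now)  # milestone reached, re-arm
    return None


# A normal boot reaches every milestone in time.
normal = simulate_boot([("init scripts", 5), ("load kernel", 20)])

# A boot error that drops to the prompt never reaches the next
# milestone, so the watchdog fires during that stage.
stuck = simulate_boot([("init scripts", 5), ("load kernel", None)])
```

Nothing in this model triggers the watchdog periodically in the
background, which is the point: only real progress buys more time.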

> The reason for controlled reboot is the fact that the reset impact (or
> Reset Sensitivity) is different for every product and source of reset.
> 
> This example is taken from MiniRISC EZ4021-FC documentation:
>                               Soft                            TAP Ctrl
> Module                Reset   Reset   PrRst   ERst    TRST    Reset
> CPU                   yes     yes     yes     no      no      no
> CP0                   yes     yes     yes     no      no      no
> ICCi                  yes     yes     yes     no      no      no
> DCC                   yes     yes     yes     no      no      no
> BIU                   yes     yes     yes     no      no      no
> MMU                   yes     no      no      no      no      no
> MDU                   yes     yes     yes     no      no      no
> EJTAG iface:
> - DMA/CPU Acc         yes     yes     yes     yes     yes     yes
>   logic       
> - Protocol engine     yes     no      no      yes     yes     yes
> - Breakpoint          yes     no      no      yes     no      no
> - PC trace            yes     no      no      yes     no      no

It is not clear to me from this table which reset is triggered by the
hardware watchdog. I would expect that it is the first column, which
resets everything.
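
To double-check that reading of the table, it can be transcribed and
queried with a few lines of Python. This is only a sketch: the column
names are my interpretation of the two-line header, the row labels are
shortened, and it says nothing about which column the hardware watchdog
actually triggers.

```python
# Column names as I read the two-line table header (an assumption).
COLUMNS = ["Reset", "Soft Reset", "PrRst", "ERst", "TRST", "TAP Ctrl Reset"]

# One row per module, cells in column order, copied from the table.
TABLE = {
    "CPU":                   "yes yes yes no  no  no",
    "CP0":                   "yes yes yes no  no  no",
    "ICCi":                  "yes yes yes no  no  no",
    "DCC":                   "yes yes yes no  no  no",
    "BIU":                   "yes yes yes no  no  no",
    "MMU":                   "yes no  no  no  no  no",
    "MDU":                   "yes yes yes no  no  no",
    "EJTAG DMA/CPU Acc":     "yes yes yes yes yes yes",
    "EJTAG protocol engine": "yes no  no  yes yes yes",
    "EJTAG breakpoint":      "yes no  no  yes no  no",
    "EJTAG PC trace":        "yes no  no  yes no  no",
}


def full_resets(table, columns):
    """Return the reset sources that reset every listed module."""
    rows = [cells.split() for cells in table.values()]
    return [
        col
        for i, col in enumerate(columns)
        if all(row[i] == "yes" for row in rows)
    ]


def missed_modules(table, columns, source):
    """Return the modules that the given reset source leaves untouched."""
    i = columns.index(source)
    return [mod for mod, cells in table.items() if cells.split()[i] != "yes"]


complete = full_resets(TABLE, COLUMNS)
soft_misses = missed_modules(TABLE, COLUMNS, "Soft Reset")
```

With the table as quoted, `complete` contains only the first column,
and `soft_misses` lists the MMU plus three of the EJTAG blocks.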

> Most Atheros/QCA WiSoCs will not reset complete SoC even with watchdog
> triggered reset.

If you can't be sure that the watchdog resets enough to recover from
any transient problem, you cannot rely on it at all (and should
possibly use an external watchdog).

Regards,
Jan
-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

_______________________________________________
barebox mailing list
barebox@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/barebox
