Unlikely recoverable failures

Christopher Collins Thu, 18 Oct 2018 11:12:02 -0700

Hello all,

I think Mynewt lacks a mechanism for dealing with a certain class of
errors: unlikely recoverable failures.  For the purposes of this email,
I divide failures into two groups:


    1. Failures we cannot recover from, or that we don't want to recover
       from.

This group includes: immediate bugs in the code, memory corruption,
software watchdog expiry, etc.

The best way to handle these failures is to crash immediately, causing
gdb to hit a breakpoint and / or creating a core dump.  We already have
a tool for this: assert().

    2. Failures we want to recover from in production, but not always
       during development.

This group includes: failure of nonessential hardware, dynamic memory
allocation failures, file system write failures due to insufficient
space, etc.

For this second group, I think we need a "crash()" macro that can be
enabled or disabled at compile time[1].  During development and debugging,
the macro would crash the system.  In production builds, the macro would
expand to nothing, allowing the recovery code to execute.

I would like to implement such a macro in mynewt-core.  Here are some of
my thoughts on the subject.  As always, please don't hesitate to express
any criticism.

A. No conditional.

Unlike `assert()`, this new macro should not evaluate an expression to
determine success or failure.  Instead, the calling code should detect
failure with an `if` statement or similar, and only invoke the macro in
the failure case.  Since these failures are expected, I think it is
likely that the application will always want to execute some code in the
failure case, regardless of whether the macro is configured to trigger a
crash.  For example: logging a message to the console when the failure
is detected.  If the macro itself detects the failure, the application
doesn't have the opportunity to execute any code before the crash.

B. No severity.

Initially, I was thinking we could have a severity associated with each
failure.  The new macro would only trigger a crash if invoked with a
severity less than or equal to some `PANIC_LEVEL` setting.  However, I
decided that this just complicates the feature without adding any real
value.  I can't think of a good way to assign a severity to each
failure.  If we ever need this, we can add it on top of the
single-severity implementation.

I do think the ability to enable this feature per-package would be
useful, so that is something to consider for the future.

C. Name?

This is what has been causing me the most agony: what to name the macro.
The names I hate the least are:

    * DBG_PANIC() // "Debug panic"
    * DEV_PANIC() // "Development panic"

But I am not crazy about these.  Any other suggestions?  Even though
all-caps is ugly, I do think it should be used here to make it obvious
that this construct does very macro-like things (inspects the file and
line number, possibly expands to nothing, etc.).

Thanks,
Chris

[1] `assert()` can be enabled and disabled via the `NDEBUG` macro.
However, I am of the opinion that asserts are too useful to ever
disable, so I would like an additional level of configurability here.

Unlikely recoverable failures

Reply via email to