Hello all,

I have been looking into implementing a graceful shutdown for Mynewt.
The system may want to perform a cleanup procedure immediately before it
resets, and I wanted to allow this procedure to be configured.  I am
calling this shutdown facility "sysdown", as a counterpart to "sysinit".

### BASIC IDEA:

My idea is to allow each Mynewt package to specify a sequence of
shutdown function calls, similar to a package's `pkg.init` function call
list.  The newt tool would generate a C shutdown function called
`sysdown()`.  This function would consist of calls to each package's
shutdown functions.  When a controlled shutdown is initiated,
`sysdown()` would be called prior to the final call to
`hal_system_reset()`.
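As a rough sketch of what the generated code might look like (nothing here is final): the package shutdown functions `pkg_a_shutdown()` and `pkg_b_shutdown()`, the `controlled_reset()` wrapper, and the `fake_system_reset()` stand-in for `hal_system_reset()` are all hypothetical names, used only so the example builds on a host machine.

```c
#include <stddef.h>

/* Hypothetical stand-in for hal_system_reset() so the sketch runs on a host. */
static int reset_count;
static void fake_system_reset(void) { reset_count++; }

/* Records the order in which shutdown functions run (illustration only). */
static char call_log[4];
static size_t call_log_len;

/* Hypothetical per-package shutdown functions, one per package entry. */
static void pkg_a_shutdown(void) { call_log[call_log_len++] = 'a'; }
static void pkg_b_shutdown(void) { call_log[call_log_len++] = 'b'; }

/* Roughly what newt might generate: one call per configured function. */
void sysdown(void)
{
    pkg_a_shutdown();
    pkg_b_shutdown();
}

/* Hypothetical controlled-reset path: clean up first, then reset. */
void controlled_reset(void)
{
    sysdown();
    fake_system_reset();
}
```

On a real target, the last step would be the actual `hal_system_reset()` call, which does not return.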

To clarify, this procedure would only be performed for a controlled
shutdown.  It would be executed when the system processes a newtmgr
"reset" command, for example.  It would not be executed when the system
crashes, browns out, or restarts due to the hardware watchdog.

I think this scheme is pretty straightforward and I see no issues so far
(but please pipe up if this doesn't seem right!).

### PROBLEM:

Then I tried applying this to an actual use case, and of course I
immediately encountered some problems :).

My actual use case is this: when I reset the Mynewt device, I would like
the nimble stack to gracefully terminate all open Bluetooth connections.
This isn't strictly necessary; the connected peer will eventually
realize that the connection has dropped some time after the reset.  The
problem is that Android centrals take a really long time to realize that
the connection has dropped, as described here:
https://blog.classycode.com/a-short-story-about-android-ble-connection-timeouts-and-gatt-internal-errors-fa89e3f6a456.
So, I wanted to explicitly terminate the connections to speed up the
process.

Ideally, I could configure the nimble host package with a shutdown
callback that just performs a blocking terminate of each open
connection in sequence.  Unfortunately, the nimble host is likely
running in the same task as the one that initiated the shutdown, so it
is not possible to perform a blocking operation.  Instead, the running
task needs to terminate each connection asynchronously: enqueue a GAP
terminate procedure, then return so that the task can process its event
queue.  Eventually, the BLE terminate procedure will complete, and the
result of the procedure will be indicated via an event on this event
queue.  The sysdown mechanism I described earlier in this email doesn't
allow for asynchronous procedures.  It reboots the system immediately
after executing all the shutdown callbacks.

I think this will be a common issue for other packages, so I am
trying to come up with a general solution.

### SOLUTION:

Each shutdown callback returns one of the following codes:
    * SYSDOWN_COMPLETE
    * SYSDOWN_IN_PROGRESS

When a controlled reset is initiated, the shutdown facility executes
every configured callback.  If all callbacks return `SYSDOWN_COMPLETE`,
then the procedure is done; the system completes the reset with a call
to `hal_system_reset()`.

If one or more callbacks return `SYSDOWN_IN_PROGRESS`, then the
shutdown facility needs to wait for those subprocedures to complete.
Each in-progress shutdown subprocedure indicates completion
asynchronously via a call to `sysdown_complete()`.  When the last
remaining entry signifies completion, the shutdown facility finishes the
shutdown procedure with a call to `hal_system_reset()`.  To defend
against hanging subprocedures, the system can be configured to crash if
the shutdown procedure takes too long.
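Here is a minimal sketch of how the bookkeeping could work, assuming a simple counter of outstanding subprocedures.  The numeric values of the return codes, the `sysdown_start()` entry point, the `sysdown_cbs` table, the two example callbacks, and `fake_system_reset()` (a stand-in for `hal_system_reset()`) are assumptions made for illustration.  The timeout is left out, and a real implementation would need to protect the counter against concurrent access.

```c
#include <assert.h>
#include <stddef.h>

/* Return codes from the proposal; the numeric values are assumed. */
#define SYSDOWN_COMPLETE    0
#define SYSDOWN_IN_PROGRESS 1

typedef int sysdown_fn(void);

/* Hypothetical stand-in for hal_system_reset() so the sketch runs on a host. */
static int reset_count;
static void fake_system_reset(void) { reset_count++; }

/* Hypothetical callbacks: one finishes immediately, one finishes later. */
static int sync_pkg_shutdown(void)  { return SYSDOWN_COMPLETE; }
static int async_pkg_shutdown(void) { return SYSDOWN_IN_PROGRESS; }

/* Hypothetical table that newt could generate from package configuration. */
static sysdown_fn *const sysdown_cbs[] = {
    sync_pkg_shutdown,
    async_pkg_shutdown,
};

/* Number of subprocedures (plus the initiator) still outstanding. */
static int sysdown_pending;

/* Called once by each in-progress subprocedure when it finishes. */
void sysdown_complete(void)
{
    assert(sysdown_pending > 0);
    if (--sysdown_pending == 0) {
        fake_system_reset();
    }
}

/* Hypothetical entry point for a controlled reset. */
void sysdown_start(void)
{
    size_t i;

    /* The initiator holds one reference while iterating, so a callback
     * that completes early cannot trigger the reset mid-loop. */
    sysdown_pending = 1;
    for (i = 0; i < sizeof sysdown_cbs / sizeof sysdown_cbs[0]; i++) {
        if (sysdown_cbs[i]() == SYSDOWN_IN_PROGRESS) {
            sysdown_pending++;
        }
    }

    /* Drop the initiator's reference; resets now if nothing is pending. */
    sysdown_complete();
}
```

With this layout, the case where every callback returns `SYSDOWN_COMPLETE` falls out naturally: the initiator's own reference is the last one, so the reset happens at the end of `sysdown_start()`.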

Does that sound reasonable?  All comments welcome.

Thanks,
Chris
