Re: Controlled shutdown

Christopher Collins Fri, 28 Sep 2018 16:37:33 -0700

On Fri, Sep 28, 2018 at 04:18:08PM -0700, will sanfilippo wrote:
> Some comments:
> 
> 1) Are there are any other cases you can see for a controlled shutdown? I get 
> the reset command. I am trying to think of others.


I think the newtmgr reset command is the main use case (as well as the
shell "reset" command).  It seems plausible that a device would want to
reset itself for other reasons, but I can't think of an actual use case!

> 2) I am curious: how do you know how many of these functions, or which ones, 
> return in progress? Curious to know how you were going to implement that 
> specifically.

I was thinking the sysdown module would maintain a counter representing
the number of in-progress shutdown subprocedures.  When a subprocedure
completes, it calls `sysdown_complete()`, which decrements the counter.
When the counter reaches 0, the system restarts.

There is something else I should mention.  I now think it is a bad idea
to use return codes to indicate whether the subprocedure is complete.
Each subprocedure is called by generated code, and branching on return
codes makes the generated code too complicated in my opinion.  I think
it is better to add a new function, `sysdown_in_progress()`.  If a
subprocedure needs to continue beyond the initial callback, it calls
this new function before returning.

Chris

> Otherwise, sounds good and seems like a good addition to mynewt!
> 
> 
> > On Sep 28, 2018, at 1:08 PM, Christopher Collins <ch...@runtime.io> wrote:
> > 
> > Hello all,
> > 
> > I have been looking into implementing a graceful shutdown for Mynewt.
> > The system may want to perform a cleanup procedure immediately before it
> > resets, and I wanted to allow this procedure to be configured.  I am
> > calling this shutdown facility "sysdown", as a counterpart to "sysinit".
> > 
> > ### BASIC IDEA:
> > 
> > My idea is to allow each Mynewt package to specify a sequence of
> > shutdown function calls, similar to a package's `pkg.init` function call
> > list.  The newt tool would generate a C shutdown function called
> > `sysdown()`.  This function would consist of calls to each package's
> > shutdown functions.  When a controlled shutdown is initiated,
> > `sysdown()` would be called prior to the final call to
> > `hal_system_reset()`.
> > 
> > To clarify, this procedure would only be performed for a controlled
> > shutdown.  It would be executed when the system processes a newtmgr
> > "reset" command, for example.  It would not be executed when the system
> > crashes, browns out, or restarts due to the hardware watchdog.
> > 
> > I think this scheme is pretty straightforward and I see no issues so far
> > (but please pipe up if this doesn't seem right!).
> > 
> > ### PROBLEM:
> > 
> > Then I tried applying this to an actual use case, and of course I
> > immediately encountered some problems :).
> > 
> > My actual use case is this: when I reset the Mynewt device, I would like
> > the nimble stack to gracefully terminate all open Bluetooth connections.
> > This isn't strictly necessary; the connected peer will eventually
> > realize that the connection has dropped some time after the reset.  The
> > problem is that Android centrals take a really long time to realize that
> > the connection has dropped, as described here:
> > https://blog.classycode.com/a-short-story-about-android-ble-connection-timeouts-and-gatt-internal-errors-fa89e3f6a456.
> > So, I wanted to explicitly terminate the connections to speed up the
> > process.
> > 
> > Ideally, I could configure the nimble host package with a shutdown
> > callback that just performs a blocking terminate of each open
> > connection in sequence.  Unfortunately, the nimble host is likely
> > running in the same task as the one that initiated the shutdown, so it
> > is not possible to perform a blocking operation.  Instead, the running
> > task needs to terminate each connection asynchronously: enqueue a GAP
> > terminate procedure, then return so that the task can process its event
> > queue.  Eventually, the BLE terminate procedure will complete, and the
> > result of the procedure will be indicated via an event on this event
> > queue.  The sysdown mechanism I described earlier in this email doesn't
> > allow for asynchronous procedures.  It reboots the system immediately
> > after executing all the shutdown callbacks.
> > 
> > I think this will be a common issue for other packages, so I am
> > trying to come up with a general solution.
> > 
> > ### SOLUTION:
> > 
> > Each shutdown callback returns one of the following codes:
> >    * SYSDOWN_COMPLETE
> >    * SYSDOWN_IN_PROGRESS
> > 
> > When a controlled reset is initiated, the shutdown facility executes
> > every confgured callback.  If all callbacks return `SYSDOWN_COMPLETE`,
> > then the procedure is done; the system completes the reset with a call
> > to `hal_system_reset()`.
> > 
> > If one or more callbacks returns `SYSDOWN_IN_PROGRESS`, then the
> > shutdown facility needs to wait for those subprocedures to complete.
> > Each in-progress shutdown subprocedure indicates completion
> > asynchronously via a call to `sysdown_complete()`.  When the last
> > remaining entry signifies completion, the shutdown facility finishes the
> > shutdown procedure with a call to `hal_system_reset()`.  To defend
> > against hanging subprocedures, the system can be configured to crash if
> > the shutdown procedure takes too long.
> > 
> > Does that sound reasonable?  All comments welcome.
> > 
> > Thanks,
> > Chris
>

Re: Controlled shutdown

Reply via email to