Hello all,

The Mynewt BLE stack is called Nimble.  Nimble consists of two packages:

    * Controller (link layer)  [net/nimble/controller]
    * Host (upper layers)      [net/nimble/host]
This email concerns the Nimble host.

As I indicated in an email a few weeks ago, the code size of the Nimble
host had grown beyond what I considered a reasonable level.  When built
for the ARM Cortex-M4, with security enabled and the log level set to
INFO, the host code size was about 48 kB.

In recent days, I came up with a few ideas for reducing the host code
size.  As I explored these ideas, I realized that they open the door to
some major improvements in the fundamental design of the host.  Making
these changes would introduce some backwards-compatibility issues, but I
believe it is absolutely the right thing to do.  If we do this, it needs
to be done now, while Mynewt is still in its beta phase.  I have
convinced myself that this is the right way forward; now I would like to
see what the community thinks.  As always, all feedback is greatly
appreciated.

There are two major changes that I am proposing:

1. All HCI command/acknowledgement exchanges are blocking.

Background: The host and controller communicate with one another via the
host-controller interface (HCI) protocol.  The host sends _commands_ to
the controller; the controller sends _events_ to the host.  Whenever the
controller receives a command from the host, it immediately responds
with an acknowledgement event.  In addition, the controller also sends
unsolicited events to the host to indicate state changes or to request
information in a subsequent command.

In the current host, all HCI commands are sent asynchronously
(non-blocking).  When the host wants to send an HCI command, it
schedules a transmit operation by putting an OS event on its own event
queue.  The event points to a callback which performs the actual HCI
transmission.  That callback also configures a second callback to be
executed when the expected acknowledgement is received from the
controller.  Each time the host receives an HCI event from the
controller, an OS event is put on the host's event queue.
Processing of this OS event ultimately calls the configured callback (if
the event is an acknowledgement) or a hardcoded callback (if it is an
unsolicited HCI event).

This design works, but it introduces a number of problems.  First, it
requires the host code to maintain some quite complex state machines for
what seem like simple HCI exchanges.  This FSM machinery translates into
a lot of extra code.  There is also a lot of ugliness involved in
canceling scheduled HCI transmits.

Another complication with non-blocking HCI commands is that they require
the host to jump through a lot of hoops to provide feedback to the
application.  Since all the work is done in parallel by the host task,
the host has to notify the application of failures by executing
callbacks configured by the application.  I did not want to place any
restrictions on what the application is allowed to do during these
callbacks, which means the host has to ensure that it is in a valid
state whenever a callback gets executed (no mutexes locked, for
example).  This requires the code to use a large number of mutexes and
temporary copies of host data structures, resulting in a lot of
complicated code.

Finally, non-blocking HCI operations complicate the API presented to the
application.  A single return code from a blocking operation is easier
to manage than a return code plus the possibility of a callback being
executed sometime in the future from a different task.  A blocking
operation collapses several failure scenarios into a single function
return.

Making HCI command/acknowledgement exchanges blocking addresses all of
the above issues:

    * The FSM machinery goes away; the controller's response is
      indicated in the return code of the HCI send function.
    * Nearly all HCI failures are indicated to the application
      immediately, so there is no need for lots of mutexes and
      temporary copies of data structures.
    * The API is simplified; operation results are indicated via a
      simple function return code.

2.
The Nimble host is "taskless."

Currently, the Nimble host runs in its own OS task.  This is not
necessarily a bad thing, but in the case of the host, I think the costs
outweigh the benefits.

I can think of three benefits to running a library in its own task:

    * Guarantees that timing requirements are met; just configure the
      task with an appropriate priority.
    * (Related to the above point) The library task can continue to
      work while the application task is blocked.
    * Facilitates stack sizing.  Since the library performs its
      operations in its own stack, it is easier to predict the stack
      usage of both the library task and the application task.

I don't think any of these benefits is very compelling in the case of
the Nimble host, for the following reasons:

    * The host has nothing resembling real-time timing requirements.
      There should be absolutely no problem with running the host task
      at the lowest priority, unless the hardware is simply
      overburdened, in which case there is no way to avoid issues no
      matter what you do.
    * The host code makes heavy use of application callbacks, making it
      quite difficult to estimate stack usage.  Since the host's stack
      requirements depend on what the application does during these
      callbacks, the application would need to specify the host stack
      size during initialization anyway.

My proposal is to turn the Nimble host into a "flat" library that runs
in an application task.  When the application initializes the host, it
indicates which OS event queue should be used for host-related events.
Host operations would be captured in OS_EVENT_TIMER events that the
application task would need to handle generically, as it most likely
already does.  Note that these events would not be produced by an actual
timer; they would be placed on the event queue immediately.  The
OS_EVENT_TIMER event type would just be used because it provides a basic
callback structure.
I should also note that it is fairly trivial for an application to turn
such a flat library into its own task if that is desired.  The
application developer would just need to create a simple task that
handles the OS_EVENT_TIMER events.

I think these two changes will have the following implications:

1. A simpler API.
2. Less RAM usage (no more FSM state, no parallel stacks).
3. More RAM usage (a larger application stack).
4. A major reduction in code size (I estimate a total size of 35 kB).

Hopefully points 2 and 3 will cancel each other out.

Thanks for reading,
Chris