And btw, just because the controller is supposed to act this way doesn't mean there isn't a bug where something is going wrong. I will look over the code to see if there is a way to orphan command buffers when replying to a command (with Command Complete or Command Status).
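The buffer-reuse behaviour discussed in this thread can be illustrated with a small model. Here the controller answers a command by rewriting that command's own HCI buffer into the Command Complete or Command Status event. This is not NimBLE source: `hci_pool`, `ctlr_handle_cmd()`, `host_send_cmd()`, `HCI_POOL_SIZE` and the `MODEL_*` return codes are all hypothetical names invented for the sketch. Only the two event codes (0x0E, 0x0F) come from the Bluetooth HCI specification. The sketch shows that correct reuse never changes the free count, while an orphaning path leaks one buffer per command.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not NimBLE source: a fixed pool of HCI buffers. */
#define HCI_POOL_SIZE 4

#define EVT_CMD_COMPLETE 0x0e /* HCI Command Complete event code */
#define EVT_CMD_STATUS   0x0f /* HCI Command Status event code */

/* Hypothetical return codes for this sketch only. */
enum { MODEL_OK = 0, MODEL_ENOMEM = 1, MODEL_ETIMEOUT = 2 };

struct hci_buf {
    bool in_use;
    unsigned char evt_code; /* 0 while the buffer holds a command */
};

static struct hci_buf hci_pool[HCI_POOL_SIZE];

/* Take a free buffer from the pool; NULL when the pool is exhausted. */
static struct hci_buf *hci_buf_alloc(void)
{
    for (size_t i = 0; i < HCI_POOL_SIZE; i++) {
        if (!hci_pool[i].in_use) {
            hci_pool[i].in_use = true;
            hci_pool[i].evt_code = 0;
            return &hci_pool[i];
        }
    }
    return NULL;
}

static void hci_buf_free(struct hci_buf *buf)
{
    assert(buf->in_use); /* catch double frees */
    buf->in_use = false;
}

static int hci_buf_free_count(void)
{
    int n = 0;
    for (size_t i = 0; i < HCI_POOL_SIZE; i++) {
        if (!hci_pool[i].in_use) {
            n++;
        }
    }
    return n;
}

/* Controller side: acknowledge by rewriting the command buffer in place,
 * so no new allocation is needed. `orphan` simulates a buggy path that
 * drops the buffer without acknowledging or freeing it. */
static struct hci_buf *ctlr_handle_cmd(struct hci_buf *cmd, bool use_status,
                                       bool orphan)
{
    if (orphan) {
        return NULL;
    }
    cmd->evt_code = use_status ? EVT_CMD_STATUS : EVT_CMD_COMPLETE;
    return cmd;
}

/* Host side: send one command and consume its acknowledgement. */
static int host_send_cmd(bool use_status, bool orphan)
{
    struct hci_buf *cmd = hci_buf_alloc();
    if (cmd == NULL) {
        return MODEL_ENOMEM; /* pool exhausted before sending */
    }
    struct hci_buf *ack = ctlr_handle_cmd(cmd, use_status, orphan);
    if (ack == NULL) {
        return MODEL_ETIMEOUT; /* no ack ever arrives; buffer is leaked */
    }
    hci_buf_free(ack); /* ack and cmd are the same buffer */
    return MODEL_OK;
}
```

A normal round trip in this model leaves the free count unchanged. Each orphaned command permanently removes one buffer, and after `HCI_POOL_SIZE` orphans every later command fails with `MODEL_ENOMEM` at allocation time. A pool that stays partly drained between procedures would be one sign of such a leak when inspecting the pool in gdb.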
> On Jun 20, 2016, at 7:13 PM, chris collins <ch...@runtime.io> wrote:
>
> (Btw, sorry if these emails "look annoying"... my main computer is out of commission, so I have been using the gmail web interface for the last few days!)
>
> There is no connection between the mbuf settings and the max_hci_bufs setting. I don't have a specific max_hci_bufs value in mind; 4 or 5 seems reasonable, but I am not so enthusiastic about this change anymore. I am pretty sure my theory of what was causing the BLE_HS_ETIMEOUT error is incorrect, for the following reasons:
>
> 1. I was discussing this with Will, and he reminded me that the controller always reuses the command HCI buf when it sends an acknowledgement. In other words, the controller should never fail to allocate an HCI buf when sending an acknowledgement to the host.
>
> 2. The host code *doesn't* return BLE_HS_ETIMEOUT when an acknowledgement is not received; it returns -1 (another return code bug!). I simply don't see any code path which would yield a return code of 14 here. I hate to ask, but... are you sure the 14 is coming from ble_gap_conn_initiate()?
>
> I am fairly confident the -1 return code from ble_gap_disc_cancel() is indeed caused by an HCI buffer shortage, but I have a feeling there is some sort of bug at the root of these issues. Are you able to debug your application in gdb? I am curious about the state of the nimble stack when you receive the -1 or 14 error codes. In particular:
>
> # Print state of HCI buffer pool:
> p g_hci_os_event_pool
>
> # Print GAP master and slave states:
> p ble_gap_master
> p ble_gap_slave
>
> If you could capture that information, it would be much appreciated.
>
> Finally, to answer a lingering question that I seem to have consistently ignored: there should not be any issue with timing. After the call to ble_gap_disc_cancel() returns, you can immediately perform another GAP procedure.
>
> Chris
>
> On Mon, Jun 20, 2016 at 6:08 PM, Simon Ratner <si...@proxy.co> wrote:
>
>> Ok, so those two sound like they might have the same cause. Perhaps related to that, I also stop receiving incoming connections after a short while, possibly for the same reason, although there is no indication in the logs or anywhere else on the mynewt side; the connecting central just sees a failed connection.
>>
>> I am able to process all the advertisement reports just fine when I don't attempt to cancel discovery / connect to those discovered peripherals. Is it possible that cancellation is somehow causing or exacerbating this? For example, some reports may have already been received but still be in the stack when discovery is cancelled, so they are never reported to the app and the corresponding buffers are never freed. Just guessing here.
>>
>> I'll try increasing HCI buffers, too. Do you have a recommended value for max_hci_bufs? What about the mbuf size passed to ble_ll: is it at all correlated with host bufs, and should they be allocated in certain ratios?
>>
>> On Mon, Jun 20, 2016 at 5:55 PM, chris collins <ch...@runtime.io> wrote:
>>
>>> Hi Simon,
>>>
>>> Unfortunately I am not able to reproduce that behavior. However, I think I can answer one of your questions. Hopefully that will lead to a full solution.
>>>
>>> That -1 return code is generated when the stack runs out of HCI command / event buffers. The actual return code is a bug; BLE_HS_ENOMEM should probably be returned instead. I am a bit puzzled about the cause of the buffer shortage. You are probably receiving a lot of advertisement reports from the controller, but I wouldn't expect them to arrive faster than you can handle them, though I suppose that depends on the particulars of your application. You can try increasing the number of HCI buffers at host initialization time. This setting is in the host configuration struct, and it is called max_hci_bufs.
>>>
>>> Regarding the second problem (ble_gap_conn_initiate() returns BLE_HS_ETIMEOUT): I have a guess. The return code indicates that the controller did not respond to an HCI command in a timely manner. My guess is that the controller is unable to allocate an HCI buffer due to the shortage. From looking at the code, it appears we don't have any statistics indicating the number of times an HCI buffer failed to allocate... this is definitely something that should be added.
>>>
>>> Chris
>>>
>>> On Mon, Jun 20, 2016 at 5:07 PM, Simon Ratner <si...@proxy.co> wrote:
>>>
>>>> Thanks Chris, just tried it out and it seems to do the trick -- half of the time.
>>>>
>>>> I see two occasional errors:
>>>>
>>>> 1. Sometimes, ble_gap_disc_cancel returns (-1); any idea under what circumstances that might happen?
>>>>
>>>> 2. Sometimes, ble_gap_disc_cancel returns 0 but ble_gap_conn_initiate immediately afterwards fails with code 14 (ETIMEOUT? unless it's an HCI error?). Is it possible that this is timing-related somehow and the link layer hasn't switched to the right state yet? Should I delay the connect attempt for a tick?
>>>>
>>>> Both of these occur inconsistently; about half the time it just works.
>>>>
>>>> On Sat, Jun 18, 2016 at 10:21 PM, chris collins <ch...@runtime.io> wrote:
>>>>
>>>>> Hi Simon,
>>>>>
>>>>> Thanks for the heads up; this is definitely an omission. You should be able to cancel a scan in progress.
>>>>>
>>>>> Barring any unforeseen complications, the cancel functionality should be implemented in the develop branch tomorrow. This will allow the app to cancel the scan and initiate a connect procedure from within the advertising event callback.
>>>>>
>>>>> Chris
>>>>>
>>>>> On Sat, Jun 18, 2016 at 7:38 PM, Simon Ratner <si...@proxy.co> wrote:
>>>>>
>>>>>> Hi devs,
>>>>>>
>>>>>> Having initiated an undirected scan with ble_gap_disc(), I would like to connect to my peripheral as soon as I spot it in the scan callback. However, calling ble_gap_conn_initiate() at this point fails with BLE_HS_EALREADY, as ble_gap_master is still in discovery mode. I need to stash the discovered peripheral, wait for the scan to finish, and then try to connect, which is unnecessary state management. Additionally, there doesn't seem to be a way to cancel the scan, so this becomes especially problematic if the scan is long-running.
>>>>>>
>>>>>> For comparison, while advertising, an incoming connection automatically drops the slave out of advertising mode (which can be resumed immediately if you have enough connection resources).
>>>>>>
>>>>>> Is this an omission, or by design?
>>>>>>
>>>>>> Cheers,
>>>>>> simon
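The flow Simon asks for can be sketched as a tiny state machine: cancel discovery from inside the report callback, then connect immediately. This is a hypothetical model, not the NimBLE API. `master`, `model_disc_start()`, `model_disc_cancel()`, `model_conn_initiate()`, `on_adv_report()`, `free_hci_bufs` and the `MODEL_*` codes are invented names. The real `ble_gap_disc_cancel()` and `ble_gap_conn_initiate()` take arguments not shown here. The sketch reflects two points from the thread: only one master procedure runs at a time, and no delay is needed after a successful cancel.

```c
#include <string.h>

/* Hypothetical model of the GAP master (central) role; not the NimBLE API. */
enum gap_master_state { MASTER_IDLE, MASTER_DISC, MASTER_CONN };

/* Hypothetical return codes for this sketch only. */
enum { MODEL_OK = 0, MODEL_ENOMEM = 1, MODEL_EALREADY = 3, MODEL_EINVAL = 4 };

static enum gap_master_state master = MASTER_IDLE;

/* Stand-in for the HCI buffer pool: set to 0 to simulate a shortage. */
static int free_hci_bufs = 1;

static int model_disc_start(void)
{
    if (master != MASTER_IDLE) {
        return MODEL_EALREADY;
    }
    master = MASTER_DISC;
    return MODEL_OK;
}

/* Cancelling discovery means sending an HCI command, which needs a buffer.
 * A shortage is reported as MODEL_ENOMEM rather than a bare -1. */
static int model_disc_cancel(void)
{
    if (master != MASTER_DISC) {
        return MODEL_EINVAL; /* nothing to cancel */
    }
    if (free_hci_bufs == 0) {
        return MODEL_ENOMEM;
    }
    master = MASTER_IDLE; /* the controller acknowledged the cancel */
    return MODEL_OK;
}

/* Only one master procedure may run at a time. */
static int model_conn_initiate(void)
{
    if (master != MASTER_IDLE) {
        return MODEL_EALREADY;
    }
    master = MASTER_CONN;
    return MODEL_OK;
}

/* Discovery report callback: connect as soon as the wanted peer appears. */
static int on_adv_report(const char *peer, const char *wanted)
{
    if (strcmp(peer, wanted) != 0) {
        return MODEL_OK; /* not our peripheral; keep scanning */
    }
    int rc = model_disc_cancel();
    if (rc != MODEL_OK) {
        return rc; /* still discovering; a later report can retry */
    }
    /* No delay is needed between a successful cancel and the connect. */
    return model_conn_initiate();
}
```

Calling `model_conn_initiate()` while `master == MASTER_DISC` returns `MODEL_EALREADY`, mirroring the BLE_HS_EALREADY that started the thread. Once the cancel succeeds, the connect proceeds in the same callback. Returning a distinct ENOMEM-style code on a buffer shortage lets the caller tell that case apart from a timeout, a distinction the -1 return code obscured here.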