And btw, just because the controller is supposed to act this way doesn't mean
there isn't a bug where something is going wrong. I will look over the
code to see if there is a way to orphan command buffers when replying to a
command (with command complete or command status).
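
To make "orphaning" concrete, here is a toy model of the idea. It is only a
sketch and not the actual NimBLE controller code: the pool, the struct layout,
and every function name below (pool_get, pool_put, ack_reuse,
handle_cmd_buggy) are hypothetical. It shows an acknowledgement that reuses
the command buffer (so it never needs a fresh allocation), a counter for
failed allocations, and one way a reply path could leak a buffer.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 3                 /* hypothetical pool size */

struct hci_buf {
    int in_use;
    unsigned char data[8];
};

struct hci_pool {
    struct hci_buf bufs[POOL_SIZE];
    unsigned alloc_failures;        /* number of failed allocations */
};

/* Take a free buffer from the pool; NULL (and a counted failure) if empty. */
static struct hci_buf *pool_get(struct hci_pool *p)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!p->bufs[i].in_use) {
            p->bufs[i].in_use = 1;
            return &p->bufs[i];
        }
    }
    p->alloc_failures++;
    return NULL;
}

/* Return a buffer to the pool. */
static void pool_put(struct hci_buf *b)
{
    assert(b->in_use);
    b->in_use = 0;
}

static int pool_num_free(const struct hci_pool *p)
{
    int n = 0;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        n += !p->bufs[i].in_use;
    }
    return n;
}

/* Build the acknowledgement in the command buffer itself: no allocation,
 * so this step cannot fail because the pool is exhausted. */
static struct hci_buf *ack_reuse(struct hci_buf *cmd, unsigned char status)
{
    cmd->data[0] = 0x0e;            /* HCI Command Complete event code */
    cmd->data[1] = status;
    return cmd;
}

/* Hypothetical buggy handler: on the error path the command buffer is
 * neither reused for an event nor freed, so it is orphaned. */
static struct hci_buf *handle_cmd_buggy(struct hci_buf *cmd, int fail)
{
    if (fail) {
        return NULL;                /* BUG: cmd is never returned to the pool */
    }
    return ack_reuse(cmd, 0);
}
```

If something like the error path in handle_cmd_buggy() exists, each hit
permanently shrinks the pool, which would look like a buffer shortage even
though the ack itself never allocates.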


> On Jun 20, 2016, at 7:13 PM, chris collins <ch...@runtime.io> wrote:
> 
> (Btw, sorry if these emails "look annoying"... my main computer is out of
> commission, so I have been using the gmail web interface for the last few
> days!)
> 
> There is no connection between the mbuf settings and the max_hci_bufs
> setting.  I don't have a specific max_hci_bufs setting in mind; 4 or 5
> seems reasonable.  However, I am not so enthusiastic about this change
> anymore.  I am pretty sure my theory of what was causing the
> BLE_HS_ETIMEOUT error is incorrect, for the following reasons:
> 
> 1. I was discussing this with Will, and he reminded me that the controller
> always reuses the command HCI buf when it sends an acknowledgement.  In
> other words, the controller should never fail to allocate an HCI buf when
> sending an acknowledgement to the host.
> 
> 2. The host code *doesn't* return BLE_HS_ETIMEOUT when an acknowledgement is
> not received; it returns -1 (another return code bug!).  I simply don't see
> any code path which would yield a return code of 14 here.  I hate to ask,
> but... are you sure the 14 is coming from ble_gap_conn_initiate()?
> 
> I am fairly confident the -1 return code from ble_gap_disc_cancel() is
> indeed caused by an HCI buffer shortage, but I have a feeling there is some
> sort of bug at the root of these issues.  Are you able to debug your
> application in gdb?  I am curious about the state of the nimble stack when
> you receive the -1 or 14 error codes.  In particular:
> 
> # Print state of HCI buffer pool:
> p g_hci_os_event_pool
> 
> # Print GAP master and slave states:
> p ble_gap_master
> p ble_gap_slave
> 
> If you could capture that information, that would be much appreciated.
> 
> Finally, to answer a lingering question that I seem to have consistently
> ignored: there should not be any issue with timing.  After the call to
> ble_gap_disc_cancel() returns, you can immediately perform another GAP
> procedure.
> 
> Chris
> 
> On Mon, Jun 20, 2016 at 6:08 PM, Simon Ratner <si...@proxy.co> wrote:
> 
>> Ok, so those two sound like they might have the same cause. Perhaps
>> related to that, I also stop receiving incoming connections after a short
>> while, possibly for the same reason, although there is no indication in the
>> logs or anywhere else on the mynewt side - the connecting central just sees
>> a failed connection.
>> 
>> I am able to process all the advertisement reports just fine when I don't
>> attempt to cancel discovery / connect to those discovered peripherals. Is
>> it possible that cancellation is somehow causing or exacerbating this; for
>> example some reports have already been received but are still being handled
>> by the stack at the time discovery is cancelled, they are never reported to
>> the app and corresponding buffers never freed? Just guessing here.
>> 
>> I'll try increasing HCI buffers, too. Do you have a recommended value for
>> max_hci_bufs? What about the mbuf size passed to ble_ll - is it at all
>> correlated with host bufs; should they be allocated in certain ratios?
>> 
>> 
>> 
>> On Mon, Jun 20, 2016 at 5:55 PM, chris collins <ch...@runtime.io> wrote:
>> 
>>> Hi Simon,
>>> 
>>> Unfortunately I am not able to reproduce that behavior.  However, I think
>>> I can answer one of your questions.  Hopefully that will lead to a full
>>> solution.
>>> 
>>> That -1 return code is generated when the stack runs out of HCI command /
>>> event buffers.  The actual return code is a bug; BLE_HS_ENOMEM should
>>> probably be returned instead.  I am a bit puzzled about the cause of the
>>> buffer shortage.  You are probably receiving a lot of advertisement
>>> reports from the controller.  I wouldn't expect them to be coming in
>>> faster than you can handle them, but I suppose that depends on the
>>> particulars of your application.  You can try increasing the number of
>>> HCI buffers at host initialization time.  This setting is in the host
>>> configuration struct, and it is called max_hci_bufs.
>>> 
>>> Regarding the second problem (ble_gap_conn_initiate() returns
>>> BLE_HS_ETIMEOUT): I have a guess.  The return code indicates that the
>>> controller did not respond to an HCI command in a timely manner.  My
>>> guess is that the controller is unable to allocate an HCI buffer due to
>>> the shortage.  From looking at the code, it appears we don't have any
>>> statistics indicating the number of times an HCI buffer failed to
>>> allocate... this is definitely something that should be added.
>>> 
>>> Chris
>>> 
>>> On Mon, Jun 20, 2016 at 5:07 PM, Simon Ratner <si...@proxy.co> wrote:
>>> 
>>>> Thanks Chris, just tried it out and it seems to do the trick -- half of
>>>> the time.
>>>> 
>>>> I see two occasional errors:
>>>> 
>>>> 1. Sometimes, ble_gap_disc_cancel returns (-1); any idea under what
>>>> circumstances that might happen?
>>>> 
>>>> 2. Sometimes, ble_gap_disc_cancel returns 0 but ble_gap_conn_initiate
>>>> immediately afterwards fails with code 14 (ETIMEOUT? unless it's an HCI
>>>> error?). Is it possible that this is timing-related somehow and the
>>>> link layer hasn't switched to the right state yet? Should I delay the
>>>> connect attempt for a tick?
>>>> 
>>>> Both of these occur inconsistently; about half the time it just works.
>>>> 
>>>> 
>>>> On Sat, Jun 18, 2016 at 10:21 PM, chris collins <ch...@runtime.io> wrote:
>>>> 
>>>>> Hi Simon,
>>>>> 
>>>>> Thanks for the heads up; this is definitely an omission.  You should be
>>>>> able to cancel a scan in progress.
>>>>> 
>>>>> Barring any unforeseen complications, the cancel functionality should be
>>>>> implemented in the develop branch tomorrow.  This will allow the app to
>>>>> cancel the scan and initiate a connect procedure from within the
>>>>> advertising event callback.
>>>>> 
>>>>> Chris
>>>>> 
>>>>> 
>>>>> On Sat, Jun 18, 2016 at 7:38 PM, Simon Ratner <si...@proxy.co> wrote:
>>>>> 
>>>>>> Hi devs,
>>>>>> 
>>>>>> Having initiated an undirected scan with ble_gap_disc(), I would like
>>>>>> to connect to my peripheral as soon as I spot it in the scan callback.
>>>>>> However, calling ble_gap_conn_initiate() at this point fails with
>>>>>> BLE_HS_EALREADY, as ble_gap_master is still in discovery mode. I need
>>>>>> to stash the discovered peripheral, wait for the scan to finish, and
>>>>>> then try to connect, which is unnecessary state management.
>>>>>> Additionally, there doesn't seem to be a way to cancel the scan, so
>>>>>> this becomes especially problematic if the scan is long-running.
>>>>>> 
>>>>>> For comparison, while advertising, an incoming connection
>>>>>> automatically drops the slave out of advertising mode (which can be
>>>>>> resumed immediately if you have enough connection resources).
>>>>>> 
>>>>>> Is this an omission, or by design?
>>>>>> 
>>>>>> Cheers,
>>>>>> simon
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
