andrzej-kaczmarek commented on issue #477: npl/riot: reduce default
MSYS_1_BLOCK_SIZE to minimum
URL: https://github.com/apache/mynewt-nimble/pull/477#issuecomment-508759910
Note: to the best of my knowledge things work in NimBLE as I describe below,
but there is very slight chance that some of numbers here are incorrect so
please correct me with a concrete proof in such case.
@haukepetersen
We definitely should improve sanity checks to make sure configuration makes
sense, especially if there are settings like `MSYS_1_BLOCK_SIZE` where we know
that certain size is simply required so we do not try to read past allocated
buffers. You can create an issue for this so we can continue discussion and
perhaps someone can pick this up in spare time (perhaps some volunteer would
like to gain a bit of knowledge on LL internals?).
As for random failures and better error handling, I do not think there is
any major issue in NimBLE in this regard. I mean, if we run out of buffers we
don't just crash (if we do, this is an issue and not deliberate action) but try
to do best we can that is allowed by spec, e.g. reply with an error or drop a
packet (to be more precise: on RX we do not simply drop a packet, but we NAK it
properly so it will be retransmitted). While it may seem like "random
failures", this is not because of inherent randomness in NimBLE but due to
Bluetooth traffic that is pretty much different every time you connect. Also,
failures are usually logged to stats which can give a better understanding of
what happens, but this is iirc not available in ports - feel free to port it :-)
And now `MSYS_1_BLOCK_COUNT`:
- 1 block is absolutely required for LL to start since it's allocated by
scansm as a placeholder for SCAN_REQ PDU; by quickly looking through code I am
pretty sure it would be possible to remove this requirement now since it should
be possible to create SCAN_REQ on the fly (it was not possible when this was
implemented few years ago)
- 1 block is required to receive any LL packet and send reply if needed,
because RX PDU block is reused for TX PDU in such case
This means with only 2 blocks it is possible to create and maintain
connection, and even receive some data. This is also enough for (G)ATT server
to reply on requests in general, however reading attribute from ATT database
requires an extra block for attribute value so some (most?) of those requests
will fail with only 2 block available. With 3 blocks 80 bytes each you can have
working connection with usable GATT (it works, I checked using `btshell`).
So why 3 does not work for you? I do not know, could be that phone you try
to connect with does some procedures which requires an extra block and it's
just not possible to handle all control and data packets before e.g. timeout
occurs. Each device behaves differently and that's basically why using an
"absolute minimum" value is not really a good solution. Perhaps something like
"recommended value" would be more appropriate here and this would be default
value since they are adjusted in case we find current value is not good for
some reason.
One more example to give you better idea on why a single "absolute minimum"
value won't work with all devices. While checking the above I found an issue in
code handling `ATT Read By Group Type Request` where we can indeed silently
drop packet inside ATT server due to oom. To be clear here, we do check this
pretty consistently in code but there is single execution path where ATT error
is not set properly which causes us to drop error response. This has an
interesting consequence because it is actually what makes (G)ATT connection
work with my Android phone using 3 blocks only. What happens after connection
is that phone sends control PDU to update connection parameters and data packet
with mentioned request - both consume remaining 2 blocks. We silently (and
incorrectly) fail to reply on ATT request and apparently Android for whatever
reason will retry after few seconds when LL procedure is completed so we have 2
blocks to handle request and it will work just fine. However, once I fix this
issue (which is a simple one-liner) (G)ATT won't work because we will reply
with an error on ATT request and phone will not retry so you basically can't
discover services. So assuming I will fix the issue, what is an "absolute
minimum" value here? I do not know, because I do not know how many other PDUs
an Android phone can send us at once. As ambiguous as it sounds, "sane" is the
proper value here.
As for data length extensions, small blocks do not mean you can't use this
feature. You can assuming there are enough small blocks to hold long data PDU -
we do not need a single block to hold complete data PDU and we'll do just fine
with chained mbufs. In fact, having a lot of smaller blocks instead of few
large blocks may be better on nRF51 since it allows to use