What is the simple solution to which you are referring? Not erasing if you do not need to? Or what was mentioned in the thread regarding discarding a PDU if the time it arrived does not seem to make sense? As I am sure you realize, neither of those will guarantee a fix.
Anyway, there have been thoughts on this but no solution has been decided upon. For now, your best bet is to simply reconnect if the connection drops (imo).

> On May 5, 2017, at 4:23 PM, Jacob Rosenthal <jakerosent...@gmail.com> wrote:
>
> Any thoughts on this?
>
> Also, there's a simple solution while a more complex one is discussed.
> The current code erases every time no matter what; there's even a code
> comment there about it.
> https://github.com/apache/incubator-mynewt-core/blob/afa6d53254cbf444a3f44cc1851f0b038227edb6/mgmt/imgmgr/src/imgmgr.c#L322
>
> The update process already includes at least one reset and reconnect (after
> test/confirm), so adding another reconnect after first firmware upload
> resets as it erases is 'fine'.
>
> I'm not sure how to get started on that myself. Thoughts on this?
>
>
> On Tue, Apr 25, 2017 at 6:51 PM, Will San Filippo <w...@micosa.net> wrote:
>
>> Hello:
>>
>> Recently there has been some discussion around image upload, erasing
>> flash, and connection supervision timeouts. We have recently noticed a case
>> where it appears that due to erasing the flash the connection will time out
>> (supervision timeout). Changing the supervision timeout, slave latency, etc.
>> will not guarantee success (although it may make it less likely to occur).
>>
>> We are currently evaluating fixes for this issue, so in the meantime just
>> be aware that this could occur.
>>
>> For those who want the gory details:
>>
>> What I think is happening is the following. A connection event starts at
>> the correct time (it is not delayed by the flash erase) but at some point
>> prior to getting the ADDRESS event a flash erase starts. The ADDRESS event
>> is used internally to capture a timer value which records the start of the
>> PDU. A slave will reset its anchor point when it receives a PDU as long as
>> the access address matches. So what I think is happening is that the
>> ADDRESS event and timer capture get delayed because the CPU is halted
>> during the flash erase, causing the anchor point to get reset to an invalid
>> time. The slave (peripheral) then wakes up at the incorrect time for all
>> connection events from that point forward, causing an eventual supervision
>> timeout.
>>
>> I realize that some might say to just drop the PDU if the timing is off.
>> This would probably go a long way to making it much less likely to occur
>> but there is still a chance that the connection will time out. The only way
>> to guarantee this not occurring is to synchronize flash erase with radio
>> events (something the current nimble stack does not do).
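
For anyone following along, here is a rough sketch of the two ideas from the quoted message: rejecting an anchor update when the captured PDU start time is implausible, and only starting an erase when it fits before the next connection event. Every name, parameter, and tick unit below is hypothetical and made up for illustration; none of this is taken from the nimble controller, and it assumes a free-running 32-bit timer that wraps around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; these are not NimBLE APIs.
 * All times are ticks of a free-running 32-bit timer that wraps around. */

/* Signed distance from b to a. Correct across wraparound as long as the
 * two timestamps are less than 2^31 ticks apart. */
static int32_t ticks_diff(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b);
}

/* True if the captured PDU start time lies within +/- window ticks of the
 * time at which the connection event was expected to start. */
static bool anchor_is_plausible(uint32_t captured, uint32_t expected,
                                uint32_t window)
{
    int32_t delta = ticks_diff(captured, expected);

    assert(window <= INT32_MAX);
    return delta >= -(int32_t)window && delta <= (int32_t)window;
}

/* Pick the anchor for the next event: use the captured time only if it is
 * plausible, otherwise fall back to previous anchor + nominal interval. */
static uint32_t next_anchor(uint32_t prev_anchor, uint32_t interval,
                            uint32_t captured, uint32_t window)
{
    uint32_t expected = prev_anchor + interval;

    return anchor_is_plausible(captured, expected, window) ? captured
                                                           : expected;
}

/* True if an erase lasting erase_ticks, started at now, would finish at
 * least guard ticks before the next scheduled connection event. */
static bool erase_fits_before_event(uint32_t now, uint32_t next_event,
                                    uint32_t erase_ticks, uint32_t guard)
{
    int32_t gap = ticks_diff(next_event, now);

    if (gap <= 0) {
        return false; /* the event is due now or already in progress */
    }
    return (uint64_t)erase_ticks + guard <= (uint32_t)gap;
}
```

As noted in the quoted message, the plausibility check alone would only make the timeout less likely. The erase gating is the part that corresponds to synchronizing flash erase with radio events, and it only helps if a single erase can actually fit between two connection events.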