Re: [OpenOCD-devel] STM32F2/4/7 Flash programming

Christopher Head Wed, 28 Feb 2018 22:29:26 -0800

On Thu, 1 Mar 2018 00:12:12 +0100
Tomas Vanek <tom_...@users.sourceforge.net> wrote:


> We should also focus to a question why algo flashing is broken on
> FTDI. Some non STM devices (e.g. Kinetis) work with very similar algo
> just perfectly on FTDI or any other adapter.

Sure. If someone could fix algorithm-based flashing, I would love to
use it! I’m not convinced it will make things any faster for my
specific set of hardware, but as long as it’s not broken and not
slower, I don’t really care, and I understand it does make things
faster for some people.

> WAITs are very strange. It looks like the stalled access to flash
> blocks also JTAG access to RAM.
> And SWD access doesn't suffer this silicon bug... who knows... maybe 
> some NOPs in algo busy wait loop would fix it.
> BTW The programming algo should avoid bus stalling, shouldn't it?

I was wondering about this. I have two weird data points I can add to
this discussion.

The first data point is this: remember early on in the thread where I
said I wasn’t able to successfully modify the algorithm to move the CR
write and SR read out of the loop? When I tried, it ran for a while and
then gave either “debug regions unpowered” or, more commonly, “timeout
waiting for algorithm”. Those are the same messages I got using the FTDI
adapter with the original, unmodified algorithm. With the modified
algorithm, though, I got them using the ByteBlaster as well, which
had formerly been very robust. This seemed suspicious, and I’m
reasonably certain I got the modifications to the algorithm correct
(I’ve done plenty of Thumb assembly in the past), but I didn’t pay too
much attention as the direct approach was so fast.

The second data point is this: when using the algorithm-based approach,
I attached an oscilloscope to TDO coming out of the F7. I was very
surprised to see it *tristate* from time to time (at least, I’m pretty
sure it tristated—it had a very slow rise time and settled to a voltage
somewhat below VDD). I didn’t manage to correlate the time of the
tristate to any particular higher level activity, but it definitely
happened quite frequently during a programming operation and looked
very weird. I’m pretty sure it didn’t happen during direct programming,
only algorithm-driven programming. I found this suspicious, but again,
didn’t look into it too much as the direct approach was very fast.

The reference manual seems a little unclear on whether the algorithm as
written should stall the CPU or not. It says, “Any attempt to read the
Flash memory while it is being written or erased, causes the bus to
stall. Read operations are processed correctly once the program
operation has completed. This means that code or data fetches cannot be
performed while a write/erase operation is ongoing.” The obvious way to
interpret that sentence is that the STRH places the halfword into the
CPU write buffer, the DSB pushes it out as an AXI write cycle to the
Flash interface, the Flash interface immediately completes the bus
cycle and internally buffers the data while starting the burn, and then
the AXI is free to proceed to SR polling, while the *next* AXI cycle,
if any, accessing the Flash interface while still busy will be stalled.
However, an alternative way to interpret that is that the AXI write
cycle that delivers the halfword is stalled by the Flash interface, but
the CPU can continue execution because the data is in the CPU write
buffer, and the CPU can proceed before the bus cycle completes. In this
case the DSB would stall the CPU. BSY seems rather pointless in that
case, but I have learned not to assume anything when reading silicon
documentation (and it could be useful to avoid stalls if used without
DSB, I suppose). My guess would be the first interpretation is correct,
though, which means the algorithm as written should indeed not stall
the CPU ever, since code execution is from DTCM which is unrelated to
Flash. It should be possible to test which interpretation is correct by
performing a STRH followed by a DSB and then checking whether BSY is
set immediately afterwards; if yes, then interpretation 1 is correct,
while if no, then interpretation 2 is correct.
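
For concreteness, the test I have in mind could be sketched in C roughly
as below. This is only an illustration, not code from OpenOCD: the
function and enum names are made up, the BSY bit position (bit 16 of
FLASH_SR) is from my reading of the reference manual and should be
checked for your part, and it assumes the Flash is already unlocked,
PG and a halfword PSIZE are set in CR, and the target halfword is
erased. It also assumes the SR read lands well within the programming
time of one halfword.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* BSY flag in FLASH_SR. Assumed to be bit 16, per my reading of the
 * STM32F2/F4/F7 reference manuals; verify for your device. */
#define FLASH_SR_BSY (UINT32_C(1) << 16)

/* Hypothetical names for the two readings of the manual. */
enum stall_model {
    WRITE_COMPLETES_EARLY = 1, /* interpretation 1: BSY seen after the DSB */
    WRITE_STALLS_ON_BUS   = 2  /* interpretation 2: DSB returns once idle  */
};

/* DSB on ARMv7 and later (GCC/Clang syntax); on other hosts this is
 * only a compiler barrier so the sketch still builds. */
static inline void data_sync_barrier(void)
{
#if defined(__ARM_ARCH) && (__ARM_ARCH >= 7)
    __asm__ volatile ("dsb sy" ::: "memory");
#else
    __asm__ volatile ("" ::: "memory");
#endif
}

/* Store one halfword, drain it with a barrier, then sample SR once.
 * flash_word: erased halfword in Flash; flash_sr: address of FLASH_SR. */
enum stall_model probe_stall_model(volatile uint16_t *flash_word,
                                   volatile const uint32_t *flash_sr,
                                   uint16_t value)
{
    assert(flash_word != NULL && flash_sr != NULL);

    *flash_word = value;           /* the STRH */
    data_sync_barrier();           /* the DSB */
    uint32_t sr = *flash_sr;       /* first SR read after the barrier */

    return (sr & FLASH_SR_BSY) ? WRITE_COMPLETES_EARLY
                               : WRITE_STALLS_ON_BUS;
}
```

On the target this would have to run from RAM, like the programming
algorithm itself, with the result read back over the debugger. A single
sample is the point here: looping on SR would hide the difference
between the two interpretations.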

Assuming interpretation 1 is correct, though, I don’t see anything
wrong with the algorithm code.

> What OpenOCD version do you use? It looks like your version misses 
> Matthias' WAIT handling
> if you get such errors like algo timeout.

It was head of master from somewhere in the last week or two. I can
look up the exact commit ID tomorrow if you want.
-- 
Christopher Head


Re: [OpenOCD-devel] STM32F2/4/7 Flash programming
