Hi all,
I was looking into an issue with Flash programming on the STM32F7. I discovered 
some quite odd results.

First, I discovered that OpenOCD always uses 16-bit parallelism. There is a 
comment at the top of stm32f2x.c stating that this was chosen for compatibility 
with the widest possible range of VDD values, but I simply can’t see how this 
is true. The STM32F205/215/207/217 Flash programming manual PM0059 rev 5, the 
STM32F405/415/407/417/427/437/429/439 reference manual RM0090 rev 15, and the 
STM32F75/74 reference manual RM0385 rev 6 all contain exactly the same table, 
which says 64× shall be used with external VPP, 32× shall be used with VDD in 
[2.7,3.6], 16× shall be used with VDD in [2.1,2.7], and 8× shall be used with 
VDD in [1.8,2.1]. I imagine an awful lot of STM32s are probably operated at 3.3 
volts, and that is *not* in the legal VDD range for 16× parallelism. Am I 
misunderstanding something here?
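
To make the table above concrete, here is a minimal sketch in C of choosing PSIZE from the supply voltage. The function name psize_for_supply and its parameters are hypothetical and are not part of stm32f2x.c. The PSIZE encodings are the FLASH_CR bits [9:8] values as I read them in the reference manuals. The voltage ranges quoted above share their endpoints, so the choice of the wider setting at exactly 2.7 V and 2.1 V is my assumption. The sketch also does not check any VDD or VPP requirements specific to the external-VPP case.

```c
#include <assert.h>
#include <stdbool.h>

/* PSIZE field values for FLASH_CR bits [9:8] (verify against your manual). */
enum flash_psize {
    PSIZE_INVALID = -1,
    PSIZE_X8 = 0,
    PSIZE_X16 = 1,
    PSIZE_X32 = 2,
    PSIZE_X64 = 3
};

/* PSIZE is a two-bit field, so the widest setting must encode as 3. */
static_assert(PSIZE_X64 == 3, "PSIZE must fit in two bits");

/*
 * Hypothetical helper: map a board's supply voltage (in millivolts) to the
 * parallelism named in the table quoted above.
 * Assumption: at a shared endpoint (2700 mV, 2100 mV) the wider setting wins.
 */
static int psize_for_supply(unsigned vdd_mv, bool external_vpp)
{
    if (vdd_mv < 1800 || vdd_mv > 3600)
        return PSIZE_INVALID;   /* outside every range in the table */
    if (external_vpp)
        return PSIZE_X64;       /* x64 is only listed with external VPP */
    if (vdd_mv >= 2700)
        return PSIZE_X32;
    if (vdd_mv >= 2100)
        return PSIZE_X16;
    return PSIZE_X8;
}
```

With this mapping, a board running at 3.3 V without external VPP gets 32× rather than the 16× that is currently hard-coded.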

Second, I discovered that, in both algorithm-driven mode and direct programming 
mode, the loop writes to CR, then writes one halfword of data to the target 
address, then checks BSY and the error flags in SR. However, this seems 
unnecessary. CR doesn’t magically change on its own; PG and PSIZE can be set 
once and then many writes performed in a block, increasing efficiency. Also, it 
is not necessary to check BSY after each write. Step 3 of the Flash programming 
sequence is to “perform the data write operation(s)”, which can be plural. If 
you manage to deliver data too fast, the Flash hardware stalls the AHB or AXI 
bus cycles doing the subsequent writes, which eventually translates into a WAIT 
JTAG response (in the direct programming case) or a CPU execution stall (in the 
algorithm-driven case), which is a reasonable flow control mechanism. The error 
bits in SR are also cumulative. Taken together, all this means that one can 
simply write CR once, write all the data, and then check SR afterwards, waiting 
for the last write to finish and examining the error flags. After modifying the 
code to do this, I found that direct-mode programming with these 
changes is actually faster than algorithm-based programming without them. (I was 
not able to successfully modify the algorithm to omit these extra operations, 
but I can’t see it making a whole lot of difference to the execution time in 
algorithm mode.)
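
The difference between the two loop shapes can be sketched against a small mock in C. This is not OpenOCD code: struct mock_flash and every function here are hypothetical. The bit positions (PG at CR bit 0, PSIZE at CR bits 9:8, BSY at SR bit 16, error flags in SR bits 7:4) are my reading of the reference manuals and should be checked against yours. The mock never asserts BSY and has no timing, so it says nothing about speed. It only counts CR writes and SR reads, and shows that a sticky error flag is still visible when SR is checked once at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FLASH_CR_PG        (1u << 0)
#define FLASH_CR_PSIZE_X16 (1u << 8)
#define FLASH_SR_WRPERR    (1u << 4)
#define FLASH_SR_ERRORS    (0xFu << 4)   /* assumed: error flags in bits 7:4 */
#define FLASH_SR_BSY       (1u << 16)
#define MOCK_WORDS         64
#define NO_PROTECTED_INDEX ((size_t)-1)

/* Hypothetical stand-in for the Flash controller plus access counters. */
struct mock_flash {
    uint32_t cr, sr;
    uint16_t mem[MOCK_WORDS];
    size_t protected_idx;            /* writes here latch WRPERR */
    unsigned cr_writes, sr_reads;
};

static void mock_init(struct mock_flash *f)
{
    memset(f, 0, sizeof *f);
    memset(f->mem, 0xFF, sizeof f->mem);  /* erased Flash reads as all ones */
    f->protected_idx = NO_PROTECTED_INDEX;
}

static void write_cr(struct mock_flash *f, uint32_t v) { f->cr = v; f->cr_writes++; }

static void write_halfword(struct mock_flash *f, size_t i, uint16_t v)
{
    if (i == f->protected_idx)
        f->sr |= FLASH_SR_WRPERR;         /* sticky until explicitly cleared */
    else if (f->cr & FLASH_CR_PG)
        f->mem[i] &= v;                   /* programming only clears bits */
}

/* Poll SR until BSY clears, then report whether any error flag is set. */
static int wait_and_check(struct mock_flash *f)
{
    uint32_t sr;
    do {
        sr = f->sr;
        f->sr_reads++;
    } while (sr & FLASH_SR_BSY);
    return (sr & FLASH_SR_ERRORS) ? -1 : 0;
}

/* Current shape: CR write, one halfword, SR check, for every halfword. */
static int program_per_halfword(struct mock_flash *f, const uint16_t *d, size_t n)
{
    assert(n <= MOCK_WORDS);
    for (size_t i = 0; i < n; i++) {
        write_cr(f, FLASH_CR_PG | FLASH_CR_PSIZE_X16);
        write_halfword(f, i, d[i]);
        if (wait_and_check(f) != 0)
            return -1;
    }
    return 0;
}

/* Proposed shape: CR once, all the data, then a single SR check. */
static int program_block(struct mock_flash *f, const uint16_t *d, size_t n)
{
    assert(n <= MOCK_WORDS);
    write_cr(f, FLASH_CR_PG | FLASH_CR_PSIZE_X16);
    for (size_t i = 0; i < n; i++)
        write_halfword(f, i, d[i]);
    return wait_and_check(f);
}
```

For n halfwords, the per-halfword version does n CR writes and at least n SR reads, while the block version does one of each. In the real direct-mode case each of those is a separate debug-port transaction, which is where I would expect the time to go.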

So what all this leads to is this: I would like to submit some sort of patch, 
but I am not sure what you think would be best. This is my proposal:
1. Get rid of algorithm-based Flash writing for stm32f2x altogether.
2. Allow the user to set the parallelism level with a new stm32f2x subcommand, 
since only the board config knows what VDD is being supplied.
3. Move the CR and SR stuff outside the write loop in direct mode, changing the 
entire data write operation into a single target_write_memory call directly to 
the Flash address.

Thoughts? Objections?
-- 
Christopher Head

_______________________________________________
OpenOCD-devel mailing list
OpenOCD-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openocd-devel