Hi all, I was looking into an issue with Flash programming on the STM32F7. I discovered some quite odd results.

First, I discovered that OpenOCD always uses 16-bit parallelism. There is a comment at the top of stm32f2x.c stating that this was chosen for compatibility with the widest possible range of VDD values, but I simply can’t see how this is true. The STM32F205/215/207/217 Flash programming manual PM0059 rev 5, the STM32F405/415/407/417/427/437/429/439 reference manual RM0090 rev 15, and the STM32F75/74 reference manual RM0385 rev 6 all contain exactly the same table, which says 64× shall be used with external VPP, 32× shall be used with VDD in [2.7,3.6], 16× shall be used with VDD in [2.1,2.7], and 8× shall be used with VDD in [1.8,2.1]. I imagine an awful lot of STM32s are probably operated at 3.3 volts, and that is *not* in the legal VDD range for 16× parallelism. Am I misunderstanding something here?

Second, I discovered that, in both algorithm-driven mode and direct programming mode, the loop writes to CR, then writes one halfword of data to the target address, then checks BSY and the error flags in SR. However, this seems unnecessary. CR doesn’t magically change on its own; PG and PSIZE can be set once and then many writes performed in a block, increasing efficiency.

Also, it is not necessary to check BSY after each write. Step 3 of the Flash programming sequence is to “perform the data write operation(s)”, which can be plural. If you manage to deliver data too fast, the Flash hardware stalls the AHB or AXI bus cycles doing the subsequent writes, which eventually translates into a WAIT JTAG response (in the direct programming case) or a CPU execution stall (in the algorithm-driven case), which is a reasonable flow control mechanism. The error bits in SR are also cumulative.

Taken together, all this means that one can simply write CR once, write all the data, and then check SR afterwards, waiting for the last write to finish and examining the error flags.
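To make the two points above concrete, here is a rough C sketch, not a patch. It picks the PSIZE code from the VDD table and then follows the "write CR once, write all the data, check SR once" sequence. The function names and the flash_io hooks are hypothetical stand-ins for whatever register and memory accessors the debugger provides, the bit positions are as I read them in the manuals and should be checked before use, and the choice of the wider setting at the shared 2.1 V and 2.7 V boundaries is my assumption, as is requiring VDD to be in range even with external VPP.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions as I read them in the manuals cited above; check before use. */
#define FLASH_CR_PG          (1u << 0)
#define FLASH_CR_PSIZE_SHIFT 8       /* PSIZE field: 0 = x8, 1 = x16, 2 = x32, 3 = x64 */
#define FLASH_SR_BSY         (1u << 16)
#define FLASH_SR_ERRORS      0xF2u   /* OPERR, WRPERR, PGAERR, PGPERR, PGSERR */

/* Pick the PSIZE code from the table: returns 0..3, or -1 if VDD is out of range.
 * Assumption: at the shared boundaries (2.1 V, 2.7 V) the wider setting is used,
 * and VDD must be within 1.8..3.6 V even when external VPP is present. */
int psize_for_supply(unsigned vdd_mv, bool external_vpp)
{
    if (vdd_mv < 1800 || vdd_mv > 3600)
        return -1;
    if (external_vpp)
        return 3;               /* x64 */
    if (vdd_mv >= 2700)
        return 2;               /* x32 */
    if (vdd_mv >= 2100)
        return 1;               /* x16 */
    return 0;                   /* x8 */
}

/* CR value that enables programming with the given PSIZE code. */
uint32_t flash_cr_for_program(int psize)
{
    return FLASH_CR_PG | ((uint32_t)psize << FLASH_CR_PSIZE_SHIFT);
}

/* Hypothetical access hooks standing in for the debugger's register
 * and memory writes; none of these names come from OpenOCD. */
struct flash_io {
    void     (*write_cr)(void *ctx, uint32_t value);
    uint32_t (*read_sr)(void *ctx);
    void     (*write_block)(void *ctx, uint32_t addr,
                            const uint8_t *buf, size_t len);
    void      *ctx;
};

/* Write CR once, stream the whole block, then poll SR once at the end.
 * Returns 0 on success, or the accumulated error bits from SR.
 * No timeout and no re-locking here; a real version would need both. */
uint32_t flash_program_block(const struct flash_io *io, int psize,
                             uint32_t addr, const uint8_t *buf, size_t len)
{
    uint32_t sr;

    io->write_cr(io->ctx, flash_cr_for_program(psize));
    io->write_block(io->ctx, addr, buf, len);  /* bus stalls act as flow control */
    do {
        sr = io->read_sr(io->ctx);             /* wait for the last write to finish */
    } while (sr & FLASH_SR_BSY);
    io->write_cr(io->ctx, 0);                  /* clear PG */
    return sr & FLASH_SR_ERRORS;
}
```

With a 3.3 V board, psize_for_supply(3300, false) selects the 32× setting, which is the case the current fixed 16× choice gets wrong according to the table.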
After modifying the code to do this, I then discovered that direct-mode programming with these changes is actually faster than algorithm-based programming without them. (I was not able to successfully modify the algorithm to omit these extra operations, but I can’t see it making a whole lot of difference to the execution time in algorithm mode.)

So what all this leads to is this: I would like to submit some sort of patch, but I am not sure what you think would be best. This is my proposal:

1. Get rid of algorithm-based Flash writing for stm32f2x altogether.
2. Allow the user to set the parallelism level with a new stm32f2x subcommand, since only the board config knows what VDD is being supplied.
3. Move the CR and SR stuff outside the write loop in direct mode, changing the entire data write operation into a single target_write_memory call directly to the Flash address.

Thoughts? Objections?

--
Christopher Head
_______________________________________________ OpenOCD-devel mailing list OpenOCD-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/openocd-devel