Re: [Openocd-development] arm1136 scripts
Here is a patch to fix startup on the C100 (arm1136). Basically, it makes sure that the UART is configured before it is used.

Michal

diff --git a/tcl/target/c100helper.tcl b/tcl/target/c100helper.tcl
index 477fe5c..2a12c36 100644
--- a/tcl/target/c100helper.tcl
+++ b/tcl/target/c100helper.tcl
@@ -469,11 +469,12 @@ proc initC100 {} {
 mww $INTC_ARM1_CONTROL_REG 0x1
 # configure clocks
 setupPLL
+# setupUART0 must be run before setupDDR2 as setupDDR2 uses UART.
+setupUART0
 # enable cache
 # ? (u-boot does nothing here)
 # DDR2 memory init
 setupDDR2
-setupUART0
 putsUART0 "C100 initialization complete.\n"
 puts "C100 initialization complete."
 }

___
Openocd-development mailing list
Openocd-development@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/openocd-development
Re: [Openocd-development] arm1136 scripts
On Friday 11 September 2009, Øyvind Harboe wrote:
> There are *lots* of snippets around the code.

And they're mostly core-specific. Stuff like the DCC support on ARMs, where ARMv4 and ARMv5 share the same model (cp14 based), XScale morphed it by taking away one bit, and newer ARM cores changed the JTAG level interfaces in various ways.

Plus the calling conventions aren't necessarily going to be portable ... and some of them need customized exit conditions. Which registers are used as inputs will matter; that assumes params fit in registers, which is _probably_ a safe assumption to make.

> I'd like to see what's running under MIPS exercised by
> and large by the ARM target

And vice versa. Or AVR8; or AVR32.

With hardware ECC, NAND-related code snippets need to be controller-specific instead of core-specific.

One aid might be to have a small library of standard code fragments, written in C, that new targets should try to package -- for optimization. CFI support would be a useful test: the accelerators are currently there only for ARMv4 and ARMv5 cores. (Though there's no reason they shouldn't work on ARMv6 and ARMv7...)

I guess I don't quite see the point to coming up with something more generic than target algorithm support. For ARM there's a current reusability problem: when the code *could* be reused, that's needlessly blocked because the run_algorithm() interface isn't generic enough to ignore what core is underneath. Before bringing MIPS into that picture I suggest it's best to get ARM sorted better.

- Dave
Re: [Openocd-development] arm1136 scripts
Duane Ellis wrote:
> The idea is this:
> Let us assume there is a 4K block (working space) of ram somewhere.
> The code could be 2K, set aside 1K for stack (yes 1K)
> And 1K for a download buffer - could be bigger..
> Maybe we work with 4K and 4K...
>
> The entry point would be fixed - always at Buffer+0
> Set the program counter there, nothing else needs set.
>
> Then, set a SW breakpoint at - a fixed location.
> Example: Buffer + 0x10
> This would leave a few bytes @ the start for startup code
> And might be easier for different chips (ie: non-arm chips).
>
> The target code *RUNS*, sets up the stack and then enters
> some type of "for-ever" loop, that looks like this:
> *AND* is 100% written in *C* code - not assembler.

OK so far. However, I would like to keep the current small objects like those used in the CFI flash code, and download them on demand, instead of one big blob that handles everything, because:

- Buffer space may be severely limited on some CPUs, and by only loading the one algorithm needed for a task, we make best use of that RAM, freeing the rest for buffer. If we pack all code into one target binary blob, it may expand to a size where not everything fits on some targets.

- Easier to maintain. We have target code for one single task (e.g. flash Intel-style NOR flash ROMs, just like the current CFI code does), with a defined interface (input and output values at defined places in the buffer), plus the matching host-side code inside OpenOCD that runs that code.

I do think that the target algorithm code should be written in C, with a small bit of assembler startup (just as much as is needed to set up the stack etc. to run C code). The startup code is CPU/architecture dependent and needs to be written once for each supported CPU family. The algorithm code is pure C and can be used without modifications on any target CPU, regardless of the instruction set.

> Why do I suggest C? And the above method...
> Because it would work on *ALL* targets!
> One only needs to *compile* a small C program.
> And little helpers would be very fast.

Ack. The current code that is embedded in the CFI flash code is difficult to read and even more difficult to modify, and is platform-specific.

In order to make this work, we need some way to load and relocate the object code to an address that is not known at compile/link time (or a compiler that generates position-independent code for all platforms).

I am not sure if we can simply use ELF files for the target code, and use the existing ELF loader, with the addition of relocation if needed. The ELF files could be shipped with OpenOCD (because you need cross-compilers for all architectures to re-generate them), and the code in OpenOCD would simply request to load "cfi_intel" to a target buffer, which the general code then turns into a load of ".../openocd/lib/arm/xscale/cfi_intel.elf", handling buffer management, setup and relocation.

cu
Michael
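The name-to-file lookup Michael describes could be sketched roughly as follows. This is only an illustration: the helper name algo_elf_path and the idea of passing the library root, architecture and core as separate strings are assumptions, not existing OpenOCD code, and the real library root is left to the caller because the path in the message is elided.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of OpenOCD: builds
 * "<lib_root>/<arch>/<core>/<name>.elf" into out.
 * Returns 0 on success, -1 on an empty argument or if the path does not fit.
 */
int algo_elf_path(char *out, size_t out_len, const char *lib_root,
                  const char *arch, const char *core, const char *name)
{
    if (!lib_root[0] || !arch[0] || !core[0] || !name[0])
        return -1;

    int n = snprintf(out, out_len, "%s/%s/%s/%s.elf",
                     lib_root, arch, core, name);

    /* snprintf returns the length it wanted to write;
     * a value >= out_len means the result was truncated. */
    if (n < 0 || (size_t)n >= out_len)
        return -1;
    return 0;
}
```

A caller would request "cfi_intel" for an arm/xscale target and pass the resulting path to whatever loader ends up handling relocation; that loader is outside the scope of this sketch.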
Re: [Openocd-development] arm1136 scripts
There are *lots* of snippets around the code.

I'd like to see what's running under MIPS exercised by and large by the ARM target.

Nothing has happened so far though. Nobody is doing this manual translation, nor has anyone suggested a solution that everybody has fallen for...

I don't know if the snippets work on arm11 even...

Some snippets are in arm32 only, which spells trouble for cortex m3...

If someone has any great ideas here that everybody will buy into, they will be most welcome!

--
Øyvind Harboe
Embedded software and hardware consulting services
http://www.zylin.com
Re: [Openocd-development] arm1136 scripts
On Thursday 10 September 2009, Øyvind Harboe wrote:
> W.r.t. run_algorithm, I was thinking about how much work
> it would be to write a *small* machine code translator
> that would translate generic code in to machine specific
> code... Sounds impossibly hard, but is it really? I haven't
> looked at what's out there.

"gcc" does such translation. ;)

An arm_run_algorithm() might be better than the current target_run_algorithm(). It might take a command block, shrinking the param list to something sane and completely arch-neutral (beyond "ARM").
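To make the "command block" idea a little more concrete, here is one possible shape for it. Everything in this sketch is hypothetical: the struct, its fields and the check function are invented for illustration and do not describe any existing or planned OpenOCD interface. The only hard facts it relies on are that ARM-state instructions are 4-byte aligned and Thumb-state instructions are 2-byte aligned.

```c
#include <stdint.h>

/* Hypothetical: which instruction set the snippet is built for. */
enum arm_core_state { ARM_STATE_ARM, ARM_STATE_THUMB };

/* Hypothetical command block; no core-specific pointer is needed. */
struct arm_algorithm_cmd {
    uint32_t entry_point;           /* where execution starts */
    uint32_t exit_point;            /* where a breakpoint ends the run */
    enum arm_core_state core_state;
    uint32_t regs[4];               /* r0-r3: inputs before the run, results after */
    int timeout_ms;
};

/* Sanity-check a block before handing it to a core-specific backend.
 * Returns 0 if the block looks usable, -1 otherwise. */
int arm_algorithm_cmd_check(const struct arm_algorithm_cmd *cmd)
{
    /* ARM code must be word aligned, Thumb code halfword aligned. */
    uint32_t align_mask = (cmd->core_state == ARM_STATE_THUMB) ? 1u : 3u;

    if ((cmd->entry_point & align_mask) || (cmd->exit_point & align_mask))
        return -1;
    if (cmd->timeout_ms <= 0)
        return -1;
    return 0;
}
```

The point of the design is that the caller (flash or NAND code) fills in one structure regardless of whether the core underneath is ARMv4/ARMv5, ARMv6 or ARMv7; each core would then map it onto its own register and debug access.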
Re: [Openocd-development] arm1136 scripts
On Thursday 10 September 2009, Duane Ellis wrote:
> When this idea would be bad: Little quick downloads

Depends how little...

> When this idea would be good: BULK transfers, flash programming, etc.
>
> The idea is this:
> Let us assume there is a 4K block (working space) of ram somewhere.
> The code could be 2K, set aside 1K for stack (yes 1K)
> And 1K for a download buffer - could be bigger..

Not sure I see why this would be better than the existing algo framework, which loads smaller snippets (rarely 100 bytes) on an as-needed basis.

However, another way of looking at it: you suggest a limited and standardized kind of "debug monitor" to be used in some cases, as an adjunct to current halt-mode debugging (instead of running in conjunction with the program-being-debugged).

Again, this is what the existing "algo" framework does -- just, not with any kind of standard command set.
Re: [Openocd-development] arm1136 scripts
Øyvind Harboe wrote:
> Regarding the run_algorithm. This makes me think about
> the refactoring I did for the arm simulation code
>
> W.r.t. run_algorithm, I was thinking about how much work
> it would be to write a *small* machine code translator
> that would translate generic code in to machine specific
> code... Sounds impossibly hard, but is it really? I haven't
> looked at what's out there.

We also talked a while back about the idea of a standardized download to the target. The general idea I was talking about at the time is described below.

-Duane.

A small, say 2K, 100% position independent common block of arm code.

perhaps - an "armv4" based 32bit code (not thumb)
    why? Because that would cover all arm7 and arm9 chips.
Perhaps - another for cortexM3
perhaps - another for armv7 - (cortexA8)

Maybe other chips are "fixed address" - but generally ARM code can be made to be PIC in a very simple way.

When this idea would be bad: Little quick downloads
When this idea would be good: BULK transfers, flash programming, etc.

The idea is this:
Let us assume there is a 4K block (working space) of ram somewhere.
The code could be 2K, set aside 1K for stack (yes 1K)
And 1K for a download buffer - could be bigger..
Maybe we work with 4K and 4K...

The entry point would be fixed - always at Buffer+0
Set the program counter there, nothing else needs set.

Then, set a SW breakpoint at - a fixed location.
Example: Buffer + 0x10
This would leave a few bytes @ the start for startup code
And might be easier for different chips (ie: non-arm chips).

The target code *RUNS*, sets up the stack and then enters
some type of "for-ever" loop, that looks like this:
*AND* is 100% written in *C* code - not assembler.
    void some_c_function( void )
    {
        uint32_t buffer[16];
        // this example puts the 4K transfer buffer on the stack
        uint32_t download_buffer[ 1024 ];
        int result;

        // assume result=success
        result = 0;

        for( ;; ){
            // buffer[0] = holds result
            buffer[0] = result;
            // tell app where transfer buffer is located.
            // (cast assumes 32-bit target addresses)
            buffer[1] = (uint32_t)&download_buffer[0];
            buffer[2] = sizeof(download_buffer);

            // this hits openocd breakpoint
            openocd_syscall( &buffer[0] );

            // openocd stuffs parameters in the buffer.
            // parameter 0 - is the command.
            // parameter 1/2/3 ... /15 are command specific
            switch( buffer[0] ){
            case CFI_FLASH_ERASE:
                // params 1,2,3 are address and length
                result = perform_cfi_erase( &buffer[0] );
                break;
            case HIGH_SPEED_DOWNLOAD:
                // on ARM this might use CP15 DCC
                result = perform_high_speed_download( &buffer[0] );
                break;
            // case .. other commands, each ending in break
            } // switch
        } // forever()
    }

==

Why do I suggest C? And the above method...
Because it would work on *ALL* targets!
One only needs to *compile* a small C program.
And little helpers would be very fast.

-Duane.
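The memory split Duane describes (2K code, 1K stack, the rest as download buffer, entry at Buffer+0, breakpoint at Buffer+0x10) can be worked out on the host side with a few lines of arithmetic. The sketch below is an assumption-laden illustration: the struct and function names are made up, the stack is assumed to grow downward from the end of its 1K region, and nothing here is existing OpenOCD code.

```c
#include <stdint.h>

/* Hypothetical description of the shared work area. */
struct work_layout {
    uint32_t entry;       /* program counter is set here: base + 0 */
    uint32_t breakpoint;  /* fixed SW breakpoint: base + 0x10 */
    uint32_t stack_top;   /* initial stack pointer (descending stack assumed) */
    uint32_t buffer;      /* start of the download buffer */
    uint32_t buffer_len;  /* bytes left over for the download buffer */
};

/* Split a work area of `size` bytes at `base` into code, stack and buffer.
 * Returns 0 on success, -1 if there is no room left for a buffer. */
int plan_work_area(uint32_t base, uint32_t size, struct work_layout *out)
{
    const uint32_t code_size = 2048;   /* "the code could be 2K" */
    const uint32_t stack_size = 1024;  /* "set aside 1K for stack" */

    if (size <= code_size + stack_size)
        return -1;

    out->entry = base;
    out->breakpoint = base + 0x10;
    out->stack_top = base + code_size + stack_size;
    out->buffer = base + code_size + stack_size;
    out->buffer_len = size - code_size - stack_size;
    return 0;
}
```

With a 4K work area this leaves exactly 1K for the buffer; with the "4K and 4K" variant mentioned above, an 8K area gives a 5K buffer under the same code and stack sizes.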
Re: [Openocd-development] arm1136 scripts
Regarding the run_algorithm. This makes me think about the refactoring I did for the arm simulation code.

W.r.t. run_algorithm, I was thinking about how much work it would be to write a *small* machine code translator that would translate generic code in to machine specific code... Sounds impossibly hard, but is it really? I haven't looked at what's out there.

--
Øyvind Harboe
Embedded software and hardware consulting services
http://www.zylin.com
Re: [Openocd-development] arm1136 scripts
On Thursday 10 September 2009, Øyvind Harboe wrote:
> > I can see there is run_algorithm implemented in arm11.c. Can you give me
> > some pointers on what needs to be added/changed? I can take a stab at
> > this.
>
> I think David looked at this...
>
> Can you share David?

One issue I have with the ARM run_algorithm() stuff is that each core has a separate *interface* ... the last param is core-specific, not generic for all ARM cores.

So for example the src/flash/arm_nandwrite.c code can't be used on ARMv6 or ARMv7 cores. And there are various CFI utils that can't be used either (for NOR flash).

Which means if you have some ARM algorithm, you need different invocation code for ARMv4/ARMv5, ARMv6 (like arm1136), ARMv7, etc. OR ... there's pretty dubious stuff going on there, where the ARMv4/ARMv5 stuff is used as "generic ARM" even though there's a bunch of stuff that should be core-specific.

Sorting all that out could be messy; but the place to start is that "arch_info" parameter.

Now, the arm11 stuff (is it arm1136-specific? or does it work for other arm11 cores?) doesn't use that param. But half the code in its run_algorithm() method is commented out, Thumb isn't supported, there's a big HACKHACKHACK comment up front, and so forth. One gets the feeling it hasn't been used much yet! So one thing to do is just to address those obvious problems in the code.

Specific to the ARM1136 cores, I'm not sure the bulk_write() method is as fast as it should be. For ARMv4/ARMv5 there is some DCC write code which seems to make a big difference; it uses the run_algorithm() logic. Doesn't NOR flash writing use those bulk_write() paths?

Section 14.8.14 of the ARM1136 spec shows what is claimed to be a code sequence that's optimized for fast writes. I think something using the DCC would be faster. (Note: DCC on 1136 differs from v4/v5.)

- Dave
Re: [Openocd-development] arm1136 scripts
On Thu, Sep 10, 2009 at 10:03 PM, michal smulski wrote:
>
> I can see there is run_algorithm implemented in arm11.c. Can you give me
> some pointers on what needs to be added/changed? I can take a stab at
> this.

I think David looked at this...

Can you share David?

> Note that I see really slow times for copying uboot to DDR memory
> (load_image). Is this expected as well?
>
> If I turn on burst writes, things go much faster but I get an error
> message at the end.

Separate post with instructions & details to reproduce?

>
> Thanks,
> Michal
>
> On Thu, 2009-09-10 at 10:07 +0200, Øyvind Harboe wrote:
>> Committed.
>>
>> Thanks!
>>
>> > 1. How do I speed up flash writes. Right now, I get about 2kB/s write to
>> > NOR (see flashUBOOT proc).
>>
>> the arm11 needs the run_algorithm support. Known problem, "somebody"
>> needs to pitch in, it's not super hard.
>>
>> > 2. Is there a way to run openocd and flash write command from the same
>> > script. That is, I don't want to telnet to port and run flash write
>> > command. I would like to automate it.
>>
>> Write a script:
>>
>> init # initializes openocd
>> flash # do whatever you need to do

--
Øyvind Harboe
Embedded software and hardware consulting services
http://www.zylin.com
Re: [Openocd-development] arm1136 scripts
I can see there is run_algorithm implemented in arm11.c. Can you give me some pointers on what needs to be added/changed? I can take a stab at this.

Note that I see really slow times for copying uboot to DDR memory (load_image). Is this expected as well?

If I turn on burst writes, things go much faster but I get an error message at the end.

Thanks,
Michal

On Thu, 2009-09-10 at 10:07 +0200, Øyvind Harboe wrote:
> Committed.
>
> Thanks!
>
> > 1. How do I speed up flash writes. Right now, I get about 2kB/s write to
> > NOR (see flashUBOOT proc).
>
> the arm11 needs the run_algorithm support. Known problem, "somebody"
> needs to pitch in, it's not super hard.
>
> > 2. Is there a way to run openocd and flash write command from the same
> > script. That is, I don't want to telnet to port and run flash write
> > command. I would like to automate it.
>
> Write a script:
>
> init # initializes openocd
> flash # do whatever you need to do