Re: [Openocd-development] arm1136 scripts

2010-06-14 Thread michal smulski

Here is a patch to fix startup on the C100 (arm1136). Basically, it makes sure
that the UART is configured before it is used.

Michal
diff --git a/tcl/target/c100helper.tcl b/tcl/target/c100helper.tcl
index 477fe5c..2a12c36 100644
--- a/tcl/target/c100helper.tcl
+++ b/tcl/target/c100helper.tcl
@@ -469,11 +469,12 @@ proc initC100 {} {
 mww $INTC_ARM1_CONTROL_REG 0x1
 # configure clocks
 setupPLL
+# setupUART0 must be run before setupDDR2 as setupDDR2 uses UART.
+setupUART0
 # enable cache
 # ? (u-boot does nothing here)
 # DDR2 memory init
 setupDDR2
-setupUART0
 putsUART0 "C100 initialization complete.\n"
 puts "C100 initialization complete."
 }
___
Openocd-development mailing list
Openocd-development@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/openocd-development


Re: [Openocd-development] arm1136 scripts

2009-09-11 Thread David Brownell
On Friday 11 September 2009, Øyvind Harboe wrote:
> There are *lots* of snippets around the code.

And they're mostly core-specific.  Stuff like the
DCC support on ARMs, where ARMv4 and ARMv5 share
the same model (cp14 based), XScale morphed it by
taking away one bit, and newer ARM cores changed
the JTAG level interfaces in various ways.

Plus the calling conventions aren't necessarily
going to be portable ... and some of them need
customized exit conditions.  Which registers are
used as inputs will matter; that assumes params
fit in registers, which is _probably_ a safe
assumption to make.


> I'd like to see what's running under MIPS exercised by
> and large by the ARM target

And vice versa.  Or AVR8; or AVR32.  With hardware ECC,
NAND-related code snippets need to be controller-specific
instead of core-specific.

One aid might be to have a small library of standard
code fragments, written in C, that new targets should
try to package -- for optimization.  CFI support would
be a useful test:  the accelerators are currently there
only for ARMv4 and ARMv5 cores.  (Though there's no
reason they shouldn't work on ARMv6 and ARMv7...)


I guess I don't quite see the point to coming up with
something more generic than target algorithm support.

For ARM there's a current reusability problem:  when
the code *could* be reused, that's needlessly blocked
because the run_algorithm() interface isn't generic
enough to ignore what core is underneath.

Before bringing MIPS into that picture I suggest it's
best to get ARM sorted better.

- Dave


Re: [Openocd-development] arm1136 scripts

2009-09-11 Thread Michael Schwingen
Duane Ellis wrote:
> The idea is this:
>  Let us assume there is a 4K block (working space) of ram some where.
>  The code could be 2K, set aside 1K for stack (yes 1K)
>  And 1K for a download buffer - could be bigger..
> Maybe we work with 4K and 4K...
>
>The entry point would be fixed - always at Buffer+0
>Set the program counter there, nothing else needs set.
>  
>Then, set a SW breakpoint at - a fixed location.
>Example:  Buffer + 0x10
>This would leave a few bytes @ the start for startup code
>And might be easier for different chips (ie: non-arm chips).
>  
>The target code *RUNS*, sets up the stack and then enters
>some type of "for-ever" loop, that looks like this:
>*AND* is 100% written in *C* code - not assembler.
>   
OK so far. However, I would like to keep the current small objects like
those used in the CFI flash code, and download them on demand, instead
of one big blob that handles everything, because:

 - Buffer space may be severely limited on some CPUs, and by only
loading the one algorithm needed for a task, we make best use of that
RAM, freeing the rest for buffer. If we pack all code into one target
binary blob, it may expand to a size where not everything fits on some
targets.

 - Easier to maintain. We have target code for one single task (eg.
flash Intel-style NOR flash roms - just like the current CFI code does),
with a defined interface (input and output values at defined places in
the buffer), plus the matching host-side code inside OpenOCD that runs
that code.

I do think that the target algorithm code should be written in C, with a
small bit of assembler startup (just as much as is needed to setup stack
etc. to run C code). The startup code is CPU/architecture dependent and
needs to be written once for each supported CPU family. The algorithm
code is pure C and can be used without modifications on any target CPU,
regardless of the instruction set.

> Why do I suggest C? And the above method...
> Because it would work on *ALL* targets!
> One only needs to *compile* a small C program.
> And little helpers would be very fast.
>   
Ack. The current code that is embedded in the CFI flash code is
difficult to read and even more difficult to modify, and is
platform-specific.


In order to make this work, we need some way to load and relocate the
object code to an address that is not known at compile/link time (or a
compiler that generates position-independent code for all platforms).

I am not sure if we can simply use ELF files for the target code, and
use the existing ELF loader, with the addition of relocation if needed?

The ELF files could be shipped with OpenOCD (because you need
cross-compilers for all architectures to re-generate them), and the code
in OpenOCD would simply request to load "cfi_intel" to a target buffer,
which the general code then turns into a load of
".../openocd/lib/arm/xscale/cfi_intel.elf", handling buffer management,
setup and relocation.
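
The name-to-file lookup described above could be sketched roughly as follows.
This is only an illustration: algo_elf_path() and its parameters are
hypothetical names, not existing OpenOCD code, and lib_root stands in for the
elided ".../openocd/lib" prefix, which the caller would have to supply.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Build "<lib_root>/<arch>/<core>/<algo>.elf" into out.
 * Returns 0 on success, -1 if the result would not fit in out.
 * All names here are hypothetical; lib_root replaces the elided
 * ".../openocd/lib" prefix from the mail.
 */
int algo_elf_path(char *out, size_t out_size, const char *lib_root,
                  const char *arch, const char *core, const char *algo)
{
    int n;

    assert(out != NULL && out_size > 0);

    n = snprintf(out, out_size, "%s/%s/%s/%s.elf",
                 lib_root, arch, core, algo);
    if (n < 0 || (size_t)n >= out_size)
        return -1;  /* encoding error or truncated path */
    return 0;
}
```

A caller asking for "cfi_intel" on an XScale target would pass "arm",
"xscale" and "cfi_intel"; buffer management and relocation would still have
to happen after the file is found.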

cu
Michael



Re: [Openocd-development] arm1136 scripts

2009-09-11 Thread Øyvind Harboe
There are *lots* of snippets around the code.

I'd like to see what's running under MIPS exercised by
and large by the ARM target

Nothing has happened so far though. Nobody is doing
this manual translation, nor has anyone suggested a
solution that everybody has fallen for...

I don't know if the snippets work on arm11 even... Some snippets
are in arm32 only, which spells trouble for cortex m3...

If someone has any great ideas here that everybody
will buy into, they will be most welcome!

-- 
Øyvind Harboe
Embedded software and hardware consulting services
http://www.zylin.com


Re: [Openocd-development] arm1136 scripts

2009-09-11 Thread David Brownell
On Thursday 10 September 2009, Øyvind Harboe wrote:
> W.r.t. run_algorithm, I was thinking about how much work
> it would be to write a *small* machine code translator
> that would translate generic code in to machine specific
> code... Sounds impossibly hard, but is it really? I haven't
> looked at what's out there.

"gcc" does such translation.  ;)

An arm_run_algorithm() might be better than the current
target_run_algorithm().  It might take a command block,
shrinking the param list to something sane and completely
arch-neutral (beyond "ARM").



Re: [Openocd-development] arm1136 scripts

2009-09-11 Thread David Brownell
On Thursday 10 September 2009, Duane Ellis wrote:
> When this idea would be bad:      Little quick downloads

Depends how little... 

> When this idea would be good:   BULK transfers, flash programing, etc.
> 
> The idea is this:
>      Let us assume there is a 4K block (working space) of ram some where.
>      The code could be 2K, set aside 1K for stack (yes 1K)
>      And 1K for a download buffer - could be bigger..

I'm not sure I see why this would be better than the existing
algo framework, which loads smaller snippets (rarely 100 bytes)
on an as-needed basis.

However, another way of looking at it:  you suggest a limited and
standardized kind of "debug monitor" to be used in some cases, as
an adjunct to current halt-mode debugging (instead of running in
conjunction with the program-being-debugged).  Again, this is what
the existing "algo" framework does -- just, not with any kind of
standard command set.





Re: [Openocd-development] arm1136 scripts

2009-09-10 Thread Duane Ellis
Øyvind Harboe wrote:
> Regarding the run_algorithm. This makes me think about
> the refactoring I did for the arm simulation code
>
> W.r.t. run_algorithm, I was thinking about how much work
> it would be to write a *small* machine code translator
> that would translate generic code in to machine specific
> code... Sounds impossibly hard, but is it really? I haven't
> looked at what's out there.
>   

We also talked a while back about the idea of a standardized download to 
the target.

The general idea I was talking about at the time is described below.

-Duane.


 A small (say 2K), 100% position-independent common block of ARM code.
   Perhaps an "armv4"-based 32-bit build (not Thumb).
   Why? Because that would cover all ARM7 and ARM9 chips.

   Perhaps another for Cortex-M3.
   Perhaps another for ARMv7 (Cortex-A8).

   Maybe other chips are "fixed address", but generally ARM code can be made
   PIC in a very simple way.

When this idea would be bad:    Little quick downloads
When this idea would be good:   BULK transfers, flash programming, etc.

The idea is this:
 Let us assume there is a 4K block (working space) of ram some where.
 The code could be 2K, set aside 1K for stack (yes 1K)
 And 1K for a download buffer - could be bigger..
Maybe we work with 4K and 4K...
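
A rough sketch of that split, for illustration only. The struct and function
names are hypothetical, the sizes are the ones quoted above, and the 0x10
breakpoint offset is the example value given just below; the stack is assumed
to grow downwards.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of how the working area is carved up. */
struct work_area_layout {
    uint32_t code_base;    /* entry point: always the start of the area */
    uint32_t bkpt_addr;    /* fixed software breakpoint location */
    uint32_t stack_top;    /* initial stack pointer (stack grows down) */
    uint32_t buffer_base;  /* start of the download buffer */
    uint32_t buffer_size;  /* bytes left over for the download buffer */
};

enum {
    CODE_SIZE   = 2048,  /* "the code could be 2K" */
    STACK_SIZE  = 1024,  /* "set aside 1K for stack" */
    BKPT_OFFSET = 0x10   /* example breakpoint offset from the area start */
};

struct work_area_layout split_work_area(uint32_t base, uint32_t size)
{
    struct work_area_layout l;

    /* The area must hold code and stack plus at least one buffer word. */
    assert(size >= CODE_SIZE + STACK_SIZE + 4);

    l.code_base   = base;
    l.bkpt_addr   = base + BKPT_OFFSET;
    l.stack_top   = base + CODE_SIZE + STACK_SIZE;
    l.buffer_base = l.stack_top;
    l.buffer_size = size - (CODE_SIZE + STACK_SIZE);
    return l;
}
```

With a 4K area this leaves 1K for the download buffer; a larger area simply
gives a larger buffer.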

   The entry point would be fixed - always at Buffer+0
   Set the program counter there; nothing else needs to be set.
 
   Then, set a SW breakpoint at - a fixed location.
   Example:  Buffer + 0x10
   This would leave a few bytes @ the start for startup code
   And might be easier for different chips (ie: non-arm chips).
 
   The target code *RUNS*, sets up the stack and then enters
   some type of "for-ever" loop that is 100% written in *C* code
   (not assembler) and looks like this:

   #include <stdint.h>

   void some_c_function( void )
   {
       uint32_t   buffer[16];
       // this example puts the 4K transfer buffer on the stack
       uint32_t   download_buffer[ 1024 ];
       int        result;

       // assume result=success
       result = 0;

       for( ;; ){
           // buffer[0] holds the result of the previous command
           buffer[0] = (uint32_t)result;
           // tell openocd where the transfer buffer is located
           // (assumes a 32-bit target, so a pointer fits in 32 bits)
           buffer[1] = (uint32_t)(uintptr_t)&download_buffer[0];
           buffer[2] = sizeof(download_buffer);

           // this hits the openocd breakpoint
           openocd_syscall( &buffer[0] );

           // openocd stuffs parameters in the buffer.
           // parameter 0 is the command.
           // parameters 1/2/3 ... /15 are command specific

           switch( buffer[0] ){
           case CFI_FLASH_ERASE:
               // params 1,2,3 are address and length
               result = perform_cfi_erase( &buffer[0] );
               break;
           case HIGH_SPEED_DOWNLOAD:
               // on ARM this might use the CP14 DCC
               result = perform_high_speed_download( &buffer[0] );
               break;
           // case ... other commands
           default:
               // unknown command: report failure
               result = -1;
               break;
           } // switch
       } // forever()
   }

==

Why do I suggest C? And the above method...
Because it would work on *ALL* targets!
One only needs to *compile* a small C program.
And little helpers would be very fast.

-Duane.





Re: [Openocd-development] arm1136 scripts

2009-09-10 Thread Øyvind Harboe
Regarding the run_algorithm. This makes me think about
the refactoring I did for the arm simulation code

W.r.t. run_algorithm, I was thinking about how much work
it would be to write a *small* machine code translator
that would translate generic code in to machine specific
code... Sounds impossibly hard, but is it really? I haven't
looked at what's out there.


-- 
Øyvind Harboe
Embedded software and hardware consulting services
http://www.zylin.com


Re: [Openocd-development] arm1136 scripts

2009-09-10 Thread David Brownell
On Thursday 10 September 2009, Øyvind Harboe wrote:
> > I can see there is run_algorithm implemented in arm11.c. Can you give me
> > some pointers on what needs to be added/changed? I can take a stab at
> > this.
> 
> I think David looked at this...
> 
> Can you share David?

One issue I have with the ARM run_algorithm() stuff is that each
core has a separate *interface* ... the last param is core-specific,
not generic for all ARM cores.

So for example the src/flash/arm_nandwrite.c code can't be used
on ARMv6 or ARMv7 cores.  And there are various CFI utils that
can't be used either (for NOR flash).

Which means if you have some ARM algorithm, you need different
invocation code for ARMv4/ARMv5, ARMv6 (like arm1136), ARMv7, etc.
OR ... there's pretty dubious stuff going on there, where the
ARMv4/ARMv5 stuff is used as "generic ARM" even though there's
a bunch of stuff that should be core-specific.  Sorting all that
out could be messy; but the place to start is that "arch_info"
parameter.


Now, the arm11 stuff (is it arm1136-specific? or does it work for
other arm11 cores?) doesn't use that param.  But half the code in
its run_algorithm() method is commented out, Thumb isn't supported,
there's a big HACKHACKHACK comment up front, and so forth.  One
gets the feeling it hasn't been used much yet!  So one thing to do is
just to address those obvious problems in the code.

Specific to the ARM1136 cores, I'm not sure the bulk_write()
method is as fast as it should be.  For ARMv4/ARMv5 there is
some DCC write code which seems to make a big difference; it
uses the run_algorithm() logic.  Doesn't NOR flash writing use
those bulk_write() paths?  Section 14.8.14 of the ARM1136
spec shows what is claimed to be a code sequence that's
optimized for fast writes.  I think something using the DCC
would be faster.  (Note:  DCC on 1136 differs from v4/v5.)

- Dave


Re: [Openocd-development] arm1136 scripts

2009-09-10 Thread Øyvind Harboe
On Thu, Sep 10, 2009 at 10:03 PM, michal smulski wrote:
>
> I can see there is run_algorithm implemented in arm11.c. Can you give me
> some pointers on what needs to be added/changed? I can take a stab at
> this.

I think David looked at this...

Can you share David?

> Note that I see really slow times for copying uboot to DDR memory
> (load_image). Is this expected as well?
>
> If I turn on burst writes, things go much faster but I get an error
> message at the end.

Separate post instructions & details to reproduce?

>
> Thanks,
> Michal
>
> On Thu, 2009-09-10 at 10:07 +0200, Øyvind Harboe wrote:
>> Committed.
>>
>> Thanks!
>>
>> > 1. How do I speed up flash writes. Right now, I get about 2kB/s write to
>> > NOR (see flashUBOOT proc).
>>
>> the arm11 needs the run_algorithm support. Known problem, "somebody"
>> needs to pitch in, it's not super hard.
>>
>> > 2. Is there a way to run openocd and flash write command from the same
>> > script. That is, I don't want to telnet to port  and run flash write
>> > command. I would like to automate it.
>>
>> Write a script:
>>
>> init # initializes openocd
>> flash  # do whatever you need to do
>>
>>
>>
>>
>
>



-- 
Øyvind Harboe
Embedded software and hardware consulting services
http://www.zylin.com


Re: [Openocd-development] arm1136 scripts

2009-09-10 Thread michal smulski

I can see there is run_algorithm implemented in arm11.c. Can you give me
some pointers on what needs to be added/changed? I can take a stab at 
this.

Note that I see really slow times for copying uboot to DDR memory
(load_image). Is this expected as well?

If I turn on burst writes, things go much faster but I get an error
message at the end.

Thanks,
Michal

On Thu, 2009-09-10 at 10:07 +0200, Øyvind Harboe wrote:
> Committed.
> 
> Thanks!
> 
> > 1. How do I speed up flash writes. Right now, I get about 2kB/s write to
> > NOR (see flashUBOOT proc).
> 
> the arm11 needs the run_algorithm support. Known problem, "somebody"
> needs to pitch in, it's not super hard.
> 
> > 2. Is there a way to run openocd and flash write command from the same
> > script. That is, I don't want to telnet to port  and run flash write
> > command. I would like to automate it.
> 
> Write a script:
> 
> init # initializes openocd
> flash  # do whatever you need to do
> 
> 
> 
> 
