[Openocd-development] Slowly moving from 8 to 32 bit words in jtag_add_xxx API

2009-11-17 Thread Øyvind Harboe
I'm pondering how we could gently in a series of
non-breaking patches prepare the ground for switching from
8 to 32 bit words in the jtag_add_xxx API.

The attached patch gets rid of buf_set_u32() when setting
the value of a byte.

This achieves two things: the code is less obtuse and it
is more evident how we could introduce a new type
that is *currently* uint8_t and later on could be increased to
uint32_t or wider, for the out_value/in_value bit vectors.

Comments? Protests?
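
A minimal sketch of the idea, assuming a hypothetical typedef name
jtag_word_t (no name has been chosen for the new type) and a simplified
stand-in, sketch_set_bits(), for a generic bit setter (it is not the
actual buf_set_u32() implementation). For a field that starts at bit 0
and fits into one word, a plain store produces the same low bits:

```c
#include <stdint.h>

/* Hypothetical name: element type of the out_value/in_value bit vectors.
 * Today it would be uint8_t; later it could be widened. */
typedef uint8_t jtag_word_t;

/* Simplified stand-in for a generic bit setter (an assumption, not the
 * real buf_set_u32()): writes `num` bits of `value` starting at bit
 * `first`, least significant bit first. */
static void sketch_set_bits(uint8_t *buf, unsigned first, unsigned num,
		uint32_t value)
{
	for (unsigned i = 0; i < num; i++) {
		unsigned bit = first + i;
		uint8_t mask = (uint8_t)(1u << (bit % 8));

		if ((value >> i) & 1u)
			buf[bit / 8] |= mask;
		else
			buf[bit / 8] &= (uint8_t)~mask;
	}
}

/* Direct store, as in the patch: valid when the field starts at bit 0
 * and fits into one word. */
static void sketch_set_word0(jtag_word_t *buf, uint32_t value)
{
	buf[0] = (jtag_word_t)value;
}
```

The two can differ only in bits above the field width, which a scan of
num_bits bits should not shift out.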



-- 
Øyvind Harboe
http://www.zylin.com/zy1000.html
ARM7 ARM9 ARM11 XScale Cortex
JTAG debugger and flash programmer
From 869589a654316c8fceddbf98629cdfebb21512ef Mon Sep 17 00:00:00 2001
From: =?utf-8?q?=C3=98yvind=20Harboe?= 
Date: Tue, 17 Nov 2009 21:59:01 +0100
Subject: [PATCH] jtag-api: get rid of unnecessary buf_set_u32() calls that make code obtuse.
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

Also, this is on the path to increasing the word size for
bit vectors from 8 to something wider (32? natural host machine
width?).

Signed-off-by: Øyvind Harboe 
---
 src/target/embeddedice.c |   21 ++---
 1 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/src/target/embeddedice.c b/src/target/embeddedice.c
index e375475..3947e26 100644
--- a/src/target/embeddedice.c
+++ b/src/target/embeddedice.c
@@ -349,7 +349,7 @@ int embeddedice_read_reg_w_check(struct reg *reg,
 	fields[1].tap = ice_reg->jtag_info->tap;
 	fields[1].num_bits = 5;
 	fields[1].out_value = field1_out;
-	buf_set_u32(fields[1].out_value, 0, 5, reg_addr);
+	fields[1].out_value[0] = reg_addr;
 	fields[1].in_value = NULL;
 	fields[1].check_value = NULL;
 	fields[1].check_mask = NULL;
@@ -358,7 +358,7 @@ int embeddedice_read_reg_w_check(struct reg *reg,
 	fields[2].tap = ice_reg->jtag_info->tap;
 	fields[2].num_bits = 1;
 	fields[2].out_value = field2_out;
-	buf_set_u32(fields[2].out_value, 0, 1, 0);
+	fields[2].out_value[0] = 0;
 	fields[2].in_value = NULL;
 	fields[2].check_value = NULL;
 	fields[2].check_mask = NULL;
@@ -375,7 +375,7 @@ int embeddedice_read_reg_w_check(struct reg *reg,
 	 * EICE_COMMS_DATA would read the register twice
 	 * reading the control register is safe
 	 */
-	buf_set_u32(fields[1].out_value, 0, 5, eice_regs[EICE_COMMS_CTRL].addr);
+	fields[1].out_value[0] = eice_regs[EICE_COMMS_CTRL].addr;
 
 	/* traverse Update-DR, reading but with no other side effects */
 	jtag_add_dr_scan_check(3, fields, jtag_get_end_state());
@@ -409,13 +409,13 @@ int embeddedice_receive(struct arm_jtag *jtag_info, uint32_t *data, uint32_t siz
 	fields[1].tap = jtag_info->tap;
 	fields[1].num_bits = 5;
 	fields[1].out_value = field1_out;
-	buf_set_u32(fields[1].out_value, 0, 5, eice_regs[EICE_COMMS_DATA].addr);
+	fields[1].out_value[0] = eice_regs[EICE_COMMS_DATA].addr;
 	fields[1].in_value = NULL;
 
 	fields[2].tap = jtag_info->tap;
 	fields[2].num_bits = 1;
 	fields[2].out_value = field2_out;
-	buf_set_u32(fields[2].out_value, 0, 1, 0);
+	fields[2].out_value[0] = 0;
 	fields[2].in_value = NULL;
 
 	jtag_add_dr_scan(3, fields, jtag_get_end_state());
@@ -426,8 +426,7 @@ int embeddedice_receive(struct arm_jtag *jtag_info, uint32_t *data, uint32_t siz
 		 * to avoid reading additional data from the DCC data reg
 		 */
 		if (size == 1)
-			buf_set_u32(fields[1].out_value, 0, 5,
-					eice_regs[EICE_COMMS_CTRL].addr);
+			fields[1].out_value[0] = eice_regs[EICE_COMMS_CTRL].addr;
 
 		fields[0].in_value = (uint8_t *)data;
 		jtag_add_dr_scan(3, fields, jtag_get_end_state());
@@ -531,13 +530,13 @@ int embeddedice_send(struct arm_jtag *jtag_info, uint32_t *data, uint32_t size)
 	fields[1].tap = jtag_info->tap;
 	fields[1].num_bits = 5;
 	fields[1].out_value = field1_out;
-	buf_set_u32(fields[1].out_value, 0, 5, eice_regs[EICE_COMMS_DATA].addr);
+	fields[1].out_value[0] = eice_regs[EICE_COMMS_DATA].addr;
 	fields[1].in_value = NULL;
 
 	fields[2].tap = jtag_info->tap;
 	fields[2].num_bits = 1;
 	fields[2].out_value = field2_out;
-	buf_set_u32(fields[2].out_value, 0, 1, 1);
+	fields[2].out_value[0] = 1;
 
 	fields[2].in_value = NULL;
 
@@ -587,13 +586,13 @@ int embeddedice_handshake(struct arm_jtag *jtag_info, int hsbit, uint32_t timeou
 	fields[1].tap = jtag_info->tap;
 	fields[1].num_bits = 5;
 	fields[1].out_value = field1_out;
-	buf_set_u32(fields[1].out_value, 0, 5, eice_regs[EICE_COMMS_DATA].addr);
+	fields[1].out_value[0] = eice_regs[EICE_COMMS_DATA].addr;
 	fields[1].in_value = NULL;
 
 	fields[2].tap = jtag_info->tap;
 	fields[2].num_bits = 1;
 	fields[2].out_value = field2_out;
-	buf_set_u32(fields[2].out_value, 0, 1, 0);
+	fields[2].out_value[0] = 0;
 	fields[2].in_value = NULL;
 
 	jtag_add_dr_scan(3, fields, jtag_get_end_state());
-- 
1.6.3.3

___
Openocd-development mailing list
Openocd-development@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/openocd-development


[Openocd-development] Slowly moving from 8 to 32 bit words in jtag_add_xxx API

2009-11-18 Thread Laurent Gauch
>
> I'm pondering how we could gently in a series of
> non-breaking patches prepare the ground for switching from
> 8 to 32 bit words in the jtag_add_xxx API.
>
> The attached patch gets rid of buf_set_u32() when setting
> the value of a byte.
>
> This achieves two things: the code is less obtuse and it
> is more evident how we could introduce a new type
> that is *currently* uint8_t and later on could be increased to
> uint32_t or  wider, for the out_value/in_value bit vectors.
>
> Comments? Protests?

The JTAG serial link itself has a notion of bits, not bytes or dwords ...

I do not understand the advantage of working on 32-bit buffers
instead of 8-bit buffers for out_value and in_value.
Why would the code be less obtuse using 32-bit buffers instead of 8-bit buffers?

Maybe I am wrong, but I think there is really no advantage in changing this.

Laurent
  http://www.amontec.com





Re: [Openocd-development] Slowly moving from 8 to 32 bit words in jtag_add_xxx API

2009-11-18 Thread Øyvind Harboe
On Wed, Nov 18, 2009 at 9:38 AM, Laurent Gauch wrote:
>>
>> I'm pondering how we could gently in a series of
>> non-breaking patches prepare the ground for switching from
>> 8 to 32 bit words in the jtag_add_xxx API.
>>
>> The attached patch gets rid of buf_set_u32() when setting
>> the value of a byte.
>>
>> This achieves two things: the code is less obtuse and it
>> is more evident how we could introduce a new type
>> that is *currently* uint8_t and later on could be increased to
>> uint32_t or  wider, for the out_value/in_value bit vectors.
>>
>> Comments? Protests?
>
> JTAG serial link itself has a notion of bits and not bytes nor dwords ...
>
> I do not understand what is the advantage to work on 32bit buffers
> instead 8bit buffers for out_value and in_value.
> Why the code will be less obtuse use 32bit buffer instead 8 bit buffers ?

Look at all the buf_set_u32()'s sprinkled around the code. They are essentially
unnecessary.

The drivers probably wouldn't change.



Re: [Openocd-development] Slowly moving from 8 to 32 bit words in jtag_add_xxx API

2009-11-18 Thread Michael Bruck
I would suggest removing the fields completely from that layer and
replacing them with function calls for the most common types of data,
like uint32_t.

scan_start_dr();

scan_tap(struct jtag_tap);
scan_field_u32_w(size_t bits, uint32_t value);
scan_field_u32_wr(size_t bits, uint32_t value, uint32_t * result);
scan_field_buf_w(size_t bits, const void * buf);
...
scan_end();
jtag_execute_queue();

etc.

The layer below takes all these neatly constructed and allocated
fields and copies them anyway. You'd have to switch from an array of
fields to a linked list internally, but overall the code would be
cleaner.


Michael




Re: [Openocd-development] Slowly moving from 8 to 32 bit words in jtag_add_xxx API

2009-11-18 Thread Øyvind Harboe
On Wed, Nov 18, 2009 at 2:53 PM, Michael Bruck  wrote:
> I would suggest removing the fields completely from that layer and
> replacing them with function calls. For the most common types of data
> like uint32_t.
>
> scan_start_dr();
>
> scan_tap(struct jtag_tap);
> scan_field_u32_w(size_t bits, uint32_t value);
> scan_field_u32_wr(size_t bits, uint32_t value, uint32_t * result);
> scan_field_buf_w(size_t bits, const void * buf);
> ...
> scan_end();
> jtag_execute_queue();
>
> etc.
>
> The layer below takes all these neatly constructed and allocated
> fields and copies them anyway. You'd have to switch from an array of
> fields to a linked list internally, but overall the code would be
> cleaner.

I don't see the entire API you are proposing, but it would be very inefficient
and completely break with the current model.

The current API has a couple of things going for it: it is efficient in
both the low-performance CPU, low-latency scenario and the
high-performance CPU, long-latency scenario.

But some variant & helper fn's along the lines above would make
a lot of sense.
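
As a rough sketch of what such a helper could look like: the struct
below is a simplified stand-in that models only the members visible in
the patch, and the name sketch_field_u32_w() is hypothetical. The byte
order used is an assumption about the buffer layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the scan field structure; only members that
 * appear in the patch are modelled. */
struct sketch_scan_field {
	int num_bits;
	uint8_t *out_value;
	uint8_t *in_value;
};

/* Hypothetical helper: set up a write-only field of up to 32 bits.
 * `storage` must hold at least 4 bytes and outlive the scan. */
static void sketch_field_u32_w(struct sketch_scan_field *f, uint8_t *storage,
		int num_bits, uint32_t value)
{
	/* Store the least significant byte first (assumed buffer layout). */
	for (int i = 0; i < 4; i++)
		storage[i] = (uint8_t)(value >> (8 * i));

	f->num_bits = num_bits;
	f->out_value = storage;
	f->in_value = NULL;
}
```

A caller would then replace the four or five assignments per field with
one call, while the array-of-fields model stays unchanged.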



Re: [Openocd-development] Slowly moving from 8 to 32 bit words in jtag_add_xxx API

2009-11-18 Thread Michael Bruck
On Wed, Nov 18, 2009 at 15:12, Øyvind Harboe  wrote:
> On Wed, Nov 18, 2009 at 2:53 PM, Michael Bruck  wrote:
>> I would suggest removing the fields completely from that layer and
>> replacing them with function calls. For the most common types of data
>> like uint32_t.
>>
>> scan_start_dr();
>>
>> scan_tap(struct jtag_tap);
>> scan_field_u32_w(size_t bits, uint32_t value);
>> scan_field_u32_wr(size_t bits, uint32_t value, uint32_t * result);
>> scan_field_buf_w(size_t bits, const void * buf);
>> ...
>> scan_end();
>> jtag_execute_queue();
>>
>> etc.
>>
>> The layer below takes all these neatly constructed and allocated
>> fields and copies them anyway. You'd have to switch from an array of
>> fields to a linked list internally, but overall the code would be
>> cleaner.
>
> I don't see the entire API you are proposing, but it would be very inefficient
> and completely break with the current model.
>
> The current API has a couple of things going for it with being efficient on
> both low performance cpu low latency and high performance cpu long latency
> scenarios.

To the contrary, it would be faster. When fully implemented it removes
the step that clones all the data in driver.c.

> But some variant & helper fn's along the lines above would make
> a lot of sense.

Yes


I would actually prefer an API that is tightly linked to an
independent data structure that builds up a jtag sequence in the
target driver and then executes it. All the commands would then work
on building up that structure and in the end it is handed over
directly to the jtag interface driver for execution.

The current model does the same but essentially uses a global
variable. But presumably due to ownership issues the field data is
cloned for that global variable. If the jtag sequence structure is
owned by the target this (second) copy operation can be avoided as
well.


Michael


Re: [Openocd-development] Slowly moving from 8 to 32 bit words in jtag_add_xxx API

2009-11-18 Thread Øyvind Harboe
On Wed, Nov 18, 2009 at 3:32 PM, Michael Bruck  wrote:
> On Wed, Nov 18, 2009 at 15:12, Øyvind Harboe  wrote:
>> On Wed, Nov 18, 2009 at 2:53 PM, Michael Bruck  wrote:
>>> I would suggest removing the fields completely from that layer and
>>> replacing them with function calls. For the most common types of data
>>> like uint32_t.
>>>
>>> scan_start_dr();
>>>
>>> scan_tap(struct jtag_tap);
>>> scan_field_u32_w(size_t bits, uint32_t value);
>>> scan_field_u32_wr(size_t bits, uint32_t value, uint32_t * result);
>>> scan_field_buf_w(size_t bits, const void * buf);
>>> ...
>>> scan_end();
>>> jtag_execute_queue();
>>>
>>> etc.
>>>
>>> The layer below takes all these neatly constructed and allocated
>>> fields and copies them anyway. You'd have to switch from an array of
>>> fields to a linked list internally, but overall the code would be
>>> cleaner.
>>
>> I don't see the entire API you are proposing, but it would be very inefficient
>> and completely break with the current model.
>>
>> The current API has a couple of things going for it with being efficient on
>> both low performance cpu low latency and high performance cpu long latency
>> scenarios.
>
> To the contrary, it would be faster. When fully implemented it removes
> the step that clones all the data in driver.c.

Actually, the minidrivers don't clone today, so that's already taken care of.

The USB drivers have a 1ms roundtrip problem to contend with, the
rest is in the noise, essentially. There is plenty of evidence to this effect.

Also we have to *carefully* consider how we can make *small* steps that
can be tested on all the hardware combinations. Otherwise any change
is unlikely to pay off. We're getting there but it will and should take time.



Re: [Openocd-development] Slowly moving from 8 to 32 bit words in jtag_add_xxx API

2009-11-18 Thread Michael Bruck
On Wed, Nov 18, 2009 at 15:35, Øyvind Harboe  wrote:
> On Wed, Nov 18, 2009 at 3:32 PM, Michael Bruck  wrote:
>> On Wed, Nov 18, 2009 at 15:12, Øyvind Harboe  wrote:
>>> On Wed, Nov 18, 2009 at 2:53 PM, Michael Bruck  wrote:
>>>> I would suggest removing the fields completely from that layer and
>>>> replacing them with function calls. For the most common types of data
>>>> like uint32_t.
>>>>
>>>> scan_start_dr();
>>>>
>>>> scan_tap(struct jtag_tap);
>>>> scan_field_u32_w(size_t bits, uint32_t value);
>>>> scan_field_u32_wr(size_t bits, uint32_t value, uint32_t * result);
>>>> scan_field_buf_w(size_t bits, const void * buf);
>>>> ...
>>>> scan_end();
>>>> jtag_execute_queue();
>>>>
>>>> etc.
>>>>
>>>> The layer below takes all these neatly constructed and allocated
>>>> fields and copies them anyway. You'd have to switch from an array of
>>>> fields to a linked list internally, but overall the code would be
>>>> cleaner.
>>>
>>> I don't see the entire API you are proposing, but it would be very inefficient
>>> and completely break with the current model.
>>>
>>> The current API has a couple of things going for it with being efficient on
>>> both low performance cpu low latency and high performance cpu long latency
>>> scenarios.
>>
>> To the contrary, it would be faster. When fully implemented it removes
>> the step that clones all the data in driver.c.
>
> Actually, the minidrivers don't clone today, so that's already taken care of.

Doesn't that apply only to zy1000.c ?

> The USB drivers have a 1ms roundtrip problem to contend with, the
> rest is in the noise, essentially. There is plenty of evidence to this effect.

*You* brought up performance as a concern... My goal was primarily
streamlining code.

Your example seemed to be too much of a workaround rather than
addressing the problem directly, which IMO is that the uint32_t case
is so common that there should be a short unambiguous way to deal with
it.

Of course, even a set of standard wrappers to package the set-up of the
most common field configurations would help a lot, even if it did not also
serve as a shortcut into the minidriver layer to avoid the copying.

> Also we have to *carefully* consider how we can make *small* steps that
> can be tested on all the hardware combinations. Otherwise any change
> is unlikely to pay off. We're getting there but it will and should take time.

It was a suggestion on long-term goals. Small steps are usually most
effective when they go towards a specific target.


Michael


Re: [Openocd-development] Slowly moving from 8 to 32 bit words in jtag_add_xxx API

2009-11-18 Thread Øyvind Harboe
>> Actually, the minidrivers don't clone today, so that's already taken care of.
>
> Doesn't that apply only to zy1000.c ?

The USB case needs to delay execution to build a long
scan, so a copy is required there.

>> The USB drivers have a 1ms roundtrip problem to contend with, the
>> rest is in the noise, essentially. There is plenty of evidence to this effect.
>
> *You* brought up performance as concern... My goal was primarily
> streamlining code.

I did. We need both streamlined and fast code. Until we have both sorted,
we leave things the way they are.

> Your example seemed to be too much of a workaround rather than
> addressing the problem directly, which IMO is that the uint32_t case
> is so common that there should be a short unambiguous way to deal with
> it.
>
> Of course even a set of standard wrappers to package the set-up of the
> most common field configurations would help a lot, without also
> serving as a shortcut into the minidriver-layer to avoid the copying.

I've got some thoughts on how to do this, but nothing written up yet.

>> Also we have to *carefully* consider how we can make *small* steps that
>> can be tested on all the hardware combinations. Otherwise any change
>> is unlikely to pay off. We're getting there but it will and should take time.
>
> It was a suggestion on long-term goals. Small steps are usually most
> effective when they go towards a specific target.

I'm doing small steps in a branch for now. We'll see how it pans out.

That branch might end up being a big leap before it is pushed to
the master branch.




Re: [Openocd-development] Slowly moving from 8 to 32 bit words in jtag_add_xxx API

2009-11-18 Thread Michael Bruck
On Wed, Nov 18, 2009 at 16:17, Øyvind Harboe  wrote:
>>> Actually, the minidrivers don't clone today, so that's already taken care of.
>>
>> Doesn't that apply only to zy1000.c ?
>
> The USB case needs to delay execution to build a long
> scan, so there copy is required.

I think you misunderstood what I am proposing. I did not suggest
executing the sequence immediately. Not only am I aware of the USB
latency, the arm11 code in fact relies on it for the burst mode.

>>> The USB drivers have a 1ms roundtrip problem to contend with, the
>>> rest is in the noise, essentially. There is plenty of evidence to this effect.
>>
>> *You* brought up performance as concern... My goal was primarily
>> streamlining code.
>
> I did. We need both streamlined and fast code. Until we have both sorted,
> we leave things the way they are.

My proposal does not imply any decrease in speed, it can lead to an
increase. It still builds up a jtag sequence before executing it, it
just modifies the interface for value types.

>> Your example seemed to be too much of a workaround rather than
>> addressing the problem directly, which IMO is that the uint32_t case
>> is so common that there should be a short unambiguous way to deal with
>> it.
>>
>> Of course even a set of standard wrappers to package the set-up of the
>> most common field configurations would help a lot, without also
>> serving as a shortcut into the minidriver-layer to avoid the copying.
>
> I've got some thoughts on how to do this, but nothing written up yet.
>
>>> Also we have to *carefully* consider how we can make *small* steps that
>>> can be tested on all the hardware combinations. Otherwise any change
>>> is unlikely to pay off. We're getting there but it will and should take time.
>>
>> It was a suggestion on long-term goals. Small steps are usually most
>> effective when they go towards a specific target.
>
> I'm doing small steps in a branch for now. We'll see how it pans out.
>
> That branch might end up being a big leap before it is pushed to
> the master branch.

Michael


Re: [Openocd-development] Slowly moving from 8 to 32 bit words in jtag_add_xxx API

2009-11-19 Thread Andreas Fritiofson
On Thu, Nov 19, 2009 at 1:03 PM, Øyvind Harboe  wrote:
> Was this for the list?

Yeah it was, that Reply-to-all button seems to be hard to find sometimes.

>
> On Thu, Nov 19, 2009 at 12:34 AM, Andreas Fritiofson wrote:
>> On Wed, Nov 18, 2009 at 9:40 AM, Øyvind Harboe wrote:
>>> On Wed, Nov 18, 2009 at 9:38 AM, Laurent Gauch wrote:
>>>>>
>>>>> I'm pondering how we could gently in a series of
>>>>> non-breaking patches prepare the ground for switching from
>>>>> 8 to 32 bit words in the jtag_add_xxx API.
>>>>>
>>>>> The attached patch gets rid of buf_set_u32() when setting
>>>>> the value of a byte.
>>>>>
>>>>> This achieves two things: the code is less obtuse and it
>>>>> is more evident how we could introduce a new type
>>>>> that is *currently* uint8_t and later on could be increased to
>>>>> uint32_t or wider, for the out_value/in_value bit vectors.
>>>>>
>>>>> Comments? Protests?
>>>>
>>>> JTAG serial link itself has a notion of bits and not bytes nor dwords ...
>>>>
>>>> I do not understand what is the advantage to work on 32bit buffers
>>>> instead 8bit buffers for out_value and in_value.
>>>> Why the code will be less obtuse use 32bit buffer instead 8 bit buffers ?
>>>
>>> Look at all the buf_set_u32()'s sprinkled around the code. They are essentially
>>> unnecessary.
>>>
>>> The drivers probably wouldn't change.
>>
>> I don't see the point in deciding on a specific width of the storage
>> unit. The JTAG layer (should?) handle bit strings of arbitrary
>> lengths, so why not abstract away how the bit strings are stored
>> internally? Some time ago someone (David?) suggested we borrow/build
>> on the bitmap facility from Linux. I remember someone had strong
>> opinions against it, don't remember why. I myself think it's a good
>> idea to migrate towards a similar solution rather than switching from
>> one arbitrary, fixed width to another.
>>
>> As a side note, the Linux bitmap implementation actually uses
>> 'unsigned long' as storage, so if using 32 bits is your design
>> requirement you'll get it on at least some platforms. :)
>>
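
A minimal sketch of such an abstraction, with 'unsigned long' as the
storage word. All names here are hypothetical and only loosely modelled
on the idea of a bitmap facility; this is not an existing API. Callers
address individual bits and never see the word width:

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical names throughout; the storage word is an implementation
 * detail hidden behind the accessors. */
#define SKETCH_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_WORDS(nbits) \
	(((nbits) + SKETCH_BITS_PER_WORD - 1) / SKETCH_BITS_PER_WORD)

/* Set or clear a single bit of the bit string. */
static void sketch_bit_set(unsigned long *map, size_t bit, int value)
{
	unsigned long mask = 1UL << (bit % SKETCH_BITS_PER_WORD);

	if (value)
		map[bit / SKETCH_BITS_PER_WORD] |= mask;
	else
		map[bit / SKETCH_BITS_PER_WORD] &= ~mask;
}

/* Return 1 if the bit is set, 0 otherwise. */
static int sketch_bit_test(const unsigned long *map, size_t bit)
{
	return (int)((map[bit / SKETCH_BITS_PER_WORD] >>
			(bit % SKETCH_BITS_PER_WORD)) & 1UL);
}
```

A 40-bit scan would be declared as unsigned long map[SKETCH_WORDS(40)],
which is one word on hosts with a 64-bit long and two on hosts with a
32-bit long.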


Re: [Openocd-development] Slowly moving from 8 to 32 bit words in jtag_add_xxx API

2009-11-19 Thread David Brownell
On Wednesday 18 November 2009, Laurent Gauch wrote:
> I do not understand what is the advantage to work on 32bit buffers 
> instead 8bit buffers for out_value and in_value.

For starters, in some contexts it's faster by a factor of more than
four ... one instruction moving N bits, not four (and then there are
second-order effects too.  :)

For another thing, it's a clean way to help get rid of warnings like this:

arm_jtag.h: In function 'arm7flip32':
arm_jtag.h:48: error: cast increases required alignment of target type
arm_jtag.h: In function 'arm_le_to_h_u32':
arm_jtag.h:54: error: cast increases required alignment of target type
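
This is the kind of diagnostic GCC emits with -Wcast-align when a
uint8_t pointer is cast to a uint32_t pointer. A minimal sketch of the
two ways around it follows; the function names are hypothetical and
this is not the code in arm_jtag.h:

```c
#include <stddef.h>
#include <stdint.h>

/* Reads a 32-bit little-endian value from a byte buffer that may not be
 * 4-byte aligned. Casting `buf` to uint32_t * instead is what a
 * cast-alignment warning complains about. */
static uint32_t sketch_le_to_h_u32(const uint8_t *buf)
{
	return (uint32_t)buf[0] |
		((uint32_t)buf[1] << 8) |
		((uint32_t)buf[2] << 16) |
		((uint32_t)buf[3] << 24);
}

/* If the buffer is declared as uint32_t to begin with, one aligned load
 * is enough and no cast is needed. The value is in host byte order. */
static uint32_t sketch_word_load(const uint32_t *words, size_t index)
{
	return words[index];
}
```

The first form costs four byte loads plus shifts; the second is the
single move mentioned above.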



Re: [Openocd-development] Slowly moving from 8 to 32 bit words in jtag_add_xxx API

2009-11-19 Thread David Brownell
On Wednesday 18 November 2009, Michael Bruck wrote:
> I would actually prefer an API that is tightly linked to an
> independent data structure that builds up a jtag sequence in the
> target driver and then executes it. All the commands would then work
> on building up that structure and in the end it is handed over
> directly to the jtag interface driver for execution.

That presumes there *is* a target.  We have SVF and XSVF today,
and there can be other tools that work more at the JTAG level.

Plus, it's a bit awkward to be coupling address space access
(memory read/write) to targets, considering that newer ARMs
decouple them (via ADIv5) and so does, ISTR, Nexus.

I'm not entirely sure what you mean to describe though.  I'd
like to see less baroque structures for the JTAG messages,
and one part of that is clearly a need for more efficient
bit vector handling.

Along those lines, $SUBJECT is the wrong model too.  When we
want to work with 64 bit or 128 bit words -- or even just
the 40-bit words common with older ARM stuff -- then we
should be able to just pass those down.  Thinking "8" or
"32" focusses on the wrong stuff ... implementation details,
not the concepts that will produce a better interface.
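
A rough sketch of what "just pass those down" might mean for a 40-bit
value: the caller states a bit count and the storage unit stays hidden.
The helper name sketch_bits_from_u64() is hypothetical, and the
least-significant-bit-first layout is an assumption:

```c
#include <stdint.h>

/* Hypothetical helper: store the low `num_bits` (1..64) of `value` into
 * a byte buffer, least significant byte first. Bits above `num_bits` in
 * the last byte are written as zero. */
static void sketch_bits_from_u64(uint8_t *buf, unsigned num_bits,
		uint64_t value)
{
	unsigned num_bytes = (num_bits + 7) / 8;

	/* Mask off anything beyond the requested width. */
	if (num_bits < 64)
		value &= (UINT64_C(1) << num_bits) - 1;

	for (unsigned i = 0; i < num_bytes; i++)
		buf[i] = (uint8_t)(value >> (8 * i));
}
```

The same call shape would cover 5, 32, 40 or 64 bits; wider vectors
would need a buffer-based variant.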


> The current model does the same but essentially uses a global
> variable. But presumably due to ownership issues the field data is
> cloned for that global variable. If the jtag sequence structure is
> owned by the target this (second) copy operation can be avoided as
> well.

I think I agree with what you're saying there.  JTAG layer
gets delivered a queue, which it processes and modifies
in place (if required).

For extra credit:  some way to package queues with a bit
of intelligence, so they could be downloaded into smarter
controllers.  Example, the polling loops which run in
the background ... or between key steps of high level
operations.

- Dave



Re: [Openocd-development] Slowly moving from 8 to 32 bit words in jtag_add_xxx API

2009-11-20 Thread Michael Bruck
On Thu, Nov 19, 2009 at 22:01, David Brownell  wrote:
> On Wednesday 18 November 2009, Michael Bruck wrote:
>> I would actually prefer an API that is tightly linked to an
>> independent data structure that builds up a jtag sequence in the
>> target driver and then executes it. All the commands would then work
>> on building up that structure and in the end it is handed over
>> directly to the jtag interface driver for execution.
>
> That presumes there *is* a target.  We have SVF and XSVF today,
> and there can be other tools that work more at the JTAG level.
>
> Plus, it's a bit awkward to be coupling address space access
> (memory read/write) to targets, considering that newer ARMs
> decouple them (via ADIv5) and so does, ISTR, Nexus.

This is a misunderstanding due to my imprecise choice of words. I used
target layer to refer to anything that uses the core.c interface to
send JTAG commands. My comment above meant exactly the same thing as
the one you quoted at the end, which I think you understood correctly.

> I'm not entirely sure what you mean to describe though.  I'd
> like to see less baroque structures for the JTAG messages,
> and one part of that is clearly a need for more efficient
> bit vector handling.
>
> Along those lines, $SUBJECT is the wrong model too.  When we
> want to work with 64 bit or 128 bit words -- or even just
> the 40-bit words common with older ARM stuff -- then we
> should be able to just pass those down.  Thinking "8" or
> "32" focusses on the wrong stuff ... implementation details,
> not the concepts that will produce a better interface.

Yes I agree.

>> The current model does the same but essentially uses a global
>> variable. But presumably due to ownership issues the field data is
>> cloned for that global variable. If the jtag sequence structure is
>> owned by the target this (second) copy operation can be avoided as
>> well.
>
> I think I agree with what you're saying there.  JTAG layer
> gets delivered a queue, which it processes and modifies
> in place (if required).
>
> For extra credit:  some way to package queues with a bit
> of intelligence, so they could be downloaded into smarter
> controllers.  Example, the polling loops which run in
> the background ... or between key steps of high level
> operations.
>

Just to clarify the whole issue once more, my proposal was actually
three different things:

1. Making the use of scan_field safer by providing standard handlers
for the most common cases.

This not only helps with readability and reduces trivial
copy&paste errors, it also makes it much simpler to rewire the
underlying scan_field in a later step.


2. Eliminating the global variable jtag_command_queue.

The existing jtag_add_... commands would remain similar but would
operate on a local copy of the queue. jtag_execute_queue then receives
the pointer to that local copy as parameter instead of using
jtag_command_queue. The last user then disposes of the command queue.

The advantage here is a cleaner modular approach. For example this
makes scripting complex JTAG sequences possible without worrying about
interference from polling. In theory this also allows for the use of
multiple JTAG interfaces. With this approach it is also possible to
offer an asynchronous jtag execution mode (if someone needs such a
nightmare).


3. Break up jtag_add_dr_scan etc.

This works best in tandem with (2). The general idea is not to pass
one array of scan fields but to pass them in separate function calls
(which would mimic, but replace the ones in (1)). To output a 7 bit
field the caller just hands the value to the function and doesn't
bother about allocating space. To turn jtag_add_?r_scan inside out
like this requires its states to be kept somewhere so that
plausibility checks and bypassing can be done. The local copy of the
jtag_command_queue would be ideal for that (although it would also
work by adding even more global variables). The caller then does
something like this:

jtag_queue_t * q = jq_alloc_queue();

jq_statemove(q, TAP_IDLE); /* This is just a placeholder for a good
solution to deal with the fact that the initial TAP state is not known
until jq_execute().  */

jq_set_tap(q, my_tap); /* set the tap to be implied by the following
functions, until the next jq_set_tap() */

jq_ir_scan_start(q);

jq_field_u32_out(q, 5, some_instruction);

jq_dr_scan_start(q);

jq_field_u32_out(q, 7, my_value);
jq_field_u32_outin(q, 3, another_value, &return_value);


jq_execute(q); /* callee frees q and associated data */
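
To make the "caller just hands the value to the function" part concrete, here is a minimal sketch of how such a field call could append bits to storage owned by the queue object. Everything in it (struct jq, jq_scan_start, JQ_MAX_BITS, the LSB-first layout) is hypothetical and only illustrates the idea; it is not existing OpenOCD code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: none of these names exist in OpenOCD. */
#define JQ_MAX_BITS 256u

struct jq {
    uint8_t out[JQ_MAX_BITS / 8]; /* outgoing bits, LSB first */
    unsigned nbits;               /* bits appended to the current scan */
};

/* Begin a new scan: clear the bit buffer. */
static void jq_scan_start(struct jq *q)
{
    memset(q->out, 0, sizeof q->out);
    q->nbits = 0;
}

/* Append the low `bits` bits of `value`; the caller allocates nothing. */
static void jq_field_u32_out(struct jq *q, unsigned bits, uint32_t value)
{
    assert(bits <= 32 && q->nbits + bits <= JQ_MAX_BITS);
    for (unsigned i = 0; i < bits; i++) {
        if ((value >> i) & 1u)
            q->out[q->nbits / 8] |= (uint8_t)(1u << (q->nbits % 8));
        q->nbits++;
    }
}
```

A 7 bit field followed by a 3 bit field ends up as 10 consecutive bits in q->out, with no scan_field array on the caller's side.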



Michael
___
Openocd-development mailing list
Openocd-development@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/openocd-development


Re: [Openocd-development] Slowly moving from 8 to 32 bit words in jtag_add_xxx API

2009-11-20 Thread Øyvind Harboe
On Thu, Nov 19, 2009 at 10:01 PM, David Brownell  wrote:
> On Wednesday 18 November 2009, Michael Bruck wrote:
>> I would actually prefer an API that is tightly linked to an
>> independent data structure that builds up a jtag sequence in the
>> target driver and then executes it. All the commands would then work
>> on building up that structure and in the end it is handed over
>> directly to the jtag interface driver for execution.
>
> That presumes there *is* a target.  We have SVF and XSVF today,
> and there can be other tools that work more at the JTAG level.
>
> Plus, it's a bit awkward to be coupling address space access
> (memory read/write) to targets, considering that newer ARMs
> decouple them (via ADIv5) and so does, ISTR, Nexus.
>
> I'm not entirely sure what you mean to describe though.  I'd
> like to see less baroque structures for the JTAG messages,
> and one part of that is clearly a need for more efficient
> bit vector handling.
>
> Along those lines, $SUBJECT is the wrong model too.  When we
> want to work with 64 bit or 128 bit words -- or even just
> the 40-bit words common with older ARM stuff -- then we
> should be able to just pass those down.  Thinking "8" or
> "32" focusses on the wrong stuff ... implementation details,
> not the concepts that will produce a better interface.

I'm working on an API which focuses on values (of any size),
rather than a particular bit width.

Part of what I wrote went the way of my harddrive on my
laptop while I was travelling these last few days...



> For extra credit:  some way to package queues with a bit
> of intelligence, so they could be downloaded into smarter
> controllers.  Example, the polling loops which run in
> the background ... or between key steps of high level
> operations.

There is very little evidence to the effect that it would be
worth the extra effort and complication to be more clever
than today about handling long latencies.

My main goal is to make the code a bit crisper and clearer.


-- 
Øyvind Harboe
http://www.zylin.com/zy1000.html
ARM7 ARM9 ARM11 XScale Cortex
JTAG debugger and flash programmer


Re: [Openocd-development] Slowly moving from 8 to 32 bit words in jtag_add_xxx API

2009-11-20 Thread Øyvind Harboe
> Just to clarify the whole issue once more, my proposal was actually
> three different things:
>
> 1. Making the use of scan_field safer by providing standard handlers
> for the most common cases.
>
> This not only helps with the readability and reduces trivial
> copy&paste errors. It also makes it much simpler to rewire the
> underlying scan_field in a later step.

We can relatively easily do away with the scanfields entirely
rather than putting lipstick on them... See the branch I was working on
or the jtag_add_dr_out() API, to which I'm looking into adding an in_value
as well as the existing "out_value" that it takes today.

> 2. Eliminating the global variable jtag_command_queue.
>
> The existing jtag_add_... commands would remain similar but would
> operate on a local copy of the queue. jtag_execute_queue then receives
> the pointer to that local copy as parameter instead of using
> jtag_command_queue. The last user then disposes of the command queue.

This assumes that there is a queue at all. It should be up to the interface
to implement a queue if that is what is needed.

The current JTAG calls should be invoked on the interface rather
than assuming a single global instance. This is already listed in
todo.txt as an architectural improvement.

> 3. Break up jtag_add_dr_scan etc.
>
> This works best in tandem with (2). The general idea is not to pass
> one array of scan fields but to pass them in separate function calls
> (which would mimic, but replace the ones in (1)). To output a 7 bit
> field the caller just hands the value to the function and doesn't
> bother about allocating space. To turn jtag_add_?r_scan inside out
> like this requires its states to be kept somewhere so that
> plausibility checks and bypassing can be done. The local copy of the
> jtag_command_queue would be ideal for that (although it would also
> work by adding even more global variables). The caller then does
> something like this:
>
> jtag_queue_t * q = jq_alloc_queue();

I'm very much against *forcing* interfaces to implement a queue
in memory. It should be possible to execute the commands
synchronously. The existence of a queue would make the code
*much* less efficient on embedded devices.


Did you look at jtag_add_dr_out() which exists today?


-- 
Øyvind Harboe
http://www.zylin.com/zy1000.html
ARM7 ARM9 ARM11 XScale Cortex
JTAG debugger and flash programmer


Re: [Openocd-development] Slowly moving from 8 to 32 bit words in jtag_add_xxx API

2009-11-20 Thread Michael Bruck
On Fri, Nov 20, 2009 at 19:05, Øyvind Harboe  wrote:
>> Just to clarify the whole issue once more, my proposal was actually
>> three different things:
>>
>> 1. Making the use of scan_field safer by providing standard handlers
>> for the most common cases.
>>
>> This not only helps with the readability and reduces trivial
>> copy&paste errors. It also makes it much simpler to rewire the
>> underlying scan_field in a later step.
>
> We can relatively easily do away with the scanfields entirely
> rather than putting lipstick on it... See the branch I was working on
> or jtag_add_dr_out() API for which I'm looking into adding an in_value
> as well as the existing "out_value" that it takes today.

This still uses an array that needs to be initialized. Your own
embeddedice example patch didn't make use of it...
And by adding out_value the caller is forced to hold all data in
uint32_t. My suggestion was intended to make (almost) all data types
first class citizens, not just choosing one that is the most commonly
used.
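
One way to read "first class citizens" is a single generic bit-packing routine with thin typed wrappers on top, so a 40 bit value is no more awkward than an 8 bit one. The sketch below assumes an LSB-first byte buffer; bits_put, put_u8, put_u32 and put_u64 are made-up names for illustration and are not part of the proposal or of OpenOCD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not part of OpenOCD. */

/* Write the low `bits` bits of `value` into `buf` starting at bit `pos`
 * (LSB first) and return the new bit position. */
static size_t bits_put(uint8_t *buf, size_t pos, unsigned bits, uint64_t value)
{
    assert(bits <= 64);
    for (unsigned i = 0; i < bits; i++, pos++) {
        uint8_t mask = (uint8_t)(1u << (pos % 8));
        if ((value >> i) & 1u)
            buf[pos / 8] |= mask;
        else
            buf[pos / 8] &= (uint8_t)~mask;
    }
    return pos;
}

/* Thin typed wrappers: each only checks the width, then delegates. */
static size_t put_u8(uint8_t *buf, size_t pos, unsigned bits, uint8_t v)
{
    assert(bits <= 8);
    return bits_put(buf, pos, bits, v);
}

static size_t put_u32(uint8_t *buf, size_t pos, unsigned bits, uint32_t v)
{
    assert(bits <= 32);
    return bits_put(buf, pos, bits, v);
}

static size_t put_u64(uint8_t *buf, size_t pos, unsigned bits, uint64_t v)
{
    return bits_put(buf, pos, bits, v);
}
```

With this shape the caller keeps its data in whatever type is natural and the bit vector word size stays an internal detail.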

>> 2. Eliminating the global variable jtag_command_queue.
>>
>> The existing jtag_add_... commands would remain similar but would
>> operate on a local copy of the queue. jtag_execute_queue then receives
>> the pointer to that local copy as parameter instead of using
>> jtag_command_queue. The last user then disposes of the command queue.
>
> This assumes that there is a queue at all. It should be up to the interface
> to implement a queue if that is what is needed.
>
> The current JTAG calls should be invoked on the interface rather
> than assuming a single global instance. This is covered by todo.txt
> section today as an architectural improvement.

These are two things. The global JTAG device instance and the global
command sequence. The TODO is specific on the device instances, but
not on command queues.

>> 3. Break up jtag_add_dr_scan etc.
>>
>> This works best in tandem with (2). The general idea is not to pass
>> one array of scan fields but to pass them in separate function calls
>> (which would mimic, but replace the ones in (1)). To output a 7 bit
>> field the caller just hands the value to the function and doesn't
>> bother about allocating space. To turn jtag_add_?r_scan inside out
>> like this requires its states to be kept somewhere so that
>> plausibility checks and bypassing can be done. The local copy of the
>> jtag_command_queue would be ideal for that (although it would also
>> work by adding even more global variables). The caller then does
>> something like this:
>>
>> jtag_queue_t * q = jq_alloc_queue();
>
> I'm very much against *forcing* interfaces to implement a queue
> in memory. It should be possible to execute the commands
> synchronously. The existence of a queue would make the code
> *much* less efficient on embedded devices.

Where is the bottleneck in this case? Latency, memory consumption, cpu
load or something else?

> Did you look at jtag_add_dr_out() which exists today?

git format-patch d14b6ca0^...d14b6ca0


Michael


Re: [Openocd-development] Slowly moving from 8 to 32 bit words in jtag_add_xxx API

2009-11-21 Thread Øyvind Harboe
On Fri, Nov 20, 2009 at 10:43 PM, Michael Bruck  wrote:
> On Fri, Nov 20, 2009 at 19:05, Øyvind Harboe  wrote:
>>> Just to clarify the whole issue once more, my proposal was actually
>>> three different things:
>>>
>>> 1. Making the use of scan_field safer by providing standard handlers
>>> for the most common cases.
>>>
>>> This not only helps with the readability and reduces trivial
>>> copy&paste errors. It also makes it much simpler to rewire the
>>> underlying scan_field in a later step.
>>
>> We can relatively easily do away with the scanfields entirely
>> rather than putting lipstick on it... See the branch I was working on
>> or jtag_add_dr_out() API for which I'm looking into adding an in_value
>> as well as the existing "out_value" that it takes today.
>
> This still uses an array that needs to be initialized. Your own
> embeddedice example patch didn't make use of it...

The example on my harddrive that has been put to pasture makes
use of it :-)

> And by adding out_value the caller is forced to hold all data in
> uint32_t. My suggestion was intended to make (almost) all data types
> first class citizens, not just choosing one that is the most commonly
> used.

The thing that "forces" the users to use 32 bit is the fact that the
*target* is 32 bit. It doesn't matter what word size the OpenOCD host
CPU is using in the approach I'm suggesting. It works equally well
with any *target* word size.


> These are two things. The global JTAG device instance and the global
> command sequence. The TODO is specific on the device instances, but
> not on command queues.

The whole point is that whether or not there *is* an actual
command sequence is something that is up to the interface
to implement. Today the interface has that freedom. By exposing
the queue explicitly in the calling API you remove the ability of
the interface to drop the implementation of a queue.

The current JTAG API allows for a hardware queue, which is super
efficient.

> Where is the bottleneck in this case? Latency, memory consumption, cpu
> load or something else?

The *current* jtag_add_dr_out() implementation breaks down to *two pokes*
if you have a hardware JTAG queue.

*Anything* you add on top of that is going to increase performance overhead
significantly :-)

To get a sense of perspective, a single malloc() is orders of magnitude more
work than a synchronous jtag_add_dr_out() w/a hardware queue.

I've profiled this extensively, so it's no coincidence that I ended up with
the particular jtag_add_dr_out() syntax. Notice that the first argument
is a constant so with GCC's constant forward propagation, jtag_add_dr_out()
really *can* boil down to poking two hardware registers...
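
To illustrate what "two pokes" could look like, here is a rough sketch with a plain struct standing in for the adapter's memory-mapped registers. The register layout, the command encoding and the names fake_jtag_hw, hw_poke and hw_add_dr_out are all invented for illustration; this is not the zy1000 code and not the real jtag_add_dr_out() signature.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for memory-mapped registers; a real adapter would use
 * volatile pointers to fixed addresses. All names are hypothetical. */
struct fake_jtag_hw {
    uint32_t data;   /* bits to shift out */
    uint32_t cmd;    /* bit count and end state; writing starts the shift */
    unsigned pokes;  /* number of register writes, kept for illustration */
};

/* One register write. */
static inline void hw_poke(struct fake_jtag_hw *hw, uint32_t *reg, uint32_t v)
{
    *reg = v;
    hw->pokes++;
}

/* Queue one outgoing DR field: one write for data, one for the command. */
static inline void hw_add_dr_out(struct fake_jtag_hw *hw, unsigned bits,
                                 uint32_t value, unsigned end_state)
{
    assert(bits >= 1 && bits <= 32);
    hw_poke(hw, &hw->data, value);
    hw_poke(hw, &hw->cmd, ((uint32_t)end_state << 8) | bits);
}
```

When bits and end_state are compile-time constants at the call site, an inlining compiler can fold the command word into a constant, which is the effect described above.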


>> Did you look at jtag_add_dr_out() which exists today?
>
> git format-patch d14b6ca0^...d14b6ca0

I don't know what you mean by the above, or against which repository.


-- 
Øyvind Harboe
http://www.zylin.com/zy1000.html
ARM7 ARM9 ARM11 XScale Cortex
JTAG debugger and flash programmer


Re: [Openocd-development] Slowly moving from 8 to 32 bit words in jtag_add_xxx API

2009-11-21 Thread Michael Schwingen
Øyvind Harboe wrote:
>> 3. Break up jtag_add_dr_scan etc.
>>
>> This works best in tandem with (2). The general idea is not to pass
>> one array of scan fields but to pass them in separate function calls
>> (which would mimic, but replace the ones in (1)). To output a 7 bit
>> field the caller just hands the value to the function and doesn't
>> bother about allocating space. To turn jtag_add_?r_scan inside out
>> like this requires its states to be kept somewhere so that
>> plausibility checks and bypassing can be done. The local copy of the
>> jtag_command_queue would be ideal for that (although it would also
>> work by adding even more global variables). The caller then does
>> something like this:
>>
>> jtag_queue_t * q = jq_alloc_queue();
>> 
>
> I'm very much against *forcing* interfaces to implement a queue
> in memory. It should be possible to execute the commands
> synchronously. The existence of a queue would make the code
> *much* less efficient on embedded devices.
>   
Hm. I do not have a complete view of the proposed API, but does it
really *force* you to implement a queue?

It looks to me you only have to have a queue structure, which is used to
hold state across multiple calls (maybe it should have a different
name?). You don't *have* to queue the commands, right?

So if you have a hardware queue, the queue struct may be mostly unused
by your interface implementation, and every add_... call simply stashes
the arguments into your hardware queue.
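
A rough sketch of what I mean, assuming the queue handle is really just a context object with a per-interface backend. The names (struct jq, struct jq_backend, direct_backend) are hypothetical and the "hardware" is simulated by a struct member; the point is only that the add call can go straight to the adapter while the handle carries state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; not existing OpenOCD code. */
struct jq;

struct jq_backend {
    void (*field_out)(struct jq *q, unsigned bits, uint32_t value);
    int  (*execute)(struct jq *q);
};

struct jq {
    const struct jq_backend *be;
    unsigned fields_sent;  /* state kept across calls */
    uint32_t last_value;   /* stands in for the hardware FIFO input */
};

/* "Hardware FIFO" backend: every field goes out at once, nothing is stored. */
static void direct_field_out(struct jq *q, unsigned bits, uint32_t value)
{
    assert(bits >= 1 && bits <= 32);
    q->last_value = value & (bits == 32 ? 0xFFFFFFFFu : ((1u << bits) - 1u));
    q->fields_sent++;
}

/* Nothing was buffered, so there is nothing left to flush. */
static int direct_execute(struct jq *q)
{
    (void)q;
    return 0;
}

static const struct jq_backend direct_backend = {
    direct_field_out, direct_execute
};

/* What the target code calls; it cannot tell whether anything is queued. */
static void jq_field_u32_out(struct jq *q, unsigned bits, uint32_t value)
{
    q->be->field_out(q, bits, value);
}

static int jq_execute(struct jq *q)
{
    return q->be->execute(q);
}
```

A software backend would instead append to a list in field_out and walk it in execute, without any change to the callers.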

Please correct me if I overlooked something (which is entirely possible).

cu
Michael




Re: [Openocd-development] Slowly moving from 8 to 32 bit words in jtag_add_xxx API

2009-11-21 Thread Øyvind Harboe
On Sat, Nov 21, 2009 at 2:09 PM, Michael Schwingen
 wrote:
> Øyvind Harboe wrote:
>>> 3. Break up jtag_add_dr_scan etc.
>>>
>>> This works best in tandem with (2). The general idea is not to pass
>>> one array of scan fields but to pass them in separate function calls
>>> (which would mimic, but replace the ones in (1)). To output a 7 bit
>>> field the caller just hands the value to the function and doesn't
>>> bother about allocating space. To turn jtag_add_?r_scan inside out
>>> like this requires its states to be kept somewhere so that
>>> plausibility checks and bypassing can be done. The local copy of the
>>> jtag_command_queue would be ideal for that (although it would also
>>> work by adding even more global variables). The caller then does
>>> something like this:
>>>
>>> jtag_queue_t * q = jq_alloc_queue();
>>>
>>
>> I'm very much against *forcing* interfaces to implement a queue
>> in memory. It should be possible to execute the commands
>> synchronously. The existence of a queue would make the code
>> *much* less efficient on embedded devices.
>>
> Hm. I do not have a complete view of the proposed API, but does it
> really *force* you to implement a queue?

The important point is that the queuing of JTAG commands
can be done in hardware today, with no overhead.

We want to keep the actual queue implementation something
completely internal to the interface implementations.



-- 
Øyvind Harboe
http://www.zylin.com/zy1000.html
ARM7 ARM9 ARM11 XScale Cortex
JTAG debugger and flash programmer


Re: [Openocd-development] Slowly moving from 8 to 32 bit words in jtag_add_xxx API

2009-11-21 Thread Michael Schwingen
Øyvind Harboe wrote:
> The important point is that the queuing of JTAG commands
> can be done in hardware today, with no overhead.
>
> We want to keep the actual queue implementation something
> completely internal to the interface implementations.
>   
Understood. I still do not see how the proposed API would break that.

cu
Michael




Re: [Openocd-development] Slowly moving from 8 to 32 bit words in jtag_add_xxx API

2009-11-25 Thread David Brownell
On Friday 20 November 2009, Øyvind Harboe wrote:
> I'm very much against *forcing* interfaces to implement a queue
> in memory. It should be possible to execute the commands
> synchronously. The existence of a queue would make the code
> *much* less efficient on embedded devices.

We saw contrary feedback not that long ago, though:  that
a lot of time was being wasted recreating the same command
queues repeatedly.

Why would running from a queue cause inefficiency?  I've
often seen the opposite.  Minimally, having the commands
already lined up and ready to go means there's not going
to be time wasted between steps, calculating the next one.
So there's an immediate latency win.

At some level, executing from a prepared queue, rather than
computing each step (cost C[i]) and executing it (cost X[i])
one at a time, is just time shifting:

  sigma(i, C[i]) + sigma(i, X[i]) == sigma(i, C[i] + X[i])

However, splitting the compute and execute stages lets
them be individually optimized ... and also opens the
door to the "reuse the queue" mode, reducing the time
for subsequent queue runs to just sigma(i, X[i]).  That
means more of those operations can be performed in the
same time ... e.g. polling a register.  (If the C and
X costs are the same, polling would be twice as fast.)
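
As a toy model of that arithmetic (the function names and the cost units are made up; real costs would have to be measured), compare rebuilding the queue on every pass with building it once and replaying it:

```c
#include <assert.h>
#include <stddef.h>

/* Sum of an array of per-step costs, in arbitrary time units. */
static unsigned long cost_sum(const unsigned *v, size_t n)
{
    unsigned long total = 0;
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Total time for `runs` passes when the queue is rebuilt on every pass:
 * runs * (sigma C[i] + sigma X[i]). */
static unsigned long cost_rebuild(const unsigned *c, const unsigned *x,
                                  size_t n, unsigned runs)
{
    return runs * (cost_sum(c, n) + cost_sum(x, n));
}

/* Total time when the queue is built once and then replayed:
 * sigma C[i] + runs * sigma X[i]. */
static unsigned long cost_reuse(const unsigned *c, const unsigned *x,
                                size_t n, unsigned runs)
{
    return cost_sum(c, n) + runs * cost_sum(x, n);
}
```

With three steps costing C = X = {2, 3, 5}, 100 polls come to 2000 units rebuilt versus 1010 units reused, which approaches the factor of two as the number of runs grows.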

- Dave



Re: [Openocd-development] Slowly moving from 8 to 32 bit words in jtag_add_xxx API

2009-11-25 Thread David Brownell
> Just to clarify the whole issue once more, my proposal was actually
> three different things:

These seem like good directions to explore.

I'll suggest the post-0.4.0 development cycle (January++) as a
good time to have mergeable code that starts reworking any of
this stuff.  I don't think 0.4.0 is appropriate for this type
of change, since the relevant issues aren't fully understood
and I can't see having those discussions backed by stable code
before the expected RC1 code freeze (in under a month).


> 1. Making the use of scan_field safer by providing standard handlers
> for the most common cases.
> 
> This not only helps with the readability and reduces trivial
> copy&paste errors. It also makes it much simpler to rewire the
> underlying scan_field in a later step.

... this part might, however, be a thing that could start
phasing in sooner.
 
 
> 2. Eliminating the global variable jtag_command_queue.
> 
> The existing jtag_add_... commands would remain similar but would
> operate on a local copy of the queue. jtag_execute_queue then receives
> the pointer to that local copy as parameter instead of using
> jtag_command_queue. The last user then disposes of the command queue.
> 
> The advantage here is a cleaner modular approach. For example this
> makes scripting complex JTAG sequences possible without worrying about
> interference from polling. In theory this also allows for the use of
> multiple JTAG interfaces. With this approach it is also possible to
> offer an asynchronous jtag execution mode (if someone needs such a
> nightmare).

This calls for a kind of flag day.  If it's agreed to be a good
thing, having that ready to merge once the next merge window opens
seems like it'd be a good strategy:  there's enough time between
now and then to develop and stabilize most of that code.

Øyvind would for example want to make sure he can create queues
that are optimized for the Zylin hardware...

Re asynch ...  we're in userspace, so the model should be "threads".
And the background polling might better be modeled as threads than
through a timer interrupt.  But that brings in portability problems.

- Dave






Re: [Openocd-development] Slowly moving from 8 to 32 bit words in jtag_add_xxx API

2009-11-25 Thread David Brownell
On Friday 20 November 2009, Øyvind Harboe wrote:
> > 2. Eliminating the global variable jtag_command_queue.
> >
> > The existing jtag_add_... commands would remain similar but would
> > operate on a local copy of the queue. jtag_execute_queue then receives
> > the pointer to that local copy as parameter instead of using
> > jtag_command_queue. The last user then disposes of the command queue.
> 
> This assumes that there is a queue at all. It should be up to the interface
> to implement a queue if that is what is needed.

Today's interface is built around the concept of a queue.

The thing which permits synchronous execution is having
that queue be implicit, with a handful of explicit flush
points called jtag_execute_queue().

It might be worth exploring which way to go.  Nothing is
currently reusing the queues -- it's not possible since
they are hidden!! -- but as others have pointed out, it'd
be a win to be able to do that instead of constantly needing
to reallocate and reinitialize the same command queues.


> The current JTAG calls should be invoked on the interface rather
> than assuming a single global instance. This is covered by todo.txt
> section today as an architectural improvement.

That's a separable issue.  I personally don't have an issue
with expecting one OpenOCD process to manage only a single
scan chain, through a single JTAG adapter/interface.  It's
certainly the common case ... and it seems to me like useful
simplification, rather than oversimplification.

- Dave


Re: [Openocd-development] Slowly moving from 8 to 32 bit words in jtag_add_xxx API

2009-11-25 Thread Øyvind Harboe
On Wed, Nov 25, 2009 at 8:54 PM, David Brownell  wrote:
> On Friday 20 November 2009, Øyvind Harboe wrote:
>> > 2. Eliminating the global variable jtag_command_queue.
>> >
>> > The existing jtag_add_... commands would remain similar but would
>> > operate on a local copy of the queue. jtag_execute_queue then receives
>> > the pointer to that local copy as parameter instead of using
>> > jtag_command_queue. The last user then disposes of the command queue.
>>
>> This assumes that there is a queue at all. It should be up to the interface
>> to implement a queue if that is what is needed.
>
> Today's interface is built around the concept of a queue.

ZY1000 has a hardware queue or FIFO depending on jargon.
The current OpenOCD model allows the queue to be implemented
either in hardware or in software.

Now the most used OpenOCD minidriver is *dreadfully* inefficient, but
it doesn't have to be that way. It would be perfectly possible to rewrite
the minidriver for the USB dongles to build queues more
along the lines that the zy1000 does.

However do some profiling and you'll discover that it's a red herring.
The only thing that matters on a PC is to reduce the # of roundtrips
for USB. *Possibly* it might be worthwhile to send off USB commands
*asynchronously* to building them. E.g. when doing a DCC upload,
then *nothing* goes out of the USB port until the *entire* DCC sequence
(megabytes possibly) is put together. Admittedly on a PC it probably
takes << 1 sec to create that queue so there is little or nothing to
be gained.


> The thing which permits synchronous execution is having
> that queue be implicit, with a handful of explicit flush
> points called jtag_execute_queue().

As is documented, the jtag_execute_queue() command serves
another *crucial* role. It returns an error if any of the jtag_add
commands, asynchronous or synchronous, caused an error.
All the jtag_add_xxx() commands return void (there is an
exception as always but that may go).
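
The pattern is roughly the following. This is a simplified sketch with invented names and error codes (demo_add_dr_out, demo_execute_queue, DEMO_ERR_BAD_LENGTH), not the actual OpenOCD implementation: the add calls return void and remember the first failure, and the flush point reports it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of a sticky-error scheme; the names and error
 * codes are illustrative, not OpenOCD's. */
enum { DEMO_OK = 0, DEMO_ERR_BAD_LENGTH = -1 };

static int demo_sticky_error = DEMO_OK;

/* Record only the first failure so later ones do not mask it. */
static void demo_set_error(int err)
{
    assert(err != DEMO_OK);
    if (demo_sticky_error == DEMO_OK)
        demo_sticky_error = err;
}

/* An "add" call returns void; failures are remembered, not returned. */
static void demo_add_dr_out(unsigned bits, uint32_t value)
{
    (void)value;
    if (bits == 0 || bits > 32) {
        demo_set_error(DEMO_ERR_BAD_LENGTH);
        return;
    }
    /* ...hand the field to the adapter here, queued or not... */
}

/* The flush point reports the first recorded error and clears it. */
static int demo_execute_queue(void)
{
    int err = demo_sticky_error;
    demo_sticky_error = DEMO_OK;
    return err;
}
```

Callers can then issue a run of add calls without checking each one and test a single return value at the end, whether the interface executed them immediately or deferred them.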

> It might be worth exploring which way to go.  Nothing is
> currently reusing the queues -- it's not possible since
> they are hidden!! -- but as others have pointed out, it'd
> be a win to be able to do that instead of constantly needing
> to reallocate and reinitialize the same command queues.

This is a misunderstanding and a red herring. Look at the
definition of jtag_add_dr_out() (which exists and is used today)
and jtag_add_dr() in the oharboe/jtag32api branch.

If you have a hardware fifo, then you can be looking at
*two pokes* to execute a jtag_add_dr_out(). Really!

If you need asynchronous execution, then two
pokes to execute a jtag_add_dr_out() just can't
be beat.  Look at the arguments to jtag_add_dr_out(),
they are variable so there is nothing that can be
"built once" there.


-- 
Øyvind Harboe
http://www.zylin.com/zy1000.html
ARM7 ARM9 ARM11 XScale Cortex
JTAG debugger and flash programmer


Re: [Openocd-development] Slowly moving from 8 to 32 bit words in jtag_add_xxx API

2009-11-25 Thread Øyvind Harboe
I would beg of you that you study jtag_add_dr() and jtag_add_dr_out()
in oharboe/jtag32api *before* you go down the road of thinking about
how to put lipstick on the current fields structures...

Especially jtag_add_dr_out() has a track record of being wickedly
efficient.




-- 
Øyvind Harboe
http://www.zylin.com/zy1000.html
ARM7 ARM9 ARM11 XScale Cortex
JTAG debugger and flash programmer