Re: [Openocd-development] [PATCH] buf_set_buf around 30% speed increase

2011-02-08 Thread Mathias K.

Hello,

On 05.02.2011 10:43, Øyvind Harboe wrote:
> What sort of CPU did you run the tests on?

Which test? The target CPU/MCU or my system CPU?

> Let me know when the patch is ready to be committed. I suppose
> it could need a bit of cool-off.

I think it's fine.


Regards,

Mathias
___
Openocd-development mailing list
Openocd-development@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/openocd-development


Re: [Openocd-development] [PATCH] buf_set_buf around 30% speed increase

2011-02-08 Thread Øyvind Harboe
On Tue, Feb 8, 2011 at 9:09 AM, Mathias K. kes...@freenet.de wrote:
> Hello,
>
> On 05.02.2011 10:43, Øyvind Harboe wrote:
>> What sort of CPU did you run the tests on?
>
> Which test? The target CPU/MCU or my system CPU?

System CPU.

>> Let me know when the patch is ready to be committed. I suppose
>> it could need a bit of cool-off.
>
> I think it's fine.

OK. There has been a lot of discussion back and forth
about the wonders of optimization, but you're the only
one who's submitted a patch, so I'll commit that first.

-- 
Øyvind Harboe

Can Zylin Consulting help on your project?

US toll free 1-866-980-3434 / International +47 51 87 40 27

http://www.zylin.com/zy1000.html
ARM7 ARM9 ARM11 XScale Cortex
JTAG debugger and flash programmer


Re: [Openocd-development] [PATCH] buf_set_buf around 30% speed increase

2011-02-08 Thread simon qian
This code should be better optimized.

But, shouldn't it be:
sb = src_start / 8;
db = dst_start / 8;
sq = src_start % 8;
dq = dst_start % 8;
src += sb;
dst += db;

2011/2/8 Øyvind Harboe oyvind.har...@zylin.com

> Merged.
>
> Thanks!







-- 
Best Regards, SimonQian
http://www.SimonQian.com


Re: [Openocd-development] [PATCH] buf_set_buf around 30% speed increase

2011-02-08 Thread Øyvind Harboe
On Tue, Feb 8, 2011 at 11:29 AM, simon qian simonqian.open...@gmail.com wrote:
> This code should be better optimized.

Patches welcome!

> But, shouldn't it be:
>     sb = src_start / 8;
>     db = dst_start / 8;
>     sq = src_start % 8;
>     dq = dst_start % 8;
>     src += sb;
>     dst += db;

Isn't it in the master branch?

I committed the wrong version and then the correct one (hopefully!).



-- 
Øyvind Harboe



Re: [Openocd-development] [PATCH] buf_set_buf around 30% speed increase

2011-02-08 Thread simon qian
I checked the master branch; it was fixed an hour ago.

2011/2/8 Øyvind Harboe oyvind.har...@zylin.com

> On Tue, Feb 8, 2011 at 11:48 AM, simon qian simonqian.open...@gmail.com
> wrote:
>> It's not in the branch.
>> Here is the patch.
>
> Are you sure?
>
> I tried to apply your patch and it failed.
>
> Could you rebase your branch on top of the master
> branch?







-- 
Best Regards, SimonQian
http://www.SimonQian.com


Re: [Openocd-development] [PATCH] buf_set_buf around 30% speed increase

2011-02-07 Thread Øyvind Harboe
What impact would it have to make this an
inline fn?

How often are unsigned src_start, unsigned dst_start,
and unsigned len constants that could be
dealt with through constant propagation and code
elimination?


-- 
Øyvind Harboe



Re: [Openocd-development] [PATCH] buf_set_buf around 30% speed increase

2011-02-07 Thread Marc Pignat
Hi all!

This seems like a good idea, and you can improve the copy even further by doing
something like this (pseudo-code only):

void* buf_set_buf(const void *_src, unsigned src_start,
	void *_dst, unsigned dst_start, unsigned len)
{
	/* Are src and dst bit aligned? */
	if ((dst_start % 8) != (src_start % 8))
	{
		/* No - bit to bit copy */


		return ;
	}

	/* Manage non-byte data at the beginning */
	while (((dst_start % 8) != 0) && len)
	{
		/* bit to bit copy */
	}

	/* we've got 2 byte-aligned buffers ;)*/
	memcpy...

	/* Manage non-byte data at the end */
	while (len)
	{
		/* bit to bit copy */
	}

}
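
A runnable sketch of the three phases outlined in the pseudo-code could look like the following. The names `copy_bit` and `buf_set_buf_sketch` are hypothetical, the LSB-first bit order is assumed from the patch under discussion, and this is not the code that was committed to OpenOCD.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy one bit from bit index s of src to bit index d of dst (LSB-first). */
static void copy_bit(const uint8_t *src, unsigned s, uint8_t *dst, unsigned d)
{
	if ((src[s / 8] >> (s % 8)) & 1)
		dst[d / 8] |= (uint8_t)(1u << (d % 8));
	else
		dst[d / 8] &= (uint8_t)~(1u << (d % 8));
}

/* Hypothetical name; a sketch of the idea, not the OpenOCD implementation. */
static void *buf_set_buf_sketch(const void *_src, unsigned src_start,
		void *_dst, unsigned dst_start, unsigned len)
{
	const uint8_t *src = _src;
	uint8_t *dst = _dst;

	assert(src != NULL && dst != NULL);

	/* Different bit offsets: no byte-wise shortcut, copy bit by bit. */
	if ((src_start % 8) != (dst_start % 8)) {
		for (unsigned i = 0; i < len; i++)
			copy_bit(src, src_start + i, dst, dst_start + i);
		return _dst;
	}

	/* Leading bits up to the next byte boundary. */
	while ((dst_start % 8) != 0 && len > 0) {
		copy_bit(src, src_start++, dst, dst_start++);
		len--;
	}

	/* Whole bytes: both positions are byte aligned if any bits remain. */
	unsigned nbytes = len / 8;
	memcpy(dst + dst_start / 8, src + src_start / 8, nbytes);
	src_start += nbytes * 8;
	dst_start += nbytes * 8;
	len -= nbytes * 8;

	/* Trailing bits that do not fill a whole byte. */
	while (len > 0) {
		copy_bit(src, src_start++, dst, dst_start++);
		len--;
	}
	return _dst;
}
```

Copying 21 bits from bit 3 to bit 3, for instance, handles 5 leading bits one at a time and then moves the remaining 16 bits with a single 2-byte memcpy. As with memcpy itself, the sketch assumes the source and destination buffers do not overlap.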


Re: [Openocd-development] [PATCH] buf_set_buf around 30% speed increase

2011-02-07 Thread Mathias K.

Hello,

On 07.02.2011 09:09, Øyvind Harboe wrote:
> What impact would it have to make this an
> inline fn?

I think there is no need to declare this big function as inline. This will only increase the code
size.


I see some functions in the jtag/interface.c file with a very small body that could be declared as 
inline because they are called very very often:


tap_set_state_impl
tap_get_state
tap_set_end_state
tap_get_end_state
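
As an illustration of what such small accessors might look like when inlined, here is a minimal sketch. The `sketch_` names and the reduced state enum are hypothetical stand-ins, not the definitions from jtag/interface.c.

```c
#include <assert.h>

/* Hypothetical, reduced state type for illustration only. */
typedef enum {
	SKETCH_TAP_RESET,
	SKETCH_TAP_IDLE,
	SKETCH_TAP_DRSHIFT,
	SKETCH_TAP_IRSHIFT,
	SKETCH_TAP_NUM_STATES
} sketch_tap_state_t;

/* In a header/source split this would be declared `extern` in the header
 * and defined exactly once in a .c file. */
sketch_tap_state_t sketch_cur_state = SKETCH_TAP_RESET;

/* Tiny bodies like these are the usual candidates for `static inline`:
 * a call would cost about as much as the work itself. */
static inline void sketch_tap_set_state(sketch_tap_state_t new_state)
{
	assert(new_state < SKETCH_TAP_NUM_STATES);
	sketch_cur_state = new_state;
}

static inline sketch_tap_state_t sketch_tap_get_state(void)
{
	return sketch_cur_state;
}
```

Whether inlining these pays off in practice would still need to be measured on the systems that matter.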


Regards,

Mathias


Re: [Openocd-development] [PATCH] buf_set_buf around 30% speed increase

2011-02-07 Thread Marc Pignat
On Monday 07 February 2011 09:09:36 Øyvind Harboe wrote:
> What impact would it have to make this an
> inline fn?

...

Inlined or not, this function could be faster.

Even with inlining and constant propagation, I don't think gcc is smart enough
to replace buf_set_buf(_src, 0, dst, 0, 64); by memcpy(dst, _src, 8);

Marc


Re: [Openocd-development] [PATCH] buf_set_buf around 30% speed increase

2011-02-07 Thread Øyvind Harboe
On Mon, Feb 7, 2011 at 9:50 AM, Marc Pignat marc.pig...@hevs.ch wrote:
> On Monday 07 February 2011 09:09:36 Øyvind Harboe wrote:
>> What impact would it have to make this an
>> inline fn?
>
> ...
>
> Inlined or not, this function could be faster.
>
> Even with inlining and constant propagation, I don't think gcc is smart enough
> to replace buf_set_buf(_src, 0, dst, 0, 64); by memcpy(dst, _src, 8);

Sure it is! Just set up an if() statement that it can
optimize and execute memcpy() in that case!

With -O3 it can do a lot of constant propagation and
figure out that an if() is always taken, especially if
the args to if() are constant.
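
A minimal sketch of that idea, assuming an inline wrapper in a header: when src_start, dst_start and len are compile-time constants, the condition of the if() is constant too, so an optimizing compiler has the chance to keep only the memcpy() branch. The names `bit_copy_slow` and `buf_set_buf_inline` are hypothetical, and whether a given compiler and optimization level really removes the other branch is an assumption that would have to be checked in the generated code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical generic fallback: bit-by-bit copy, LSB-first. */
static void bit_copy_slow(const uint8_t *src, unsigned s,
		uint8_t *dst, unsigned d, unsigned len)
{
	for (unsigned i = 0; i < len; i++, s++, d++) {
		if ((src[s / 8] >> (s % 8)) & 1)
			dst[d / 8] |= (uint8_t)(1u << (d % 8));
		else
			dst[d / 8] &= (uint8_t)~(1u << (d % 8));
	}
}

/* Hypothetical inline wrapper with a fast path the optimizer can select. */
static inline void *buf_set_buf_inline(const void *src, unsigned src_start,
		void *dst, unsigned dst_start, unsigned len)
{
	assert(src != NULL && dst != NULL);

	/* Everything on a byte boundary: a plain byte copy is enough. */
	if (src_start % 8 == 0 && dst_start % 8 == 0 && len % 8 == 0) {
		memcpy((uint8_t *)dst + dst_start / 8,
			(const uint8_t *)src + src_start / 8, len / 8);
		return dst;
	}

	bit_copy_slow(src, src_start, dst, dst_start, len);
	return dst;
}
```

A call such as `buf_set_buf_inline(src, 0, dst, 0, 64)` has a constant-true condition, which is the case discussed above.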


-- 
Øyvind Harboe



Re: [Openocd-development] [PATCH] buf_set_buf around 30% speed increase

2011-02-07 Thread Michael Schwingen
On 02/07/2011 09:37 AM, Øyvind Harboe wrote:
> On Mon, Feb 7, 2011 at 9:27 AM, Mathias K. kes...@freenet.de wrote:
>> Hello,
>>
>> On 07.02.2011 09:09, Øyvind Harboe wrote:
>>> What impact would it have to make this an
>>> inline fn?
>> I think there is no need to declare this big function as inline. This will
>> only increase the code size.
> To the point where it matters?
>
> I think clarity and performance matter much more
> than code-size, right?

On what kind of system?

On embedded CPUs with small caches, inlining will usually slow down the
code because less of it fits inside the code cache - on those systems,
-Os is usually faster than the more aggressive optimization levels that
do more inlining.

I vote to keep the code readable and only inline when there is a real
*noticeable* gain, not just because profiling shows that it is faster.
(that does not mean I object against the original patch: speeding up the
implementation by optimizing the code is fine, as long as it does not
hamper maintainability).

cu
Michael




Re: [Openocd-development] [PATCH] buf_set_buf around 30% speed increase

2011-02-05 Thread Øyvind Harboe
What sort of CPU did you run the tests on?

Let me know when the patch is ready to be committed. I suppose
it could need a bit of cool-off.

-- 
Øyvind Harboe


[Openocd-development] [PATCH] buf_set_buf around 30% speed increase

2011-02-04 Thread Mathias K.

Hello,

this patch increases the speed of the buf_set_buf function by around 30%.


Regards,

Mathias
diff --git a/src/helper/binarybuffer.c b/src/helper/binarybuffer.c
index 3a16cce..e789e6f 100644
--- a/src/helper/binarybuffer.c
+++ b/src/helper/binarybuffer.c
@@ -133,19 +133,34 @@ void* buf_set_buf(const void *_src, unsigned src_start,
 {
const uint8_t *src = _src;
uint8_t *dst = _dst;
+   unsigned  sb,db,sq,dq;
+
+   sb = src_start / 8;
+   db = dst_start / 8;
+   sq = src_start % 8;
+   dq = dst_start % 8;
 
-   unsigned src_idx = src_start, dst_idx = dst_start;
for (unsigned i = 0; i < len; i++)
{
-   if (((src[src_idx / 8] >> (src_idx % 8)) & 1) == 1)
-   dst[dst_idx / 8] |= 1 << (dst_idx % 8);
+   if (((*src >> (sq&7)) & 1) == 1)
+   *dst |= 1 << (dq&7);
else
-   dst[dst_idx / 8] &= ~(1 << (dst_idx % 8));
-   dst_idx++;
-   src_idx++;
+   *dst &= ~(1 << (dq&7));
+
+   if ( sq++ == 7 )
+   {
+   sq = 0;
+   src++;
+   }
+
+   if ( dq++ == 7 )
+   {
+   dq = 0;
+   dst++;
+   }
}
 
-   return dst;
+   return (uint8_t*)_dst;
 }
 
 uint32_t flip_u32(uint32_t value, unsigned int num)


Re: [Openocd-development] [PATCH] buf_set_buf around 30% speed increase

2011-02-04 Thread Øyvind Harboe
On Fri, Feb 4, 2011 at 5:21 PM, Mathias K. kes...@freenet.de wrote:
> Hello,
>
> this patch increases the speed of the buf_set_buf function by around 30%.

How do you arrive at 30%?

What overall impact does this have?

-- 
Øyvind Harboe



Re: [Openocd-development] [PATCH] buf_set_buf around 30% speed increase

2011-02-04 Thread Mathias K.

Hello,

OK, the patch has a little bug: I did not set the correct start pointers for the input and output
buffers.

I have also checked the inputs to this function, and in many cases a simple byte
copy is possible.
I have added this check now, so where possible the buffer is copied byte by
byte instead of bit by bit.


With byte boundary input the test looks like this:

buf_set_buf 0x0200 iteration test:
runtime (seconds): old: 6.828559 new: 0.436191 diff: 6.392368
runtime (seconds): old: 6.853636 new: 0.430389 diff: 6.423247
runtime (seconds): old: 6.794985 new: 0.423065 diff: 6.371920


Without:

buf_set_buf 0x0200 iteration test:
runtime (seconds): old: 6.370869 new: 5.552624 diff: 0.818245
runtime (seconds): old: 6.420730 new: 5.665887 diff: 0.754843
runtime (seconds): old: 6.583306 new: 5.599021 diff: 0.984285
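
To put the timings above in relative terms, a minimal sketch with two hypothetical helpers: by this arithmetic the byte-boundary runs come out roughly 15 to 16 times faster than the old code, while the runs without byte-boundary input save roughly 12 to 15 percent of the runtime.

```c
#include <assert.h>

/* How many times faster the new code ran (old runtime / new runtime). */
static double speedup(double old_s, double new_s)
{
	assert(new_s > 0.0);
	return old_s / new_s;
}

/* Share of the old runtime that was saved, in percent. */
static double percent_saved(double old_s, double new_s)
{
	assert(old_s > 0.0);
	return 100.0 * (old_s - new_s) / old_s;
}
```

For example, `speedup(6.828559, 0.436191)` uses the first byte-boundary run and `percent_saved(6.370869, 5.552624)` the first run without it.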



Regards,

Mathias
diff --git a/src/helper/binarybuffer.c b/src/helper/binarybuffer.c
index 3a16cce..e789e6f 100644
--- a/src/helper/binarybuffer.c
+++ b/src/helper/binarybuffer.c
@@ -133,19 +133,34 @@ void* buf_set_buf(const void *_src, unsigned src_start,
 {
const uint8_t *src = _src;
uint8_t *dst = _dst;
+   unsigned  sb,db,sq,dq;
+
+   sb = src_start / 8;
+   db = dst_start / 8;
+   sq = src_start % 8;
+   dq = dst_start % 8;
 
-   unsigned src_idx = src_start, dst_idx = dst_start;
for (unsigned i = 0; i < len; i++)
{
-   if (((src[src_idx / 8] >> (src_idx % 8)) & 1) == 1)
-   dst[dst_idx / 8] |= 1 << (dst_idx % 8);
+   if (((*src >> (sq&7)) & 1) == 1)
+   *dst |= 1 << (dq&7);
else
-   dst[dst_idx / 8] &= ~(1 << (dst_idx % 8));
-   dst_idx++;
-   src_idx++;
+   *dst &= ~(1 << (dq&7));
+
+   if ( sq++ == 7 )
+   {
+   sq = 0;
+   src++;
+   }
+
+   if ( dq++ == 7 )
+   {
+   dq = 0;
+   dst++;
+   }
}
 
-   return dst;
+   return (uint8_t*)_dst;
 }
 
 uint32_t flip_u32(uint32_t value, unsigned int num)


Re: [Openocd-development] [PATCH] buf_set_buf around 30% speed increase

2011-02-04 Thread Mathias K.

sorry, wrong file ...

diff --git a/src/helper/binarybuffer.c b/src/helper/binarybuffer.c
index 3a16cce..08e149a 100644
--- a/src/helper/binarybuffer.c
+++ b/src/helper/binarybuffer.c
@@ -133,19 +133,48 @@ void* buf_set_buf(const void *_src, unsigned src_start,
 {
const uint8_t *src = _src;
uint8_t *dst = _dst;
+   unsigned  i,sb,db,sq,dq, lb,lq;
+
+   sb = src_start / 8;
+   db = dst_start / 8;
+   sq = src_start % 8;
+   dq = dst_start % 8;
+   lb = len / 8;
+   lq = len % 8;
+
+   src += sb;
+   dst += db;
+
+   /* check if both buffers are on byte boundary and
+* len is a multiple of 8bit so we can simple copy
+* the buffer */
+   if ( (sq == 0) && (dq == 0) &&  (lq == 0) )
+   {
+   for (i = 0; i < lb; i++)
+   *dst++ = *src++;
+   return (uint8_t*)_dst;
+   }
 
-   unsigned src_idx = src_start, dst_idx = dst_start;
-   for (unsigned i = 0; i < len; i++)
+   /* fallback to slow bit copy */
+   for (i = 0; i < len; i++)
{
-   if (((src[src_idx / 8] >> (src_idx % 8)) & 1) == 1)
-   dst[dst_idx / 8] |= 1 << (dst_idx % 8);
+   if (((*src >> (sq&7)) & 1) == 1)
+   *dst |= 1 << (dq&7);
else
-   dst[dst_idx / 8] &= ~(1 << (dst_idx % 8));
-   dst_idx++;
-   src_idx++;
+   *dst &= ~(1 << (dq&7));
+   if ( sq++ == 7 )
+   {
+   sq = 0;
+   src++;
+   }
+   if ( dq++ == 7 )
+   {
+   dq = 0;
+   dst++;
+   }
}
 
-   return dst;
+   return (uint8_t*)_dst;
 }
 
 uint32_t flip_u32(uint32_t value, unsigned int num)
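
One way to gain confidence in a change like this is to compare the old and new loops on the same inputs. The sketch below re-types both versions from the diff above under hypothetical names (`buf_set_buf_old`, `buf_set_buf_new`) and adds a hypothetical checker, `versions_agree`; it is an illustration, not a test from the OpenOCD tree.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pre-patch loop: recomputes byte and bit index on every iteration. */
static void *buf_set_buf_old(const void *_src, unsigned src_start,
		void *_dst, unsigned dst_start, unsigned len)
{
	const uint8_t *src = _src;
	uint8_t *dst = _dst;
	unsigned src_idx = src_start, dst_idx = dst_start;

	for (unsigned i = 0; i < len; i++, src_idx++, dst_idx++) {
		if ((src[src_idx / 8] >> (src_idx % 8)) & 1)
			dst[dst_idx / 8] |= (uint8_t)(1u << (dst_idx % 8));
		else
			dst[dst_idx / 8] &= (uint8_t)~(1u << (dst_idx % 8));
	}
	return _dst;
}

/* Patched logic: advance the pointers once, then track bit offsets. */
static void *buf_set_buf_new(const void *_src, unsigned src_start,
		void *_dst, unsigned dst_start, unsigned len)
{
	const uint8_t *src = _src;
	uint8_t *dst = _dst;
	unsigned i, sq = src_start % 8, dq = dst_start % 8;

	src += src_start / 8;
	dst += dst_start / 8;

	/* Byte-aligned on both sides and a whole number of bytes. */
	if (sq == 0 && dq == 0 && len % 8 == 0) {
		for (i = 0; i < len / 8; i++)
			*dst++ = *src++;
		return _dst;
	}

	/* Fallback: bit-by-bit copy. */
	for (i = 0; i < len; i++) {
		if ((*src >> sq) & 1)
			*dst |= (uint8_t)(1u << dq);
		else
			*dst &= (uint8_t)~(1u << dq);
		if (sq++ == 7) { sq = 0; src++; }
		if (dq++ == 7) { dq = 0; dst++; }
	}
	return _dst;
}

/* Returns 1 when both versions leave identical bytes in a 16-byte buffer. */
static int versions_agree(unsigned src_start, unsigned dst_start, unsigned len)
{
	uint8_t src[16], a[16], b[16];

	assert(src_start + len <= 128 && dst_start + len <= 128);
	for (unsigned i = 0; i < 16; i++) {
		src[i] = (uint8_t)(i * 37u + 11u);	/* arbitrary test pattern */
		a[i] = b[i] = (uint8_t)(0xF0u ^ i);	/* non-zero background */
	}
	buf_set_buf_old(src, src_start, a, dst_start, len);
	buf_set_buf_new(src, src_start, b, dst_start, len);
	return memcmp(a, b, sizeof a) == 0;
}
```

Sweeping `versions_agree` over a range of start offsets and lengths exercises both the byte-copy path and the bit-copy fallback.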