Re: [U-Boot-Users] [PATCH] mips: Bring over optimized memset() routine from Linux.

2008-07-07 Thread McMullan, Jason
On Sun, 2008-07-06 at 00:32 +0200, Wolfgang Denk wrote:
 In message [EMAIL PROTECTED] you wrote:
  This commit pulls over the memset() MIPS routine from Linux 2.6.26,
  which provides a 10x to 20x speedup over the generic byte-at-a-time
  routine. This is especially useful on platforms with manual ECC
  scrubbing, that require all of memory to be written at least once
  after a power cycle.
 Do you intend to comment on the questions and/or submit a cleaned up
 version of the patch?

Unfortunately, no follow-up patch is forthcoming.

I was able to use a spare DMA engine on our SoC to perform the memory
zeroing, which eliminated the need for the optimized memset() routine.

Also, I am not familiar with the intricacies of MIPS exception handling
for alignment faults, so I could not come up with a good answer to
Shinya Kuribayashi's questions about alignment traps.

Please retract the patch.

Jason McMullan
MTS SW
System Firmware

NetApp
724.741.5011 Fax
724.741.5166 Direct
412.656.3519 Mobile
[EMAIL PROTECTED]
www.netapp.com




U-Boot-Users mailing list
U-Boot-Users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/u-boot-users


Re: [U-Boot-Users] [PATCH] mips: Bring over optimized memset() routine from Linux.

2008-06-13 Thread Shinya Kuribayashi
Shinya Kuribayashi wrote:
 + andi    a1, 0xff                /* spread fillword */
 + LONG_SLL t1, a1, 8
 + or      a1, t1
 + LONG_SLL t1, a1, 16
 +#if LONGSIZE == 8
 + or      a1, t1
 + LONG_SLL t1, a1, 32
 +#endif
 + or      a1, t1
 +1:
 +
 +FEXPORT(__bzero)
 + sltiu   t0, a2, LONGSIZE        /* very small region? */
 + bnez    t0, .Lsmall_memset
 +  andi   t0, a0, LONGMASK        /* aligned? */
 
   ^
 
 [further part snipped]
 
 Please fix the wrong indentation with proper tabs. I know this is exactly
 the same as Linux's memset, but we prefer to fix it correctly in U-Boot.

I found that the space above is intentional: it indicates that the
instruction sits in the branch delay slot. It is probably a good old
convention in MIPS assembly programming, and I would like to leave it as is.

Anyway, sorry for my ignorance and please ignore my comments on this.

-- 
Shinya Kuribayashi
NEC Electronics



Re: [U-Boot-Users] [PATCH] mips: Bring over optimized memset() routine from Linux.

2008-06-13 Thread Wolfgang Denk
In message [EMAIL PROTECTED] you wrote:

 I found that the space above is intentional: it indicates that the
 instruction sits in the branch delay slot. It is probably a good old
 convention in MIPS assembly programming, and I would like to leave it as is.

Indeed. If it has a deeper meaning, this should be left as is.

 Anyway, sorry for my ignorance and please ignore my comments on this.

Thanks for the explanation. I think most of us were not aware of any
such convention. Speaking for myself, I definitely was not.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: [EMAIL PROTECTED]
Time is fluid ... like a river with currents, eddies, backwash.
-- Spock, The City on the Edge of Forever, stardate 3134.0



Re: [U-Boot-Users] [PATCH] mips: Bring over optimized memset() routine from Linux.

2008-06-10 Thread Shinya Kuribayashi
Hi Jason,

Jason McMullan wrote:
 This commit pulls over the memset() MIPS routine from Linux 2.6.26,
 which provides a 10x to 20x speedup over the generic byte-at-a-time
 routine. This is especially useful on platforms with manual ECC
 scrubbing, that require all of memory to be written at least once
 after a power cycle.
 ---
  include/asm-mips/string.h |    2 +-
  lib_mips/Makefile         |    2 +-
  lib_mips/memset.S         |  174 +
  3 files changed, 176 insertions(+), 2 deletions(-)
  create mode 100644 lib_mips/memset.S

IIRC, Linux's memset relies on AdEL/AdES exceptions. We have Status.EXL
enabled, but don't have proper exception handlers yet. So my question
is: does this code always work as expected, or does it work only under
some alignment restriction?
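
For illustration only, one way to avoid depending on exception handlers
is a fill loop that never issues a misaligned store. The sketch below is
plain C, not the Linux routine from the patch. The name
memset_aligned_sketch is hypothetical, and the code assumes that
sizeof(unsigned long) is a power of two. It replaces the swl/swr (or
sdl/sdr) head store used in the patch with byte stores until the pointer
is long-aligned.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: fill n bytes at s with c
 * using byte stores for the unaligned head and tail and long-sized
 * stores for the aligned body.
 */
void *memset_aligned_sketch(void *s, int c, size_t n)
{
	unsigned char *p = s;
	/* ~0UL / 0xff is 0x0101...01, so the product repeats the byte. */
	unsigned long fill = (unsigned long)(unsigned char)c * (~0UL / 0xff);

	/* Head: byte stores until p is aligned to sizeof(long). */
	while (n > 0 && ((uintptr_t)p & (sizeof(long) - 1)) != 0) {
		*p++ = (unsigned char)c;
		n--;
	}

	/*
	 * Body: one long per iteration. A fixed-size memcpy is the
	 * portable way to express the store; compilers typically lower
	 * it to a single aligned store instruction.
	 */
	while (n >= sizeof(long)) {
		memcpy(p, &fill, sizeof(fill));
		p += sizeof(long);
		n -= sizeof(long);
	}

	/* Tail: remaining bytes, fewer than sizeof(long). */
	while (n > 0) {
		*p++ = (unsigned char)c;
		n--;
	}

	return s;
}
```

Whether this gets near the speedup quoted in the commit message is not
measured here. Unrolling the body loop, as f_fill64 does in the patch,
would be the obvious next step.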

And some nitpicks. See below.

 diff --git a/lib_mips/memset.S b/lib_mips/memset.S
 new file mode 100644
 index 000..f1c07d7
 --- /dev/null
 +++ b/lib_mips/memset.S
 @@ -0,0 +1,174 @@
 +/*
 + * This file is subject to the terms and conditions of the GNU General Public
 + * License.  See the file COPYING in the main directory of this archive
 + * for more details.
 + *
 + * Copyright (C) 1998, 1999, 2000 by Ralf Baechle
 + * Copyright (C) 1999, 2000 Silicon Graphics, Inc.
 + * Copyright (C) 2007  Maciej W. Rozycki
 + */
 +#include <asm/asm.h>
 +//#include <asm/asm-offsets.h>

Please remove the unused #include. Even '#if 0'-ing it out is not allowed
under U-Boot policy.

 +#include <asm/regdef.h>
 +
 +#if LONGSIZE == 4
 +#define LONG_S_L swl
 +#define LONG_S_R swr
 +#else
 +#define LONG_S_L sdl
 +#define LONG_S_R sdr
 +#endif
 +
 +#define EX(insn,reg,addr,handler)          \
 +9: insn    reg, addr;                      \
 + .section __ex_table,"a";                  \
 + PTR     9b, handler;                      \
 + .previous
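
As a rough illustration of what the EX() macro sets up: each use records
the address of the store (label 9) and the address of a fixup handler as
a pair in the __ex_table section. A fault handler, which per the note
above U-Boot does not yet have, could search that table for the faulting
address. The C sketch below models that lookup. struct ex_entry and
ex_lookup are hypothetical names, and the linear search is an assumption
made for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* One (instruction address, fixup address) pair, as emitted by EX(). */
struct ex_entry {
	uintptr_t insn;
	uintptr_t fixup;
};

/*
 * Return the fixup address registered for the faulting instruction at
 * epc, or 0 if the fault did not come from a registered instruction.
 */
uintptr_t ex_lookup(const struct ex_entry *tbl, size_t n, uintptr_t epc)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].insn == epc)
			return tbl[i].fixup;
	}
	return 0;
}
```
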
 +
 + .macro  f_fill64 dst, offset, val, fixup
 + EX(LONG_S, \val, (\offset +  0 * LONGSIZE)(\dst), \fixup)
 + EX(LONG_S, \val, (\offset +  1 * LONGSIZE)(\dst), \fixup)
 + EX(LONG_S, \val, (\offset +  2 * LONGSIZE)(\dst), \fixup)
 + EX(LONG_S, \val, (\offset +  3 * LONGSIZE)(\dst), \fixup)
 + EX(LONG_S, \val, (\offset +  4 * LONGSIZE)(\dst), \fixup)
 + EX(LONG_S, \val, (\offset +  5 * LONGSIZE)(\dst), \fixup)
 + EX(LONG_S, \val, (\offset +  6 * LONGSIZE)(\dst), \fixup)
 + EX(LONG_S, \val, (\offset +  7 * LONGSIZE)(\dst), \fixup)
 +#if LONGSIZE == 4
 + EX(LONG_S, \val, (\offset +  8 * LONGSIZE)(\dst), \fixup)
 + EX(LONG_S, \val, (\offset +  9 * LONGSIZE)(\dst), \fixup)
 + EX(LONG_S, \val, (\offset + 10 * LONGSIZE)(\dst), \fixup)
 + EX(LONG_S, \val, (\offset + 11 * LONGSIZE)(\dst), \fixup)
 + EX(LONG_S, \val, (\offset + 12 * LONGSIZE)(\dst), \fixup)
 + EX(LONG_S, \val, (\offset + 13 * LONGSIZE)(\dst), \fixup)
 + EX(LONG_S, \val, (\offset + 14 * LONGSIZE)(\dst), \fixup)
 + EX(LONG_S, \val, (\offset + 15 * LONGSIZE)(\dst), \fixup)
 +#endif
 + .endm
 +
 +/*
 + * memset(void *s, int c, size_t n)
 + *
 + * a0: start of area to clear
 + * a1: char to fill with
 + * a2: size of area to clear
 + */
 + .set    noreorder
 + .align  5
 +LEAF(memset)
 + beqz    a1, 1f
 +  move   v0, a0                  /* result */

   ^

 + andi    a1, 0xff                /* spread fillword */
 + LONG_SLL t1, a1, 8
 + or      a1, t1
 + LONG_SLL t1, a1, 16
 +#if LONGSIZE == 8
 + or      a1, t1
 + LONG_SLL t1, a1, 32
 +#endif
 + or      a1, t1
 +1:
 +
 +FEXPORT(__bzero)
 + sltiu   t0, a2, LONGSIZE        /* very small region? */
 + bnez    t0, .Lsmall_memset
 +  andi   t0, a0, LONGMASK        /* aligned? */

   ^

[further part snipped]
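
The "spread fillword" sequence quoted above replicates the low byte of
the fill value into every byte of a register, so that one word store
writes LONGSIZE bytes at once. A C rendering of the same shifts and ORs,
as a sketch: spread_fill32 and spread_fill64 are hypothetical names
standing in for the LONGSIZE == 4 and LONGSIZE == 8 cases.

```c
#include <stdint.h>

/* LONGSIZE == 4 case: 0x000000cc -> 0xcccccccc */
uint32_t spread_fill32(int c)
{
	uint32_t a1 = (uint32_t)c & 0xff;	/* andi a1, 0xff */

	a1 |= a1 << 8;		/* LONG_SLL t1, a1, 8;  or a1, t1 */
	a1 |= a1 << 16;		/* LONG_SLL t1, a1, 16; or a1, t1 */
	return a1;
}

/* LONGSIZE == 8 case: one more doubling step fills the upper half. */
uint64_t spread_fill64(int c)
{
	uint64_t a1 = (uint64_t)c & 0xff;	/* andi a1, 0xff */

	a1 |= a1 << 8;		/* LONG_SLL t1, a1, 8;  or a1, t1 */
	a1 |= a1 << 16;		/* LONG_SLL t1, a1, 16; or a1, t1 */
	a1 |= a1 << 32;		/* LONG_SLL t1, a1, 32; or a1, t1 */
	return a1;
}
```

In the assembly the whole sequence is skipped when the fill value is
zero (the beqz at the top of memset), since a zero register is already
a valid fill word.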

Please fix the wrong indentation with proper tabs. I know this is exactly
the same as Linux's memset, but we prefer to fix it correctly in U-Boot.

[ I used to do as you did, but changed my mind. Now I think this is the
  better practice. Indentation that differs from Linux is not a big deal
  IMO; diff's -w option simply ignores it. ]

Thanks in advance,

-- 
Shinya Kuribayashi
NEC Electronics



[U-Boot-Users] [PATCH] mips: Bring over optimized memset() routine from Linux.

2008-06-04 Thread Jason McMullan
This commit pulls over the memset() MIPS routine from Linux 2.6.26,
which provides a 10x to 20x speedup over the generic byte-at-a-time
routine. This is especially useful on platforms with manual ECC
scrubbing, that require all of memory to be written at least once
after a power cycle.
---
 include/asm-mips/string.h |    2 +-
 lib_mips/Makefile         |    2 +-
 lib_mips/memset.S         |  174 +
 3 files changed, 176 insertions(+), 2 deletions(-)
 create mode 100644 lib_mips/memset.S

diff --git a/include/asm-mips/string.h b/include/asm-mips/string.h
index 579a591..0df1463 100644
--- a/include/asm-mips/string.h
+++ b/include/asm-mips/string.h
@@ -27,7 +27,7 @@ extern int strcmp(__const__ char *__cs, __const__ char *__ct);
 #undef __HAVE_ARCH_STRNCMP
 extern int strncmp(__const__ char *__cs, __const__ char *__ct, __kernel_size_t __count);
 
-#undef __HAVE_ARCH_MEMSET
+#define __HAVE_ARCH_MEMSET
 extern void *memset(void *__s, int __c, __kernel_size_t __count);
 
 #undef __HAVE_ARCH_MEMCPY
diff --git a/lib_mips/Makefile b/lib_mips/Makefile
index 8176437..9149039 100644
--- a/lib_mips/Makefile
+++ b/lib_mips/Makefile
@@ -25,7 +25,7 @@ include $(TOPDIR)/config.mk
 
 LIB     = $(obj)lib$(ARCH).a
 
-SOBJS-y +=
+SOBJS-y += memset.o
 
 COBJS-y += board.o
 COBJS-y += bootm.o
diff --git a/lib_mips/memset.S b/lib_mips/memset.S
new file mode 100644
index 000..f1c07d7
--- /dev/null
+++ b/lib_mips/memset.S
@@ -0,0 +1,174 @@
+/*
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file COPYING in the main directory of this archive
+ * for more details.
+ *
+ * Copyright (C) 1998, 1999, 2000 by Ralf Baechle
+ * Copyright (C) 1999, 2000 Silicon Graphics, Inc.
+ * Copyright (C) 2007  Maciej W. Rozycki
+ */
+#include <asm/asm.h>
+//#include <asm/asm-offsets.h>
+#include <asm/regdef.h>
+
+#if LONGSIZE == 4
+#define LONG_S_L swl
+#define LONG_S_R swr
+#else
+#define LONG_S_L sdl
+#define LONG_S_R sdr
+#endif
+
+#define EX(insn,reg,addr,handler)          \
+9: insn    reg, addr;                      \
+   .section __ex_table,"a";                \
+   PTR     9b, handler;                    \
+   .previous
+
+   .macro  f_fill64 dst, offset, val, fixup
+   EX(LONG_S, \val, (\offset +  0 * LONGSIZE)(\dst), \fixup)
+   EX(LONG_S, \val, (\offset +  1 * LONGSIZE)(\dst), \fixup)
+   EX(LONG_S, \val, (\offset +  2 * LONGSIZE)(\dst), \fixup)
+   EX(LONG_S, \val, (\offset +  3 * LONGSIZE)(\dst), \fixup)
+   EX(LONG_S, \val, (\offset +  4 * LONGSIZE)(\dst), \fixup)
+   EX(LONG_S, \val, (\offset +  5 * LONGSIZE)(\dst), \fixup)
+   EX(LONG_S, \val, (\offset +  6 * LONGSIZE)(\dst), \fixup)
+   EX(LONG_S, \val, (\offset +  7 * LONGSIZE)(\dst), \fixup)
+#if LONGSIZE == 4
+   EX(LONG_S, \val, (\offset +  8 * LONGSIZE)(\dst), \fixup)
+   EX(LONG_S, \val, (\offset +  9 * LONGSIZE)(\dst), \fixup)
+   EX(LONG_S, \val, (\offset + 10 * LONGSIZE)(\dst), \fixup)
+   EX(LONG_S, \val, (\offset + 11 * LONGSIZE)(\dst), \fixup)
+   EX(LONG_S, \val, (\offset + 12 * LONGSIZE)(\dst), \fixup)
+   EX(LONG_S, \val, (\offset + 13 * LONGSIZE)(\dst), \fixup)
+   EX(LONG_S, \val, (\offset + 14 * LONGSIZE)(\dst), \fixup)
+   EX(LONG_S, \val, (\offset + 15 * LONGSIZE)(\dst), \fixup)
+#endif
+   .endm
+
+/*
+ * memset(void *s, int c, size_t n)
+ *
+ * a0: start of area to clear
+ * a1: char to fill with
+ * a2: size of area to clear
+ */
+   .set    noreorder
+   .align  5
+LEAF(memset)
+   beqz    a1, 1f
+    move   v0, a0                  /* result */
+
+   andi    a1, 0xff                /* spread fillword */
+   LONG_SLL t1, a1, 8
+   or      a1, t1
+   LONG_SLL t1, a1, 16
+#if LONGSIZE == 8
+   or      a1, t1
+   LONG_SLL t1, a1, 32
+#endif
+   or      a1, t1
+1:
+
+FEXPORT(__bzero)
+   sltiu   t0, a2, LONGSIZE        /* very small region? */
+   bnez    t0, .Lsmall_memset
+    andi   t0, a0, LONGMASK        /* aligned? */
+
+#ifndef CONFIG_CPU_DADDI_WORKAROUNDS
+   beqz    t0, 1f
+    PTR_SUBU t0, LONGSIZE          /* alignment in bytes */
+#else
+   .set    noat
+   li      AT, LONGSIZE
+   beqz    t0, 1f
+    PTR_SUBU t0, AT                /* alignment in bytes */
+   .set    at
+#endif
+
+   R10KCBARRIER(0(ra))
+#ifdef __MIPSEB__
+   EX(LONG_S_L, a1, (a0), .Lfirst_fixup)   /* make word/dword aligned */
+#endif
+#ifdef __MIPSEL__
+   EX(LONG_S_R, a1, (a0), .Lfirst_fixup)   /* make word/dword aligned */
+#endif
+   PTR_SUBU a0, t0                 /* long align ptr */
+   PTR_ADDU a2, t0                 /* correct size */
+
+1: ori