ppc_8xx-gcc 2.95.3 Monta Vista does not do ANY loop unrolling

2002-11-12 Thread Joakim Tjernlund

Hi

I optimized crc32() in JFFS2 (fs/jffs2/crc.h) by manually unrolling
the crc32 loop. This gave me a 22% speed increase when mounting a JFFS2 FS.
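
For readers unfamiliar with the technique, here is a minimal sketch of what manually unrolling a table-driven CRC-32 loop can look like. It is not the JFFS2 code or the patch discussed in this thread: the function names, the CRC_STEP macro, the unroll factor of 4 and the runtime table generation are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup table; real kernel code ships a precomputed one. */
static uint32_t crc_table[256];

/* Build the table for the reflected CRC-32 polynomial 0xEDB88320. */
void crc32_init_table(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        crc_table[i] = c;
    }
}

/* One table lookup per input byte; advances the pointer p. */
#define CRC_STEP(crc, p) \
    ((crc) = crc_table[((crc) ^ *(p)++) & 0xff] ^ ((crc) >> 8))

/* Straightforward loop: one byte and one loop test per iteration. */
uint32_t crc32_simple(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--)
        CRC_STEP(crc, p);
    return crc;
}

/* Hand-unrolled by 4: one loop test per four bytes, then a 0..3 byte tail. */
uint32_t crc32_unrolled(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len >= 4) {
        CRC_STEP(crc, p);
        CRC_STEP(crc, p);
        CRC_STEP(crc, p);
        CRC_STEP(crc, p);
        len -= 4;
    }
    while (len--)
        CRC_STEP(crc, p);
    return crc;
}

/* Both variants must agree for every length, including the short tail. */
void crc32_selftest(void)
{
    static const unsigned char msg[] = "123456789";

    crc32_init_table();
    for (size_t n = 0; n <= 9; n++)
        assert(crc32_simple(0xFFFFFFFFu, msg, n) ==
               crc32_unrolled(0xFFFFFFFFu, msg, n));
}
```

The unrolled variant trades a somewhat larger loop body for fewer branch and counter updates per byte, which is the same trade -funroll-loops makes automatically. Whether it wins depends on the CPU and its instruction cache.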

Later Alan Cox pointed out that my changes make x86 run slower, and it turns
out that on x86 a fairly new gcc will automatically unroll loops 'where
appropriate'.

I removed my hand-coded unrolling and added -funroll-loops to the JFFS2 Makefile,
and got results similar to my hand-coded unrolling (a little better).

I therefore conclude that ppc_8xx-gcc 2.95.3 from Monta Vista does not do ANY
unrolling unless you specify -funroll-loops. Doing this for the whole kernel is
NOT a good idea; it will run slower due to the big increase in size.

Now I wonder:
Is this a gcc 2.95.3, PPC or Monta Vista limitation?

Which compiler will do unrolling 'where appropriate' for 8xx PPC, and
where can I get a precompiled version?

The short-term solution is to specify -funroll-loops for individual
files/directories. Obviously JFFS2 should be included, but what else?


** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/





ppc_8xx-gcc 2.95.3 Monta Vista does not do ANY loop unrolling

2002-11-12 Thread Jaap-Jan Boor

Joakim,

Enabling loop unrolling by default for 8xx processors is not
a good idea because of the limited instruction cache size.

I think that's also what the PPC FAQ recommends: enabling
size optimization (-Os) for 8xx processors gives better performance.

For a 750 or so, it would be good to enable loop unrolling.

Jaap-Jan













ppc_8xx-gcc 2.95.3 Monta Vista does not do ANY loop unrolling

2002-11-12 Thread Joakim Tjernlund

Jaap-Jan,

Yes, it's not a good idea to enable -funroll-loops for the whole
kernel, but some functions should be unrolled anyway, like the crc32()
function, since it won't increase the size very much but will yield
a significant speed increase.

So maybe the right way is to identify some loops and enable unrolling on them.

Where in the PPC FAQ did you read about -Os for 8xx processors?

 Joakim

 
 
 
 








ppc_8xx-gcc 2.95.3 Monta Vista does not do ANY loop unrolling

2002-11-12 Thread Jaap-Jan Boor


 Jaap-Jan,

 Yes, it's not a good idea to enable -funroll-loops for the whole
 kernel, but some functions should be unrolled anyway, like the crc32()
 function since it won't increase the size very much but will yield
 a significant speed increase.

 So maybe the right way is to identify some loops and enable unrolling on them.

yes (e.g. in fs/jffs2/Makefile)


 Where in the PPC FAQ did you read that -Os for 8xx processors?

http://penguinppc.org/embedded/howto/x1273.html#AEN1346


  
  
  
  
 










ppc_8xx-gcc 2.95.3 Monta Vista does not do ANY loop unrolling

2002-11-12 Thread Joakim Tjernlund

 
  Jaap-Jan,
 
  Yes, it's not a good idea to enable -funroll-loops for the whole
  kernel, but some functions should be unrolled anyway, like the crc32()
  function since it won't increase the size very much but will yield
  a significant speed increase.
 
  So maybe the right way is to identify some loops and enable unrolling on 
  them.

 yes (e.g. in fs/jffs2/Makefile)

Exactly, that's what I wrote earlier. But what else? Surely there are
other places where this would be a win?



 
  Where in the PPC FAQ did you read that -Os for 8xx processors?

 http://penguinppc.org/embedded/howto/x1273.html#AEN1346

Thanks, I'll give it a go.

   Joakim

 
   
   
   
   
  
 
 
 
 








ppc_8xx-gcc 2.95.3 Monta Vista does not do ANY loop unrolling

2002-11-12 Thread Tom Rini

On Tue, Nov 12, 2002 at 10:40:41AM +0100, Joakim Tjernlund wrote:

 I optimized the crc32() in JFFS2(fs/jffs2/crc.h) by manually unrolling
 the crc32 loop. This gave me a speed increase of 22% in mounting JFFS2 FS

 Later Alan Cox pointed out that my changes makes x86 run slower and it turns
 out that on x86 and a fairly new gcc will automatically unroll loops 'where 
 appropriate'

 Removed my hand coded unrolling and added -funroll-loops to the JFFS2 
 Makefile,
 I got similar results as my hand coded unrolling (a little better).

 I therefore conclude that ppc_8xx-gcc 2.95.3 from Monta Vista does not do ANY 
 unrolling
 unless you specify -funroll-loops. Doing this for the whole kernel is NOT a 
 good idea,
 it will run slower due to big increase of size.

I'm sort of surprised that gcc-2.95.x (or gcc-*, for that matter) will
unroll some loops with only -O2, since the info pages for gcc-3.2 and
gcc-2.95 both say that -funroll-loops isn't turned on by any of the -O
levels.

So I suspect someone decided that small loops can safely be unrolled on
i386 at some optimization level, but that same decision (with possibly
good reason) was not made for PPC32.  So it's a gcc feature, not a
MVista-specific issue.

--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/
[ disclaimer: I work for MVista. ]






ppc_8xx-gcc 2.95.3 Monta Vista does not do ANY loop unrolling

2002-11-12 Thread Joakim Tjernlund


 I'm sort of surprised that gcc-2.95.x (or gcc-*, for that matter) will
 unroll some loops with only -O2, since the info pages for gcc-3.2 and
 gcc-2.95 both say that -funroll-loops isn't turned on by any of the -O
 levels.

 So I suspect someone decided that small loops can safely be unrolled on
 i386 at some optimization level, but that same decision (with possibly
 good reason) was not made for PPC32.  So it's a gcc feature, not a
 MVista-specific issue.

Newer gcc (>= 3.0) may do the same for PPC32. We only know that newer gcc's
(Alan Cox knows more) will do it for x86 and that 2.95.3 for ppc_8xx won't, so
there is a big ? in the middle.

Now to the trick question(s):
Where might it be suitable to add -funroll-loops or, better yet, can it be done
with a pragma or attribute attached to the function in question? It's pretty
hard to unroll inline functions otherwise (and only the inline function).
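
On the pragma/attribute question: gcc 2.95.3 has no such control, but later GCC releases added two. The function attribute `optimize` arrived in GCC 4.4 and the per-loop `#pragma GCC unroll n` in GCC 8. The sketch below assumes such a newer GCC. The UNROLL_LOOPS macro and the function names are hypothetical, and both mechanisms are requests that the compiler may ignore; the GCC manual also describes the `optimize` attribute as intended mainly for debugging, so per-file flags may remain the more robust choice.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: GCC 4.4 or later. Other compilers get an empty macro. */
#if defined(__GNUC__) && !defined(__clang__)
#define UNROLL_LOOPS __attribute__((optimize("unroll-loops")))
#else
#define UNROLL_LOOPS
#endif

/* Per-function request: behaves like -funroll-loops for this function only. */
UNROLL_LOOPS
unsigned long sum_bytes_attr(const unsigned char *p, size_t len)
{
    unsigned long sum = 0;

    for (size_t i = 0; i < len; i++)
        sum += p[i];
    return sum;
}

/* Per-loop request: ask for an unroll factor of 4 on the loop that follows. */
unsigned long sum_bytes_pragma(const unsigned char *p, size_t len)
{
    unsigned long sum = 0;

#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 8
#pragma GCC unroll 4
#endif
    for (size_t i = 0; i < len; i++)
        sum += p[i];
    return sum;
}

/* The hints change code generation only, never the result. */
void unroll_selftest(void)
{
    static const unsigned char data[] = { 1, 2, 3, 4, 5 };

    assert(sum_bytes_attr(data, sizeof data) ==
           sum_bytes_pragma(data, sizeof data));
}
```

Comparing the generated assembly (for example with `gcc -O2 -S`) is the only way to confirm that a given compiler version actually unrolled either loop.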


 --
 Tom Rini (TR1265)
 http://gate.crashing.org/~trini/
 [ disclaimer: I work for MVista. ]
Yeah, I know :-)

Jocke
PS.
 Any progress on the i2c-algo-8xx.c and/or 8xx_io/enet.c patches I sent earlier?







ppc_8xx-gcc 2.95.3 Monta Vista does not do ANY loop unrolling

2002-11-12 Thread Tom Rini

On Tue, Nov 12, 2002 at 05:09:11PM +0100, Joakim Tjernlund wrote:

 Newer gcc (>= 3.0) may do the same for PPC32. We only know that newer gcc's
 (Alan Cox knows more) will do it for x86 and that 2.95.3 for ppc_8xx won't, so
 there is a big ? in the middle.

Did Alan say what version of gcc Alan was talking about?

 Now to the trick question(s):
 Where might it be suitable to add -funroll-loops or, better yet, can it be
 done with a pragma or attribute attached to the function in question? It's
 pretty hard to unroll inline functions otherwise (and only the inline
 function).

Well, to lib/Makefile:
ifeq ($(CONFIG_PPC32),y)
CFLAGS_crc32.o += -funroll-loops
endif

Should work.  And it's not unheard of.

  Any progress on the i2c-algo-8xx.c and/or 8xx_io/enet.c patches I sent 
 earlier?

As I said privately, Dan Malek is handling the enet patch, and I'm
looking for time to do the i2c one.  Right now I'm working on making the
kernel easier to tweak (in some ways) for 2.5.

--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/






ppc_8xx-gcc 2.95.3 Monta Vista does not do ANY loop unrolling

2002-11-12 Thread Joakim Tjernlund

 
  Newer gcc (>= 3.0) may do the same for PPC32. We only know that newer gcc's
  (Alan Cox knows more) will do it for x86 and that 2.95.3 for ppc_8xx won't,
  so there is a big ? in the middle.

 Did Alan say what version of gcc Alan was talking about?

No, I did not ask at the time :-(


  Now to the trick question(s):
  Where might it be suitable to add -funroll-loops or, better yet, can it be 
  done
  with a pragma or attribute attached to the function in question? It's pretty
  hard to unroll inline functions otherwise (and only the inline function).

 Well, to lib/Makefile:
 ifeq ($(CONFIG_PPC32),y)
 CFLAGS_crc32.o += -funroll-loops
 endif

 Should work.  And it's not unheard of.

Yes, that much I had already figured out, but are there OTHER places in
the kernel that might also benefit from unrolling? I don't know the
kernel as well as you do and was hoping for a lead or two.


   Any progress on the i2c-algo-8xx.c and/or 8xx_io/enet.c patches I sent 
  earlier?

 As I said privately, Dan Malek is handling the enet patch, and I'm
 looking for time to do the i2c one.  Right now I'm working on making the
 kernel easier to tweak (in some ways) for 2.5.

I know Dan is handling the enet stuff, but since you both work
for MV (don't you?) I figured you might know, being an insider and all :-)

Maybe your tweak stuff could make use of forced unrolling?

 Jocke







ppc_8xx-gcc 2.95.3 Monta Vista does not do ANY loop unrolling

2002-11-12 Thread Mark Hatle

From one of our compiler guys, in relation to your original question:

  Later Alan Cox pointed out that my changes makes x86 run slower and it turns
  out that on x86 and a fairly new gcc will automatically unroll loops 'where
  appropriate'

Fairly new: 3.3-pre only.

Now I wonder:
  Is this a gcc 2.95.3, PPC or Monta Vista limitation?

2.95.3.

  Which compiler will do unrolling 'where appropriate' for 8xx PPC and
  Where can I get a precompiled version?

Nowhere, nor is using it for your kernels necessarily a good idea if
you don't have compiler experience.

 I know Dan is handling the enet stuff, but since you both work

Dan no longer works for MontaVista.  (And has not for some time now.)

 for MV(don't you?) I figured you might know, being an insider and all :-)

 Maybe your tweak stuff could make use of forced unrolling?

  Jocke







ppc_8xx-gcc 2.95.3 Monta Vista does not do ANY loop unrolling

2002-11-12 Thread Joakim Tjernlund

 On Tue, Nov 12, 2002 at 05:46:49PM +0100, Joakim Tjernlund wrote:

    Now to the trick question(s):
    Where might it be suitable to add -funroll-loops or, better yet, can it be
    done with a pragma or attribute attached to the function in question? It's
    pretty hard to unroll inline functions otherwise (and only the inline
    function).
  
   Well, to lib/Makefile:
   ifeq ($(CONFIG_PPC32),y)
   CFLAGS_crc32.o += -funroll-loops
   endif
  
   Should work.  And it's not unheard of.
 
  Yes, that much I already figured, but are there OTHER places in
  the kernel that also might benefit from unrolling. I don't know the
  kernel as well as you do and was hoping for a lead or two.

 Not really.  Unfortunately what might be better is to figure out how it
 works on i386, and then figure out how to duplicate that logic on PPC,
 maybe making it another flag and then turned on for different sizes
 based on -mtune.

Well, according to Mark Hatle it's only gcc 3.3-pre that does this kind of
unrolling automatically, so it will take a very long time before it's figured
out and usable.

So maybe it's worth the effort to figure out a few hot spots and apply
-funroll-loops there.

-mtune is new to me; I need to look that one up.


 Any progress on the i2c-algo-8xx.c and/or 8xx_io/enet.c patches I sent 
earlier?
  
   As I said privately, Dan Malek is handling the enet patch, and I'm
   looking for time to do the i2c one.  Right now I'm working on making the
   kernel easier to tweak (in some ways) for 2.5.
 
  I know Dan is handling the enet stuff, but since you both work
  for MV(don't you?) I figured you might know, being an insider and all :-)

 I don't follow you.  Dan doesn't work for MVista now.

Sorry, I didn't know that.


  Maybe your tweak stuff could make use of forced unrolling?

 Eventually, it could be used for turning it on or off for the whole
 kernel, or for a specific area even, once I get Makefile tweaks working.
 First I'm trying to get dependencies right.

Makes sense.

Jocke
