ppc_8xx-gcc 2.95.3 Monta Vista does not do ANY loop unrolling
Hi,

I optimized crc32() in JFFS2 (fs/jffs2/crc.h) by manually unrolling the crc32 loop. This gave me a 22% speed increase when mounting a JFFS2 FS.

Later, Alan Cox pointed out that my changes make x86 run slower, and it turns out that on x86 a fairly new gcc will automatically unroll loops 'where appropriate'.

I removed my hand-coded unrolling and added -funroll-loops to the JFFS2 Makefile, and got results similar to my hand-coded unrolling (a little better).

I therefore conclude that ppc_8xx-gcc 2.95.3 from Monta Vista does not do ANY unrolling unless you specify -funroll-loops. Doing this for the whole kernel is NOT a good idea; it will run slower due to the big increase in size.

Now I wonder:
Is this a gcc 2.95.3, PPC or Monta Vista limitation?
Which compiler will do unrolling 'where appropriate' for 8xx PPC, and where can I get a precompiled version?

The short-term solution is to specify -funroll-loops for individual files/directories. Obviously JFFS2 should be included, but what else?

** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
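
For readers who have not seen hand unrolling before, the idea in the message above can be sketched roughly as follows. This is only an illustration, not the JFFS2 source or the actual patch: it assumes a byte-at-a-time, table-driven CRC-32 using the common reflected polynomial 0xEDB88320, and the names crc_table, CRC_STEP, crc32_simple and crc32_unrolled are made up for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Lookup table for the reflected CRC-32 polynomial 0xEDB88320
 * (an assumption of this sketch; built lazily on first use). */
static uint32_t crc_table[256];
static int crc_table_ready;

static void crc_table_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        crc_table[i] = c;
    }
    crc_table_ready = 1;
}

/* One table lookup per input byte; advances the pointer p. */
#define CRC_STEP(crc, p) \
    ((crc) = crc_table[((crc) ^ *(p)++) & 0xff] ^ ((crc) >> 8))

/* Plain loop: one byte per iteration, one branch per byte. */
uint32_t crc32_simple(uint32_t crc, const unsigned char *p, size_t len)
{
    if (!crc_table_ready)
        crc_table_init();
    while (len--)
        CRC_STEP(crc, p);
    return crc;
}

/* Hand-unrolled: four bytes per iteration, then the 0-3 leftover bytes. */
uint32_t crc32_unrolled(uint32_t crc, const unsigned char *p, size_t len)
{
    if (!crc_table_ready)
        crc_table_init();
    while (len >= 4) {
        CRC_STEP(crc, p);
        CRC_STEP(crc, p);
        CRC_STEP(crc, p);
        CRC_STEP(crc, p);
        len -= 4;
    }
    while (len--)
        CRC_STEP(crc, p);
    return crc;
}
```

Both functions compute the same value; the unrolled one only trades a larger loop body for fewer loop-control branches. Whether that trade pays off depends on the CPU and its instruction cache, which is what the rest of the thread is about.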
ppc_8xx-gcc 2.95.3 Monta Vista does not do ANY loop unrolling
Joakim,

Enabling loop unrolling by default for 8xx processors is not a good idea because of the limited instruction cache size. I think that's also what the PPC FAQ recommends: enabling size optimization (-Os) for 8xx processors gives better performance. For a 750 or so, it would be good to enable loop unrolling.

Jaap-Jan
ppc_8xx-gcc 2.95.3 Monta Vista does not do ANY loop unrolling
Jaap-Jan,

Yes, it's not a good idea to enable -funroll-loops for the whole kernel, but some functions should be unrolled anyway, like the crc32() function, since it won't increase the size very much but will yield a significant speed increase. So maybe the right way is to identify some loops and enable unrolling on them.

Where in the PPC FAQ did you read that about -Os for 8xx processors?

Joakim
ppc_8xx-gcc 2.95.3 Monta Vista does not do ANY loop unrolling
> Jaap-Jan,
>
> Yes, it's not a good idea to enable -funroll-loops for the whole kernel, but some functions should be unrolled anyway, like the crc32() function, since it won't increase the size very much but will yield a significant speed increase. So maybe the right way is to identify some loops and enable unrolling on them.

Yes (e.g. in fs/jffs2/Makefile).

> Where in the PPC FAQ did you read that about -Os for 8xx processors?

http://penguinppc.org/embedded/howto/x1273.html#AEN1346
ppc_8xx-gcc 2.95.3 Monta Vista does not do ANY loop unrolling
> > Yes, it's not a good idea to enable -funroll-loops for the whole kernel, but some functions should be unrolled anyway, like the crc32() function, since it won't increase the size very much but will yield a significant speed increase. So maybe the right way is to identify some loops and enable unrolling on them.
>
> Yes (e.g. in fs/jffs2/Makefile).

Exactly, that's what I wrote earlier. But what else? Surely there are other places where this would be a win?

> > Where in the PPC FAQ did you read that about -Os for 8xx processors?
>
> http://penguinppc.org/embedded/howto/x1273.html#AEN1346

Thanks, I'll give it a go.

Joakim
ppc_8xx-gcc 2.95.3 Monta Vista does not do ANY loop unrolling
On Tue, Nov 12, 2002 at 10:40:41AM +0100, Joakim Tjernlund wrote:

> I optimized crc32() in JFFS2 (fs/jffs2/crc.h) by manually unrolling the crc32 loop. This gave me a 22% speed increase when mounting a JFFS2 FS.
>
> Later, Alan Cox pointed out that my changes make x86 run slower, and it turns out that on x86 a fairly new gcc will automatically unroll loops 'where appropriate'.
>
> I removed my hand-coded unrolling and added -funroll-loops to the JFFS2 Makefile, and got results similar to my hand-coded unrolling (a little better).
>
> I therefore conclude that ppc_8xx-gcc 2.95.3 from Monta Vista does not do ANY unrolling unless you specify -funroll-loops. Doing this for the whole kernel is NOT a good idea; it will run slower due to the big increase in size.

I'm sort of surprised that gcc-2.95.x (or gcc-*, for that matter) will unroll some loops with only -O2, since the info pages for gcc-3.2 and gcc-2.95 both say that -funroll-loops isn't turned on by any of the -O levels. So I suspect someone decided that small loops can safely be unrolled on i386 at some optimization level, but that the same decision (with possibly good reason) was not made for PPC32. So it's a gcc feature, not an MVista-specific issue.

--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/
[ disclaimer: I work for MVista. ]
ppc_8xx-gcc 2.95.3 Monta Vista does not do ANY loop unrolling
> I'm sort of surprised that gcc-2.95.x (or gcc-*, for that matter) will unroll some loops with only -O2, since the info pages for gcc-3.2 and gcc-2.95 both say that -funroll-loops isn't turned on by any of the -O levels. So I suspect someone decided that small loops can safely be unrolled on i386 at some optimization level, but that the same decision (with possibly good reason) was not made for PPC32. So it's a gcc feature, not an MVista-specific issue.

Newer gcc (>= 3.0) may do the same for PPC32. We only know that newer gccs (Alan Cox knows more) will do it for x86 and that 2.95.3 for ppc_8xx won't, so there is a big ? in the middle.

Now to the trick question(s): Where might it be suitable to add -funroll-loops or, better yet, can it be done with a pragma or attribute attached to the function in question? It's pretty hard to unroll inline functions otherwise (and only the inline function).

> [ disclaimer: I work for MVista. ]

Yeah, I know :-)

Jocke

PS. Any progress on the i2c-algo-8xx.c and/or 8xx_io/enet.c patches I sent earlier?
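
On the pragma/attribute question above: nothing in this thread suggests gcc 2.95.3 offers such a thing, but much later GCC releases do. GCC 4.4 and newer accept a per-function optimize attribute, and GCC 8 and newer also accept `#pragma GCC unroll n` in front of a loop. A rough sketch of the attribute form follows; UNROLL_LOOPS and sum_bytes are hypothetical names used only for illustration, and the macro falls back to nothing on compilers where the attribute is not assumed to be available.

```c
#include <stddef.h>

/*
 * UNROLL_LOOPS is a hypothetical helper macro for this sketch.
 * On GCC 4.4 or newer it expands to the optimize attribute;
 * on anything else it expands to nothing, so the code still builds.
 */
#if defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 4))
#define UNROLL_LOOPS __attribute__((optimize("unroll-loops")))
#else
#define UNROLL_LOOPS
#endif

/* Only this function is compiled as if -funroll-loops had been given. */
UNROLL_LOOPS
unsigned long sum_bytes(const unsigned char *p, size_t len)
{
    unsigned long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];
    return sum;
}
```

Note that the GCC manual describes the optimize attribute as intended for debugging rather than production code, and how it interacts with inlining is something to check against the documentation of the compiler actually in use. For a 2.95.3 toolchain, per-file flags in the Makefile remain the practical route.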
ppc_8xx-gcc 2.95.3 Monta Vista does not do ANY loop unrolling
On Tue, Nov 12, 2002 at 05:09:11PM +0100, Joakim Tjernlund wrote:

> Newer gcc (>= 3.0) may do the same for PPC32. We only know that newer gccs (Alan Cox knows more) will do it for x86 and that 2.95.3 for ppc_8xx won't, so there is a big ? in the middle.

Did Alan say what version of gcc he was talking about?

> Now to the trick question(s): Where might it be suitable to add -funroll-loops or, better yet, can it be done with a pragma or attribute attached to the function in question? It's pretty hard to unroll inline functions otherwise (and only the inline function).

Well, adding this to lib/Makefile:

    ifeq ($(CONFIG_PPC32),y)
    CFLAGS_crc32.o += -funroll-loops
    endif

should work. And it's not unheard of.

> Any progress on the i2c-algo-8xx.c and/or 8xx_io/enet.c patches I sent earlier?

As I said privately, Dan Malek is handling the enet patch, and I'm looking for time to do the i2c one. Right now I'm working on making the kernel easier to tweak (in some ways) for 2.5.

--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/
ppc_8xx-gcc 2.95.3 Monta Vista does not do ANY loop unrolling
> Did Alan say what version of gcc he was talking about?

No, I did not ask at the time :-(

> Well, adding this to lib/Makefile:
>
>     ifeq ($(CONFIG_PPC32),y)
>     CFLAGS_crc32.o += -funroll-loops
>     endif
>
> should work. And it's not unheard of.

Yes, that much I already figured, but are there OTHER places in the kernel that might also benefit from unrolling? I don't know the kernel as well as you do and was hoping for a lead or two.

> As I said privately, Dan Malek is handling the enet patch, and I'm looking for time to do the i2c one. Right now I'm working on making the kernel easier to tweak (in some ways) for 2.5.

I know Dan is handling the enet stuff, but since you both work for MV (don't you?) I figured you might know, being an insider and all :-)

Maybe your tweak stuff could make use of forced unrolling?

Jocke
ppc_8xx-gcc 2.95.3 Monta Vista does not do ANY loop unrolling
From one of our compiler guys, in relation to your original question:

> Later Alan Cox pointed out that my changes make x86 run slower, and it turns out that on x86 a fairly new gcc will automatically unroll loops 'where appropriate'.

Fairly new: 3.3-pre only.

> Now I wonder: Is this a gcc 2.95.3, PPC or Monta Vista limitation?

2.95.3.

> Which compiler will do unrolling 'where appropriate' for 8xx PPC, and where can I get a precompiled version?

Nowhere, nor is using it for your kernels necessarily a good idea if you don't have compiler experience.

> I know Dan is handling the enet stuff, but since you both work

Dan no longer works for MontaVista. (And has not for some time now.)

> for MV (don't you?) I figured you might know, being an insider and all :-)
>
> Maybe your tweak stuff could make use of forced unrolling?
>
> Jocke
ppc_8xx-gcc 2.95.3 Monta Vista does not do ANY loop unrolling
> On Tue, Nov 12, 2002 at 05:46:49PM +0100, Joakim Tjernlund wrote:
>
> > Yes, that much I already figured, but are there OTHER places in the kernel that might also benefit from unrolling? I don't know the kernel as well as you do and was hoping for a lead or two.
>
> Not really. Unfortunately, what might be better is to figure out how it works on i386, and then figure out how to duplicate that logic on PPC, maybe making it another flag that is then turned on for different sizes based on -mtune.

Well, according to Mark Hatle it's only gcc 3.3-pre that does this kind of unrolling automatically, so it will take a very long time before it's figured out and usable. So maybe it's worth the effort to figure out a few hot spots and apply -funroll-loops there. -mtune is new to me; I need to look that one up.

> > I know Dan is handling the enet stuff, but since you both work for MV (don't you?) I figured you might know, being an insider and all :-)
>
> I don't follow you. Dan doesn't work for MVista now.

Sorry, I didn't know that.

> > Maybe your tweak stuff could make use of forced unrolling?
>
> Eventually, it could be used for turning it on or off for the whole kernel, or for a specific area even, once I get Makefile tweaks working. First I'm trying to get dependencies right.

Makes sense.

Jocke
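
One way to check whether a suspected hot spot actually gains from -funroll-loops is to time the routine in isolation, building the same file once with the flag and once without. The following is only a user-space sketch under that assumption, not a kernel profiling method; candidate_fn, time_candidate, byte_sum and sink are hypothetical names, and byte_sum merely stands in for whatever loop is being evaluated.

```c
#include <stddef.h>
#include <time.h>

/* Signature of the routine being measured (hypothetical, for this sketch). */
typedef unsigned long (*candidate_fn)(const unsigned char *buf, size_t len);

/* Accumulates results so the compiler cannot discard the calls. */
static volatile unsigned long sink;

/* Returns the CPU seconds spent running fn over buf `reps` times. */
double time_candidate(candidate_fn fn, const unsigned char *buf,
                      size_t len, unsigned reps)
{
    clock_t start = clock();
    for (unsigned i = 0; i < reps; i++)
        sink += fn(buf, len);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Example candidate: a byte-sum loop standing in for a real hot spot. */
unsigned long byte_sum(const unsigned char *buf, size_t len)
{
    unsigned long s = 0;
    while (len--)
        s += *buf++;
    return s;
}
```

Calling time_candidate(byte_sum, buf, len, reps) on the target board with both builds gives two numbers to compare. On a part with a small instruction cache, such as the 8xx, the comparison is probably most meaningful when the buffer size and surrounding code resemble the real workload.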