[Bug rtl-optimization/17838] spills are not re-used
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838

dean at arctic dot org changed:

What       |Removed |Added
Status     |NEW     |RESOLVED
Resolution |        |WORKSFORME

--- Comment #14 from dean at arctic dot org 2013-04-21 20:36:55 UTC ---
I dug out the old C code for my original bug report; it's fine with a 4.7.x prerelease. I didn't bother narrowing down where the spills went away.

-dean
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838

--- Comment #13 from Eric Botcazou 2013-04-21 09:59:26 UTC ---
> In this case the code is computationally intensive. It doesn't make sense to
> compile with '-Os' for cryptographic algorithms.

Huh? Of course it makes sense to compile with -Os if you have specific code size constraints, and it's quite easy to have code compiled at -O3 run slower than code compiled at -O2/-Os on (very) embedded CPUs.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838

Bill Pringlemeir changed:

What |Removed |Added
CC   |        |bpringlemeir at gmail dot com

--- Comment #12 from Bill Pringlemeir 2013-04-20 15:47:36 UTC ---
(In reply to comment #11)
> Note that using -O3 for embedded targets isn't recommended; use -Os instead.

In this case the code is computationally intensive. It doesn't make sense to compile cryptographic algorithms with '-Os'. However, I think a performance increase can be achieved by working with gcc.

I have worked on an ARM project where two different developers chose 'TomsFastMath' and 'libgcrypt' as a crypto base. 'libgcrypt' seemed to perform better on the ARM. I believe this is because it uses gcc inline assembler to map op-codes that are not available in C. Gcc's inline assembler is very nice, as you don't have to do register allocation and all the other nice things that gcc does for us.

http://git.gnupg.org/cgi-bin/gitweb.cgi?p=libgcrypt.git;a=blob;f=mpi/longlong.h;hb=HEAD

Using the carry bit for multi-precision arithmetic gives a large advantage for algorithms such as RSA, which you cite as being worse with 'gcc' than with ARMcc on the ARM.

For the original issue for which the bug was filed (x86 SHA), I can understand your frustration. I also tried to expand SHA to handle 64 bits at a time, as you have done with MMX ('__builtin_ia32_pslld', etc.). It is difficult to get this to work with 'gcc'; I only got a 30% speed-up over the 32-bit versions.
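To make the carry-bit point concrete, here is a minimal portable-C sketch of multi-word addition. The function name mp_add_n and the 32-bit little-endian word layout are assumptions for illustration, not libgcrypt's API. Because C cannot read the CPU carry flag, every word needs extra compare-and-combine steps to recover the carry, whereas on ARM an ADDS followed by a chain of ADCS instructions propagates it in hardware. Closing that gap is the kind of thing the inline assembler in files like longlong.h is used for.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not libgcrypt's API): r = a + b for n-word
 * little-endian numbers; returns the final carry (0 or 1).
 * Portable C has no access to the carry flag, so each step recomputes
 * the carry with unsigned comparisons. r may alias a or b. */
static uint32_t mp_add_n(uint32_t *r, const uint32_t *a, const uint32_t *b,
                         size_t n)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t s = a[i] + carry;   /* wraps only if a[i] == 0xFFFFFFFF and carry == 1 */
        uint32_t c1 = (s < carry);   /* carry out of the first addition */
        uint32_t sum = s + b[i];
        uint32_t c2 = (sum < s);     /* carry out of the second addition */
        r[i] = sum;
        carry = c1 | c2;             /* at most one of c1, c2 can be set */
    }
    return carry;
}
```

For example, adding {0xFFFFFFFF, 0xFFFFFFFF} and {1, 0} yields {0, 0} with a returned carry of 1.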
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838

Eric Botcazou changed:

What |Removed |Added
CC   |        |ebotcazou at gcc dot gnu.org

--- Comment #11 from Eric Botcazou 2011-11-16 08:13:48 UTC ---
> This is not a new bug. This is not a "misfeature." This is actually something
> worth working on. This was filed in 2004 and hasn't been addressed since ...
> What is the holdup? If GCC is to be used in embedded platforms it can't go
> around taking 150% of the stack space of its competitors...

GCC is a volunteer project. If you think that it can be improved, you're welcome to implement enhancements or hire/sponsor someone to do the work for you. Note that using -O3 for embedded targets isn't recommended; use -Os instead.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838

--- Comment #10 from Tom St Denis 2011-11-15 14:20:07 UTC ---
Another update... We've just profiled our crypto library, and across the board [ciphers, hashes, PK functions like RSA/ECC] GCC is a complete loser against ARMcc [r713]. And it's not as if GCC were faster, which would at least make the cost worth paying... In most cases ARMcc and GCC are dead even [ARMcc faster for some things, slower for others]. This is with gcc 4.4.5 and 4.5.1 on an ARM.

This is not a new bug. This is not a "misfeature." This is actually something worth working on. This was filed in 2004 and hasn't been addressed since... What is the holdup? If GCC is to be used in embedded platforms it can't go around taking 150% of the stack space of its competitors...
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838

Tom St Denis changed:

What |Removed |Added
CC   |        |tstdenis at elliptictech dot com

--- Comment #8 from Tom St Denis 2011-11-10 19:27:23 UTC ---
(In reply to comment #7)
> (In reply to comment #6)
> > Created attachment 25751 [details]
> > Another test case
> >
> > Another example using
> >
> > gcc version 4.6.1 20110908 (Red Hat 4.6.1-9) (GCC)
> >
> > The function when compiled with "-m32 -O3" uses way more stack than it
> > should.
> > It's like it's putting the fp_int.dp[] array on the stack...
> >
> > I can confirm this is a problem on 32/64 and ARM as well.
>
> That is a different issue dealing with how memcpy works (or does not get
> optimized). File a different bug.

How the hell am I supposed to know that? Maybe the GCC team should clean up their stack spills once and for all. I'm sure $OBSCURE_PLATFORM_NAME or $WONDEROUS_NEW_TREE_REPRESENTATION can wait.

I actually have a different routine [fp_mul_comba_small_set.c] from TomsFastMath that does use memcpy, has the same style of unrolled multipliers, and does not have this problem.

I can file a new bug if you want, but as a user of GCC I'm not meant to understand the ins and outs of the depths of the compiler. I actually did search for stack-waste instead of just blindly filing a new report...

/rant
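The "unrolled multipliers" in the TomsFastMath routines mentioned here follow the Comba (column-wise) multiplication pattern that the fp_mul_comba file names refer to. The sketch below is hypothetical and simplified (comba_mul is an illustrative name, not TomsFastMath code) and is written as a loop rather than unrolled. For each output column it sums every partial product into a three-word accumulator before storing one word. Once such code is fully unrolled for a fixed size, the accumulator words, the loaded operand words and the 64-bit products can all be live at the same time, which plausibly explains why register-starved targets such as 32-bit x86 end up with many spill slots.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified Comba multiplication (not TomsFastMath code).
 * r must hold 2*n words and must not alias a or b. */
static void comba_mul(uint32_t *r, const uint32_t *a, const uint32_t *b,
                      size_t n)
{
    if (n == 0)
        return;
    uint32_t c0 = 0, c1 = 0, c2 = 0;   /* three-word column accumulator */
    for (size_t k = 0; k < 2 * n - 1; k++) {
        /* column k collects every product a[i] * b[k - i] */
        size_t lo = (k < n) ? 0 : k - n + 1;
        size_t hi = (k < n) ? k : n - 1;
        for (size_t i = lo; i <= hi; i++) {
            uint64_t p = (uint64_t)a[i] * b[k - i];
            uint64_t t = (uint64_t)c0 + (uint32_t)p;              /* add low half */
            c0 = (uint32_t)t;
            t = (uint64_t)c1 + (uint32_t)(p >> 32) + (t >> 32);   /* high half + carry */
            c1 = (uint32_t)t;
            c2 += (uint32_t)(t >> 32);                            /* overflow word */
        }
        r[k] = c0;      /* column finished: store its low word */
        c0 = c1;        /* shift the accumulator down one word */
        c1 = c2;
        c2 = 0;
    }
    r[2 * n - 1] = c0;  /* final high word */
}
```

For example, squaring the two-word value {0xFFFFFFFF, 0xFFFFFFFF} produces {0x00000001, 0x00000000, 0xFFFFFFFE, 0xFFFFFFFF}.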
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838

--- Comment #7 from Andrew Pinski 2011-11-08 20:24:01 UTC ---
(In reply to comment #6)
> Created attachment 25751 [details]
> Another test case
>
> Another example using
>
> gcc version 4.6.1 20110908 (Red Hat 4.6.1-9) (GCC)
>
> The function when compiled with "-m32 -O3" uses way more stack than it
> should.
> It's like it's putting the fp_int.dp[] array on the stack...
>
> I can confirm this is a problem on 32/64 and ARM as well.

That is a different issue dealing with how memcpy works (or does not get optimized). File a different bug.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838

--- Comment #6 from Tom St Denis 2011-11-08 14:17:55 UTC ---
Created attachment 25751 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25751
Another test case

Another example using

gcc version 4.6.1 20110908 (Red Hat 4.6.1-9) (GCC)

The function, when compiled with "-m32 -O3", uses way more stack than it should. It's like it's putting the fp_int.dp[] array on the stack...

I can confirm this is a problem on 32/64-bit and ARM as well.
--- Comment #5 from pinskia at gcc dot gnu dot org 2009-04-22 21:17 ---
I think this was fixed for GCC 4.4.0 with IRA, but I can't test right now since the preprocessed source uses builtin functions which no longer exist in 4.4.

pinskia at gcc dot gnu dot org changed:

What     |Removed |Added
Keywords |        |ra

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838