[Bug rtl-optimization/17838] spills are not re-used

2013-04-21 Thread dean at arctic dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838

dean at arctic dot org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution||WORKSFORME

--- Comment #14 from dean at arctic dot org 2013-04-21 20:36:55 UTC ---
I dug out the old C code for my original bug report -- it's fine with a 4.7.x
prerelease.  I didn't bother narrowing down to where the spills went away.

-dean


[Bug rtl-optimization/17838] spills are not re-used

2013-04-21 Thread ebotcazou at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838

--- Comment #13 from Eric Botcazou  2013-04-21
09:59:26 UTC ---
> In this case the code is computationally intensive.  It doesn't make sense to
> compile with '-Os' for cryptographic algorithms.

Huh?  Of course it makes sense to compile with -Os if you have specific code
size constraints, and it's quite easy to have code compiled at -O3 running
slower than code compiled at -O2/-Os on (very) embedded CPUs.


[Bug rtl-optimization/17838] spills are not re-used

2013-04-20 Thread bpringlemeir at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838

Bill Pringlemeir  changed:

   What|Removed |Added

 CC||bpringlemeir at gmail dot
   ||com

--- Comment #12 from Bill Pringlemeir
2013-04-20 15:47:36 UTC ---
(In reply to comment #11)
> Note that using -O3 for embedded targets isn't recommended; use -Os instead.

In this case the code is computationally intensive.  It doesn't make sense to
compile with '-Os' for cryptographic algorithms.

However, I think that a performance increase can be achieved by working with
gcc.  I have worked on an ARM project where two different developers chose
'TomsFastMath' and 'libgcrypt' as a crypto base.  It seems that 'libgcrypt' was
performing better on the ARM.  I believe this is because it used 'gcc' inline
assembler to map op-codes not available in 'C'.  GCC's inline assembler is very
nice, as you don't have to do register allocation and all the other nice things
that 'gcc' does for us.

http://git.gnupg.org/cgi-bin/gitweb.cgi?p=libgcrypt.git;a=blob;f=mpi/longlong.h;hb=HEAD

The use of the carry bit for multi-precision arithmetic gives a large advantage
for algorithms such as RSA, cited in comment #10 as being worse with 'gcc' than
with ARMcc on the ARM.
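
The carry-bit point can be illustrated with a minimal sketch that assumes 32-bit limbs. Portable C cannot read the CPU carry flag, so multi-precision code has to recompute each carry with a comparison. This is roughly what the generic C fallback of add_ssaaaa in longlong.h does. Inline assembler lets the code use an add / add-with-carry pair directly instead (for example adds/adc on ARM). The names add_2word and mp_add are hypothetical and are not taken from libgcrypt.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: (sh:sl) = (ah:al) + (bh:bl) on 32-bit limbs,
   modelled on the generic C fallback of add_ssaaaa in longlong.h.
   C cannot read the CPU carry flag, so the carry out of the low word
   is recovered with a comparison: the wrapped sum is smaller than an
   operand exactly when the addition overflowed. */
void add_2word(uint32_t *sh, uint32_t *sl,
               uint32_t ah, uint32_t al, uint32_t bh, uint32_t bl)
{
    uint32_t lo = al + bl;          /* wraps modulo 2^32 */
    *sh = ah + bh + (lo < al);      /* add the recovered carry */
    *sl = lo;
}

/* Hypothetical multi-word add: r = a + b over n little-endian 32-bit
   limbs.  Returns the carry out of the top limb (0 or 1). */
uint32_t mp_add(uint32_t *r, const uint32_t *a, const uint32_t *b, size_t n)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t s = a[i] + carry;
        uint32_t c1 = (s < carry);  /* carry from adding the incoming carry */
        uint32_t t = s + b[i];
        uint32_t c2 = (t < s);      /* carry from adding b[i] */
        r[i] = t;
        carry = c1 | c2;            /* at most one of the two can be set */
    }
    return carry;
}
```

A compiler may or may not recognize the comparison idiom and turn it back into carry-flag instructions. The inline-assembler macros in longlong.h close that gap by hand.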

For the original issue for which the bug was filed (x86 SHA), I can understand
your frustration.  I also tried to expand SHA to handle 64 bits at a time, as
you have done with MMX ('__builtin_ia32_pslld', etc.).  It is difficult to get
this to work with 'gcc'; I only got a 30% speed-up over the 32-bit versions.


[Bug rtl-optimization/17838] spills are not re-used

2011-11-16 Thread ebotcazou at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838

Eric Botcazou  changed:

   What|Removed |Added

 CC||ebotcazou at gcc dot
   ||gnu.org

--- Comment #11 from Eric Botcazou  2011-11-16 
08:13:48 UTC ---
> This is not a new bug.  This is not a "misfeature."  This is actually something
> worth working on.  This was filed in 2004 and hasn't been addressed since ...
> What is the hold-up?  If GCC is to be used in embedded platforms it can't go
> around taking 150% of the stack space as its competitors...

GCC is a volunteer project.  If you think that it can be improved, you're
welcome to implement enhancements or hire/sponsor someone to do the work for
you.

Note that using -O3 for embedded targets isn't recommended; use -Os instead.


[Bug rtl-optimization/17838] spills are not re-used

2011-11-15 Thread tstdenis at elliptictech dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838

--- Comment #10 from Tom St Denis  2011-11-15 
14:20:07 UTC ---
Another update ... We've just profiled our crypto library, and across the board
[cipher, hashes, PK functions like RSA/ECC] GCC is a complete loser against
ARMcc [r713].  And it's not as if GCC is faster, which would at least be a price
worth paying... In most cases ARMcc and GCC are dead even [ARMcc faster for some
things, slower for others].

This is with gcc 4.4.5 and 4.5.1 on an ARM.

This is not a new bug.  This is not a "misfeature."  This is actually something
worth working on.  This was filed in 2004 and hasn't been addressed since ...
What is the hold-up?  If GCC is to be used in embedded platforms it can't go
around taking 150% of the stack space as its competitors...


[Bug rtl-optimization/17838] spills are not re-used

2011-11-10 Thread tstdenis at elliptictech dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838

Tom St Denis  changed:

   What|Removed |Added

 CC||tstdenis at elliptictech
   ||dot com

--- Comment #8 from Tom St Denis  2011-11-10 
19:27:23 UTC ---
(In reply to comment #7)
> (In reply to comment #6)
> > Created attachment 25751 [details]
> > Another test case
> > 
> > Another example using 
> > 
> > gcc version 4.6.1 20110908 (Red Hat 4.6.1-9) (GCC) 
> > 
> > The function when compiled with "-m32 -O3" uses way more stack than it 
> > should. 
> > It's like it's putting the fp_int.dp[] array on the stack...
> > 
> > I can confirm this is a problem on 32/64 and ARM as well.
> 
> That is a different issue, dealing with how memcpy works (or does not get
> optimized).  File a different bug.

How the hell am I supposed to know that?  Maybe the GCC team should clean up
their stack spills once and for all.  I'm sure $OBSCURE_PLATFORM_NAME or
$WONDEROUS_NEW_TREE_REPRESENTATION can wait.

I actually have a different routine [fp_mul_comba_small_set.c] from
TomsFastMath that does use memcpy, has the same style of unrolled multipliers,
and does not have this problem.
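
The routines discussed here are Comba multipliers. A Comba multiplier computes the product column by column. For each output column k, it sums every a[i]*b[j] with i + j = k into a small multi-word accumulator and then emits one output digit. The sketch below is a hypothetical looped version with 32-bit digits. The name comba_mul is illustrative only and is not the TomsFastMath API. The TomsFastMath routines mentioned above are described as unrolled versions of this style.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical column-wise (Comba) multiply:
   r[0..2n-1] = a[0..n-1] * b[0..n-1], little-endian 32-bit digits.
   r must not alias a or b. */
void comba_mul(uint32_t *r, const uint32_t *a, const uint32_t *b, size_t n)
{
    uint64_t acc_lo = 0;  /* low 64 bits of the column accumulator */
    uint32_t acc_hi = 0;  /* bits above 64 (counts overflows of acc_lo) */

    if (n == 0)
        return;

    for (size_t k = 0; k < 2 * n - 1; k++) {
        /* Sum every product a[i] * b[j] with i + j == k. */
        size_t i_lo = (k < n) ? 0 : k - n + 1;
        size_t i_hi = (k < n) ? k : n - 1;
        for (size_t i = i_lo; i <= i_hi; i++) {
            uint64_t p = (uint64_t)a[i] * b[k - i];
            acc_lo += p;
            acc_hi += (acc_lo < p);  /* carry out of the 64-bit add */
        }
        /* Emit the low digit, then shift the 96-bit accumulator right by 32. */
        r[k] = (uint32_t)acc_lo;
        acc_lo = (acc_lo >> 32) | ((uint64_t)acc_hi << 32);
        acc_hi = 0;
    }
    r[2 * n - 1] = (uint32_t)acc_lo;  /* final carry fits in one digit */
}
```

An unrolled version replaces both loops with straight-line code for each column. On a register-starved target such as 32-bit x86, that code keeps many values live at once. This is the kind of code where poor reuse of spill slots could show up as extra stack use.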

I can file a new bug if you want, but as a user of GCC I'm not meant to
understand the ins-and-outs of the depths of the compiler.  I actually did
search for stack-waste instead of just blindly filing a new report...

/rant


[Bug rtl-optimization/17838] spills are not re-used

2011-11-08 Thread pinskia at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838

--- Comment #7 from Andrew Pinski  2011-11-08 
20:24:01 UTC ---
(In reply to comment #6)
> Created attachment 25751 [details]
> Another test case
> 
> Another example using 
> 
> gcc version 4.6.1 20110908 (Red Hat 4.6.1-9) (GCC) 
> 
> The function when compiled with "-m32 -O3" uses way more stack than it 
> should. 
> It's like it's putting the fp_int.dp[] array on the stack...
> 
> I can confirm this is a problem on 32/64 and ARM as well.

That is a different issue, dealing with how memcpy works (or does not get
optimized).  File a different bug.


[Bug rtl-optimization/17838] spills are not re-used

2011-11-08 Thread tstdenis at elliptictech dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838

--- Comment #6 from Tom St Denis  2011-11-08 
14:17:55 UTC ---
Created attachment 25751
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25751
Another test case

Another example using 

gcc version 4.6.1 20110908 (Red Hat 4.6.1-9) (GCC) 

The function when compiled with "-m32 -O3" uses way more stack than it should. 
It's like it's putting the fp_int.dp[] array on the stack...

I can confirm this is a problem on 32/64 and ARM as well.


[Bug rtl-optimization/17838] spills are not re-used

2009-04-22 Thread pinskia at gcc dot gnu dot org


--- Comment #5 from pinskia at gcc dot gnu dot org  2009-04-22 21:17 ---
I think this was fixed for GCC 4.4.0 with IRA (the integrated register
allocator), but I can't test right now since the preprocessed source uses
builtin functions which no longer exist in 4.4.


-- 

pinskia at gcc dot gnu dot org changed:

   What|Removed |Added

   Keywords||ra


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838