[Bug target/19653] x87 reg allocated for constants for -mfpmath=sse

2006-04-18 Thread bonzini at gcc dot gnu dot org


--- Comment #26 from bonzini at gnu dot org  2006-04-18 08:23 ---
Subject: Bug 19653

Author: bonzini
Date: Tue Apr 18 08:23:39 2006
New Revision: 113026

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=113026
Log:
2006-04-18  Paolo Bonzini  [EMAIL PROTECTED]

PR target/27117

Partial revert of revision 112637
2006-04-03  Paolo Bonzini  [EMAIL PROTECTED]
Dale Johannesen  [EMAIL PROTECTED]

PR target/19653
* regclass.c (struct reg_pref): Update documentation.
(regclass): Set prefclass to NO_REGS if memory is the best option.
(record_reg_classes): Cope with a prefclass set to NO_REGS.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/regclass.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19653



2006-04-03 Thread bonzini at gcc dot gnu dot org


--- Comment #24 from bonzini at gnu dot org  2006-04-03 11:20 ---
Subject: Bug 19653

Author: bonzini
Date: Mon Apr  3 11:20:07 2006
New Revision: 112637

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=112637
Log:
2005-08-08  Paolo Bonzini  [EMAIL PROTECTED]
Dale Johannesen  [EMAIL PROTECTED]

PR target/19653
* regclass.c (struct reg_pref): Update documentation.
(regclass): Set prefclass to NO_REGS if memory is the best option.
(record_reg_classes): Cope with a prefclass set to NO_REGS.
* reload.c (find_reloads): Take PREFERRED_OUTPUT_RELOAD_CLASS
into account.  For non-registers, equate an empty preferred
reload class to a `!' in the constraint; move the if clause to
do so after those that reject the insn.
(push_reload): Allow PREFERRED_*_RELOAD_CLASS to liberally
return NO_REGS.
(find_dummy_reload): Likewise.
* doc/tm.texi (Register Classes): Document what it means
if PREFERRED_*_RELOAD_CLASS return NO_REGS.
* config/i386/i386.c (ix86_preferred_reload_class): Force
using SSE registers (and return NO_REGS for floating-point
constants) if math is done with SSE.
(ix86_preferred_output_reload_class): New.
* config/i386/i386-protos.h (ix86_preferred_output_reload_class): New.
* config/i386/i386.h (PREFERRED_OUTPUT_RELOAD_CLASS): New.
* config/i386/i386.md: Remove # register preferences.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386-protos.h
trunk/gcc/config/i386/i386.c
trunk/gcc/config/i386/i386.h
trunk/gcc/config/i386/i386.md
trunk/gcc/doc/tm.texi
trunk/gcc/regclass.c
trunk/gcc/reload.c


2006-04-03 Thread bonzini at gnu dot org


--- Comment #25 from bonzini at gnu dot org  2006-04-03 11:20 ---
fixed on mainline.


-- 

bonzini at gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


2005-11-22 Thread bonzini at gcc dot gnu dot org


--- Comment #23 from bonzini at gcc dot gnu dot org  2005-11-22 09:21 ---
Dale, can you please take care of merging this into 4.2?


2005-09-21 Thread paolo dot bonzini at lu dot unisi dot ch

--- Additional Comments From paolo dot bonzini at lu dot unisi dot ch  2005-09-21 06:51 ---
Subject: Re:  x87 reg allocated for constants for -mfpmath=sse

> Note that in this pattern the cost computations for MMX_REGS are all ignored
> ('*' in front of y). So the cost which is computed is for 'r', which is
> GENERAL_REGS. This cost is too high and eventually results in the memory cost
> being lower than the register cost. I tried the following simple patch as an
> experiment and got all the performance back (it is now comparable with 4.0).
> Note that in this patch I removed the '*' in the 2nd alternative, so the cost
> of keeping the operand in the MMX_REGS class is factored in. This resulted in
> a lower cost than that of memory. Is this the way to go? This is just an
> experiment which seems to work.

I think it makes sense.  The x86 back-end is playing too many tricks 
(such as the # classes) with the register allocator and regclass 
especially, and they are biting back.

Still, I'd rather hear from an expert as to why the classes were written 
like this.
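
As a rough illustration of the effect described in the quoted analysis, the sketch below models how skipping a `*`-marked alternative can tip the choice towards memory. It is a toy model, not code from regclass.c: the struct, the function name toy_best_choice, and every cost value used with it are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one register-class alternative. */
struct toy_alt {
    const char *reg_class;  /* e.g. "GENERAL_REGS" or "MMX_REGS" */
    int cost;               /* made-up cost of keeping the operand there */
    bool starred;           /* true if the constraint letter follows a '*' */
};

/* Return the index of the cheapest alternative that takes part in
   preferencing, or -1 if no such alternative is strictly cheaper
   than memory. */
int toy_best_choice(const struct toy_alt *alts, size_t n, int memory_cost)
{
    int best = -1;
    int best_cost = memory_cost;

    for (size_t i = 0; i < n; i++) {
        if (alts[i].starred)
            continue;               /* '*' alternatives are not costed */
        if (alts[i].cost < best_cost) {
            best = (int)i;
            best_cost = alts[i].cost;
        }
    }
    return best;
}
```

With made-up costs of 12 for GENERAL_REGS, 4 for a starred MMX_REGS and 8 for memory, the function returns -1 (memory wins); clearing the starred flag on MMX_REGS makes it return that alternative's index instead.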

Paolo


2005-09-21 Thread dalej at gcc dot gnu dot org

--- Additional Comments From dalej at gcc dot gnu dot org  2005-09-21 17:23 ---
I agree with Paolo.


2005-08-22 Thread bonzini at gcc dot gnu dot org

--- Additional Comments From bonzini at gcc dot gnu dot org  2005-08-22 22:03 ---
SPEC results for i686-pc-linux-gnu follow.  The only significant regression is
in galgel; overall it's about 1% better for SPECint and 2% better for SPECfp.

Note that crafty improves a lot because of Dale's patch.

   Benchmark       Ref  Base run  Base ratio    Ref  Peak run  Peak ratio
   164.gzip       1400       151        925*   1400       152        921*
   175.vpr        1400       163        859*   1400       162        862*
   176.gcc        1100      70.9       1552*   1100      71.1       1548*
   181.mcf        1800       176       1021*   1800       176       1020*
   186.crafty     1000       107        938*   1000       102        984*
   197.parser     1800       205        877*   1800       205        876*
   252.eon        1300       149        873*   1300       141        919*
   253.perlbmk    1800       126       1434*   1800       123       1464*
   254.gap        1100      82.1       1340*   1100      80.8       1361*
   255.vortex     1900       134       1415*   1900       134       1413*
   256.bzip2      1500       161        930*   1500       161        933*
   300.twolf      3000       235       1276*   3000       234       1281*
   SPECint_base2000                    1093
   SPECint2000                                                      1106

   168.wupwise    1600       174        920*   1600       173        926*
   171.swim       3100       180       1721*   3100       181       1713*
   172.mgrid      1800       257        700*   1800       257        701*
   173.applu      2100       200       1049*   2100       199       1056*
   177.mesa       1400       125       1116*   1400       123       1138*
   178.galgel     2900       305        952*   2900       312        930*
   179.art        2600       216       1206*   2600       212       1229*
   183.equake     1300      80.5       1615*   1300      76.1       1708*
   187.facerec    1900       286        664*   1900       286        664*
   188.ammp       2200       305        721*   2200       298        739*
   189.lucas      2000       225        889*   2000       203        983*
   191.fma3d      2100       397        530*   2100       373        563*
   200.sixtrack   1100       188        587*   1100       188        586*
   301.apsi       2600       262        991*   2600       261        996*
   SPECfp_base2000                      921
   SPECfp2000                                                        939

NOTES
-----
 Base flags: -O2 -msse -msse2 -mfpmath=sse
 Peak flags: -O2 -msse -msse2 -mfpmath=sse
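
The "about 1%" and "2%" figures can be checked from the SPECint_base2000/SPECint2000 and SPECfp_base2000/SPECfp2000 summary lines. The helper below is a minimal sketch; the name percent_change is made up for illustration, and it assumes the first score is the baseline and the second the patched compiler.

```c
/* Relative change from `before` to `after`, in percent.
   A positive result means the second score is higher, which is
   better for SPEC ratios. */
double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

Under that assumption, percent_change(1093, 1106) is roughly 1.2 and percent_change(921, 939) roughly 2.0, while the galgel ratios (952 and 930) give roughly -2.3.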


2005-08-02 Thread dalej at gcc dot gnu dot org

--- Additional Comments From dalej at gcc dot gnu dot org  2005-08-02 22:57 ---
Preceding patch fixes the ICE I was getting.  The tests following the modified
area in find_reloads were being skipped in cases where they weren't before (in
particular, when output reloads are not allowed, this was not detected).  The
revised patch moves the area Paolo modified below those tests.  Going through
more testing now.


2005-08-01 Thread dalej at gcc dot gnu dot org

--- Additional Comments From dalej at gcc dot gnu dot org  2005-08-01 20:56 ---
Unfortunately the latest version of this patch causes a bootstrap failure on
ppc:

../../gcc3.apple.200502/gcc/reload.c: In function 'find_reloads':
../../gcc3.apple.200502/gcc/reload.c:4512: internal compiler error: in do_output_reload, at reload1.c:6936

which is

  /* If is a JUMP_INSN, we can't support output reloads yet.  */
  gcc_assert (!JUMP_P (insn));

Digging further.


2005-07-27 Thread bonzini at gcc dot gnu dot org


-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dalej at apple dot com
         AssignedTo|unassigned at gcc dot gnu   |bonzini at gcc dot gnu dot
                   |dot org                     |org
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2005-06-19 14:58:01         |2005-07-27 15:53:05
               date|                            |


2005-07-14 Thread bonzini at gcc dot gnu dot org

--- Additional Comments From bonzini at gcc dot gnu dot org  2005-07-14 08:45 ---
> But I don't see immediately how reload could be convinced to do so
> automatically, as the choice of the reload class for one insn is independent
> from the choices of reloads for the same reg but in other insns.

We can use PREFERRED_RELOAD_CLASS and PREFERRED_OUTPUT_RELOAD_CLASS.  I am not
sure if the fix is over-eager, but it works great on libsse2.  Every fld that
was there disappears, and from a cursory check, in current mainline's code all
of them could have been inherited.
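
To make the idea concrete, here is a toy sketch of the policy that the commit log for revision 112637 describes for ix86_preferred_reload_class: force SSE registers, and return NO_REGS for floating-point constants, when math is done with SSE. This is not the actual i386.c code. The enum, the function name and the boolean parameters are hypothetical simplifications.

```c
#include <stdbool.h>

/* Simplified stand-ins for GCC register classes; not the real enum. */
enum toy_reg_class {
    TOY_NO_REGS,
    TOY_GENERAL_REGS,
    TOY_FLOAT_REGS,     /* x87 stack registers */
    TOY_SSE_REGS
};

/* Toy preferred-reload-class policy.
   wanted      : class reload would otherwise use
   is_fp_mode  : the operand has a floating-point mode
   is_fp_const : the operand is a floating-point constant
   sse_math    : FP math is done in SSE registers (-mfpmath=sse) */
enum toy_reg_class toy_preferred_reload_class(enum toy_reg_class wanted,
                                              bool is_fp_mode,
                                              bool is_fp_const,
                                              bool sse_math)
{
    if (!sse_math || !is_fp_mode)
        return wanted;          /* nothing to steer */
    if (is_fp_const)
        return TOY_NO_REGS;     /* per the log, treated like a '!' in the
                                   constraint for non-registers */
    return TOY_SSE_REGS;        /* keep FP values out of x87 registers */
}
```

In this model a request to reload the constant 1.0 into TOY_FLOAT_REGS under SSE math yields TOY_NO_REGS, which corresponds to discouraging the fld1 seen in the testcases.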

Patches are at:
- http://gcc.gnu.org/ml/gcc-patches/2005-07/msg00914.html (regclass)
- http://gcc.gnu.org/ml/gcc-patches/2005-07/msg00983.html (reload+i386)

It is also necessary to remove from the MD the `#' hints for regclass, with

  sed -i 's/#[^,\"][^,"]*\([,"]\)/\1/g' i386.md
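
In GCC constraint strings, `#` means that the characters up to the next comma are ignored as a constraint and only matter for register preferences. The C function below is a rough sketch of what the sed one-liner is meant to do to a single constraint string; the name strip_hash_hints is made up, and it is not part of the patch.

```c
/* Copy `in` to `out`, dropping every '#' together with the characters
   that follow it, up to but not including the next ',' or the end of
   the string.  `out` must have room for strlen(in) + 1 bytes. */
void strip_hash_hints(const char *in, char *out)
{
    while (*in != '\0') {
        if (*in == '#') {
            /* Skip the preference hint; the terminating comma is kept. */
            while (*in != '\0' && *in != ',')
                in++;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

Applied to a constraint string such as `=f#Y,m  ,f#Y,*r  ,o  ,Y*x#f`, this yields `=f,m  ,f,*r  ,o  ,Y*x`.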

Paolo

2005-07-13 Thread pinskia at gcc dot gnu dot org

--- Additional Comments From pinskia at gcc dot gnu dot org  2005-07-13 11:30 ---
*** Bug 22453 has been marked as a duplicate of this bug. ***

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bonzini at gcc dot gnu dot
                   |                            |org


2005-07-13 Thread bonzini at gcc dot gnu dot org

--- Additional Comments From bonzini at gcc dot gnu dot org  2005-07-13 11:51 ---
Smaller testcase from PR22453:


extern double f(double, double);
void g (double x)
{
  double z, y;

  z = 0.0;
  y = 1.0 - x;

again:
  z = y - z;
  f(z, 1.0);
  if (z == 0.0)
    goto again;
}

has a fld1 instruction when compiled with -mfpmath=sse -msse2 -msseregparm
-mtune=pentiumpro -O2.

This instruction is caused by a reload into a FLOAT_REGS register, and moving
the value to an SSE register needs secondary memory.

2005-07-13 Thread matz at suse dot de

--- Additional Comments From matz at suse dot de  2005-07-13 13:55 ---
I was going to add this text to PR22453 when I noticed that it was closed
as a duplicate of this one.  So I am putting it here for reference, although
everything seems to be analyzed already:
 
The reload happens, because reg 58 gets no hardreg, because it's live over  
a call, and it's not worthwhile to put it into a call clobbered reg (which  
SSE regs are).  So reg 58 is placed onto stack (at ebp+16).  Now this mem must  
be initialized with 1.0.  If that is done via x87 (fld1 , fst ebp+16), via
GENERAL_REGS (mov 1.0 -> (reg:DF ax) , mov (reg:DF ax) -> (ebp+16)), or via
SSE_REGS (movsd (mem 1.0) -> xmm0 , mov xmm0 -> (ebp+16)) is actually not
that important.  You won't get rid of this reload.
  
Except that _if_ you force it to use SSE_REGS, then the next reload from  
(ebp+16) for the next insn can be inherited (as it's then the same mode),  
hence the initial store to ebp+16 is useless and will be removed.  
  
This can be tested with this hack:  
--- i386.md 12 Jul 2005 09:20:12 -0000  1.645
+++ i386.md 13 Jul 2005 13:47:06 -0000
@@ -2417,9 +2417,9 @@
 
 (define_insn "*movdf_nointeger"
   [(set (match_operand:DF 0 "nonimmediate_operand"
-   "=f#Y,m  ,f#Y,*r  ,o  ,Y*x#f,Y*x#f,Y*x#f  ,m")
+   "=?f#Y,m  ,f#Y,*?r  ,o  ,Y*x#f,Y*x#f,Y*x#f  ,m")
 (match_operand:DF 1 "general_operand"
-   "fm#Y,f#Y,G  ,*roF,F*r,C,Y*x#f,HmY*x#f,Y*x#f"))]
+   "?fm#Y,f#Y,G  ,*?roF,F*r,C,Y*x#f,HmY*x#f,Y*x#f"))]
   "(GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM)
    && ((optimize_size || !TARGET_INTEGER_DFMODE_MOVES) && !TARGET_64BIT)
    && (reload_in_progress || reload_completed
  
But I don't see immediately how reload could be convinced to do so
automatically, as the choice of the reload class for one insn is independent
from the choices of reloads for the same reg but in other insns.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |matz at suse dot de


2005-07-13 Thread bonzini at gcc dot gnu dot org

--- Additional Comments From bonzini at gcc dot gnu dot org  2005-07-13 14:08 ---
http://gcc.gnu.org/ml/gcc-patches/2005-01/msg01783.html crashes on Richard
Guenther's libsse2, fwiw.


2005-07-13 Thread bonzini at gcc dot gnu dot org

--- Additional Comments From bonzini at gcc dot gnu dot org  2005-07-13 14:23 ---
Michael, thank you very much.  Your analysis will probably help a lot.

2005-06-19 Thread pinskia at gcc dot gnu dot org

--- Additional Comments From pinskia at gcc dot gnu dot org  2005-06-19 14:58 ---
Confirmed.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|                            |1
  GCC build triplet|i686-pc-linux-gnu           |
   GCC host triplet|i686-pc-linux-gnu           |
   Last reconfirmed|0000-00-00 00:00:00         |2005-06-19 14:58:01
               date|                            |


2005-02-08 Thread steven at gcc dot gnu dot org

--- Additional Comments From steven at gcc dot gnu dot org  2005-02-08 10:13 ---
rth hacked the constraints recently to have better ra for some fp cases.  Can
you see if the bug is still there today on mainline?


2005-02-08 Thread uros at kss-loka dot si

--- Additional Comments From uros at kss-loka dot si  2005-02-09 07:30 ---
(In reply to comment #4)
> rth hacked the constraints recently to have better ra for some fp cases.  Can
> you see if the bug is still there today on mainline?

gcc version 4.0.0 20050209 (experimental)
'-O2 -march=pentium4 -mfpmath=sse -ffast-math -D__NO_MATH_INLINES'

...
comisd  .LC2, %xmm0
jb      .L2
fld1                            <- load of 1 into SSE reg
fstpl   -112(%ebp)
movsd   -112(%ebp), %xmm2
fldz                            <- load of 0 into SSE reg
fstl    -112(%ebp)
movsd   -112(%ebp), %xmm1
movapd  %xmm1, %xmm0
.L4:
fldl    32(%ebx)                movsd 32(%ebx), %xmm3
fstpl   -96(%ebp)               movsd %xmm3, -96(%ebp)
movsd   -96(%ebp), %xmm3        [ -not needed- ]
mulsd   %xmm2, %xmm3
subsd   %xmm0, %xmm3
fldl    24(%ebx)                <- move to temporary, this is OK
fstpl   -88(%ebp)
mulsd   -88(%ebp), %xmm2
xorpd   .LC4, %xmm2
movsd   -88(%ebp), %xmm4
mulsd   %xmm1, %xmm4
...

The bug is still there (it manifests itself also in a couple of other places).
Could somebody confirm this bug? A testcase is attached.

2005-01-27 Thread uros at kss-loka dot si

--- Additional Comments From uros at kss-loka dot si  2005-01-27 08:35 ---
Created an attachment (id=8080)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=8080&action=view)
Self-contained example

2005-01-27 Thread uros at kss-loka dot si

--- Additional Comments From uros at kss-loka dot si  2005-01-27 09:14 ---
A patch (an RFC, actually) that was used to teach the allocator which register
set to use:
http://gcc.gnu.org/ml/gcc-patches/2005-01/msg01783.html
[This patch probably doesn't work well with xmmintrin.h stuff, as pointed out
by rth in a follow-up comment.]

2005-01-27 Thread pinskia at gcc dot gnu dot org


-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |minor
           Keywords|                            |missed-optimization, ra


2005-01-27 Thread uros at kss-loka dot si

--- Additional Comments From uros at kss-loka dot si  2005-01-28 06:23 ---
BTW: I don't think that x87 should be fully disabled for -mfpmath=sse. st(0) can
be used as temporary storage for memory-to-memory transfers. Also, it can do
on-the-fly FP extending and truncating without touching an SSE reg:

movsd (%eax),xmm1  # ~7 cycles
cvtsd2ss xmm1,(%esp)   # 14 cycles


could be implemented by:

fldl (%eax)   # ~7cycles
fstps (%esp) # ~7cycles


There is nothing wrong if fldl is replaced with fldz or fld1. Performance
problems arise when the memory location is used in subsequent SSE
computations. In that case, it would be better to generate the zero in an SSE
register and store it to memory from there.
