[Bug target/19653] x87 reg allocated for constants for -mfpmath=sse
--- Comment #26 from bonzini at gnu dot org 2006-04-18 08:23 ---
Subject: Bug 19653

Author: bonzini
Date: Tue Apr 18 08:23:39 2006
New Revision: 113026

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=113026
Log:
2006-04-18  Paolo Bonzini  <[EMAIL PROTECTED]>

        PR target/27117

        Partial revert of revision 112637

        2006-04-03  Paolo Bonzini  <[EMAIL PROTECTED]>
                    Dale Johannesen  <[EMAIL PROTECTED]>

        PR target/19653
        * regclass.c (struct reg_pref): Update documentation.
        (regclass): Set prefclass to NO_REGS if memory is the best option.
        (record_reg_classes): Cope with a prefclass set to NO_REGS.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/regclass.c

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19653
[Bug target/19653] x87 reg allocated for constants for -mfpmath=sse
--- Comment #25 from bonzini at gnu dot org 2006-04-03 11:20 ---
fixed on mainline.

--
bonzini at gnu dot org changed:

           What    |Removed                     |Added
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19653
[Bug target/19653] x87 reg allocated for constants for -mfpmath=sse
--- Comment #24 from bonzini at gnu dot org 2006-04-03 11:20 ---
Subject: Bug 19653

Author: bonzini
Date: Mon Apr 3 11:20:07 2006
New Revision: 112637

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=112637
Log:
2005-08-08  Paolo Bonzini  <[EMAIL PROTECTED]>
            Dale Johannesen  <[EMAIL PROTECTED]>

        PR target/19653
        * regclass.c (struct reg_pref): Update documentation.
        (regclass): Set prefclass to NO_REGS if memory is the best option.
        (record_reg_classes): Cope with a prefclass set to NO_REGS.
        * reload.c (find_reloads): Take PREFERRED_OUTPUT_RELOAD_CLASS
        into account.  For non-registers, equate an empty preferred
        reload class to a `!' in the constraint; move the if clause to
        do so after those that reject the insn.
        (push_reload): Allow PREFERRED_*_RELOAD_CLASS to liberally
        return NO_REGS.
        (find_dummy_reload): Likewise.
        * doc/tm.texi (Register Classes): Document what it means
        if PREFERRED_*_RELOAD_CLASS return NO_REGS.
        * config/i386/i386.c (ix86_preferred_reload_class): Force
        using SSE registers (and return NO_REGS for floating-point
        constants) if math is done with SSE.
        (ix86_preferred_output_reload_class): New.
        * config/i386/i386-protos.h (ix86_preferred_output_reload_class): New.
        * config/i386/i386.h (PREFERRED_OUTPUT_RELOAD_CLASS): New.
        * config/i386/i386.md: Remove # register preferences.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386-protos.h
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/i386.h
    trunk/gcc/config/i386/i386.md
    trunk/gcc/doc/tm.texi
    trunk/gcc/regclass.c
    trunk/gcc/reload.c

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19653
[Bug target/19653] x87 reg allocated for constants for -mfpmath=sse
--- Comment #23 from bonzini at gcc dot gnu dot org 2005-11-22 09:21 --- Dale, can you please take care of merging this into 4.2? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19653
[Bug target/19653] x87 reg allocated for constants for -mfpmath=sse
--- Additional Comments From dalej at gcc dot gnu dot org 2005-09-21 17:23 --- I agree with Paolo. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19653
[Bug target/19653] x87 reg allocated for constants for -mfpmath=sse
--- Additional Comments From paolo dot bonzini at lu dot unisi dot ch 2005-09-21 06:51 ---
Subject: Re: x87 reg allocated for constants for -mfpmath=sse

> Note that in this pattern cost computation of MMX_REGS are all ignored ('*'
> in front of y). So, the cost which is computed is for 'r' which is
> GENERAL_REGS. This cost is too high and eventually results in memory cost
> to be lower than register cost. I tried the following simple patch as
> experiment and got all the performance back (it is now comparable with 4.0).
> Note that in this patch, I removed the '*' in the 2nd alternative so cost of
> keeping the operand in mmx_regs class is factored in. This resulted in a
> lower cost than that of memory. Is this the way to go? This is just an
> experiment which seems to work.

I think it makes sense. The x86 back-end is playing too many tricks (such as the # classes) with the register allocator and regclass especially, and they are biting back.

Still, I'd rather hear from an expert as to why the classes were written like this.

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19653
[Bug target/19653] x87 reg allocated for constants for -mfpmath=sse
--- Additional Comments From bonzini at gcc dot gnu dot org 2005-08-22 22:03 ---
SPEC results for i686-pc-linux-gnu follow. The only significant regression is in galgel, overall it's about 1% better for SPECint and 2% better for SPECfp. Note that crafty improves a lot because of Dale's patch.

                     Base     Base     Base      Peak     Peak     Peak
   Benchmarks    Ref Time Run Time    Ratio  Ref Time Run Time    Ratio
   164.gzip          1400      151     925*      1400      152     921*
   175.vpr           1400      163     859*      1400      162     862*
   176.gcc           1100     70.9    1552*      1100     71.1    1548*
   181.mcf           1800      176    1021*      1800      176    1020*
   186.crafty        1000      107     938*      1000      102     984*
   197.parser        1800      205     877*      1800      205     876*
   252.eon           1300      149     873*      1300      141     919*
   253.perlbmk       1800      126    1434*      1800      123    1464*
   254.gap           1100     82.1    1340*      1100     80.8    1361*
   255.vortex        1900      134    1415*      1900      134    1413*
   256.bzip2         1500      161     930*      1500      161     933*
   300.twolf         3000      235    1276*      3000      234    1281*
   SPECint_base2000                   1093
   SPECint2000                                                     1106

   168.wupwise       1600      174     920*      1600      173     926*
   171.swim          3100      180    1721*      3100      181    1713*
   172.mgrid         1800      257     700*      1800      257     701*
   173.applu         2100      200    1049*      2100      199    1056*
   177.mesa          1400      125    1116*      1400      123    1138*
   178.galgel        2900      305     952*      2900      312     930*
   179.art           2600      216    1206*      2600      212    1229*
   183.equake        1300     80.5    1615*      1300     76.1    1708*
   187.facerec       1900      286     664*      1900      286     664*
   188.ammp          2200      305     721*      2200      298     739*
   189.lucas         2000      225     889*      2000      203     983*
   191.fma3d         2100      397     530*      2100      373     563*
   200.sixtrack      1100      188     587*      1100      188     586*
   301.apsi          2600      262     991*      2600      261     996*
   SPECfp_base2000                     921
   SPECfp2000                                                       939

NOTES
-----
Base flags: -O2 -msse -msse2 -mfpmath=sse
Peak flags: -O2 -msse -msse2 -mfpmath=sse

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19653
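[Editorial note] The "about 1%" and "2%" figures in the comment above can be checked from the reported ratios. The following is only a small arithmetic sketch in Python: the helper name `percent_change` is made up for illustration, and the numbers are copied from the base and peak ratio columns of the results above.

```python
def percent_change(base_ratio, peak_ratio):
    """Relative change of the peak ratio over the base ratio, in percent."""
    return (peak_ratio - base_ratio) / base_ratio * 100.0


# (base ratio, peak ratio) pairs taken from the SPEC results quoted above.
ratios = {
    "SPECint2000": (1093, 1106),
    "SPECfp2000": (921, 939),
    "186.crafty": (938, 984),
    "178.galgel": (952, 930),
}

for name, (base, peak) in ratios.items():
    print(f"{name:12s} {percent_change(base, peak):+.1f}%")
```

A positive value means the second column scored higher; galgel is the one entry here that comes out negative, which matches the regression mentioned in the comment.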
[Bug target/19653] x87 reg allocated for constants for -mfpmath=sse
--- Additional Comments From dalej at gcc dot gnu dot org 2005-08-02 22:57 ---
Preceding patch fixes the ICE I was getting. The tests following the modified area in find_reloads were being skipped in cases where they weren't before (in particular, when output reloads are not allowed, this was not detected). The revised patch moves the area Paolo modified below those tests.

Going through more testing now.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19653
[Bug target/19653] x87 reg allocated for constants for -mfpmath=sse
--- Additional Comments From dalej at gcc dot gnu dot org 2005-08-01 20:56 ---
Unfortunately the latest version of this patch causes a bootstrap failure on ppc:

../../gcc3.apple.200502/gcc/reload.c: In function 'find_reloads':
../../gcc3.apple.200502/gcc/reload.c:4512: internal compiler error: in do_output_reload, at reload1.c:6936

which is

  /* If is a JUMP_INSN, we can't support output reloads yet.  */
  gcc_assert (!JUMP_P (insn));

Digging further.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19653
[Bug target/19653] x87 reg allocated for constants for -mfpmath=sse
--
           What    |Removed                     |Added
                 CC|                            |dalej at apple dot com
         AssignedTo|unassigned at gcc dot gnu   |bonzini at gcc dot gnu dot
                   |dot org                     |org
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2005-06-19 14:58:01         |2005-07-27 15:53:05
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19653
[Bug target/19653] x87 reg allocated for constants for -mfpmath=sse
--- Additional Comments From bonzini at gcc dot gnu dot org 2005-07-14 08:45 ---
> But I don't see immediately how reload could be convinced to do so
> automatically, as the choice of the reload class for one insn is independent
> from the choices of reloads for the same reg but in other insns.

We can use PREFERRED_RELOAD_CLASS and PREFERRED_OUTPUT_RELOAD_CLASS. I am not sure if the fix is over-eager, but it works great on libsse2. Every fld that was there disappears, and from a cursory check, in current mainline's code all of them could have been inherited.

Patches are at:
- http://gcc.gnu.org/ml/gcc-patches/2005-07/msg00914.html (regclass)
- http://gcc.gnu.org/ml/gcc-patches/2005-07/msg00983.html (reload+i386)

It is also necessary to remove from the MD the `#' hints for regclass, with

  sed -i 's/#[^",\][^",]*\([",]\)/\1/g' i386.md

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19653
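[Editorial note] To see what the sed expression in the comment above does to a constraint string, here is a rough Python equivalent. It is only an illustrative sketch: the function name `strip_hash_hints` is hypothetical, and the two sample strings are the *movdf_nointeger constraints quoted elsewhere in this thread, not the whole of i386.md.

```python
import re

# Same pattern as the sed command: a '#', then one character that is not
# '"', ',' or '\', then anything up to the next ',' or '"'. That closing
# delimiter is kept (group 1); the '#...' hint before it is dropped.
HASH_HINT = re.compile(r'#[^",\\][^",]*([",])')


def strip_hash_hints(text):
    """Remove '#<classes>' regclass hints from constraint strings."""
    return HASH_HINT.sub(r"\1", text)


output_constraint = '"=f#Y,m ,f#Y,*r ,o ,Y*x#f,Y*x#f,Y*x#f ,m"'
input_constraint = '"fm#Y,f#Y,G ,*roF,F*r,C,Y*x#f,HmY*x#f,Y*x#f"'

print(strip_hash_hints(output_constraint))
print(strip_hash_hints(input_constraint))
```

Note that padding that follows a hint inside the same alternative (as in `Y*x#f ,m`) is consumed together with the hint, since the pattern runs up to the next comma or quote.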
[Bug target/19653] x87 reg allocated for constants for -mfpmath=sse
--- Additional Comments From bonzini at gcc dot gnu dot org 2005-07-13 14:23 --- Michael, thank you very much. Your analysis will probably help a lot. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19653
[Bug target/19653] x87 reg allocated for constants for -mfpmath=sse
--- Additional Comments From bonzini at gcc dot gnu dot org 2005-07-13 14:08 --- http://gcc.gnu.org/ml/gcc-patches/2005-01/msg01783.html crashes on richard guenther's libsse2, fwiw. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19653
[Bug target/19653] x87 reg allocated for constants for -mfpmath=sse
--- Additional Comments From matz at suse dot de 2005-07-13 13:55 ---
I was going to add this text to PR22453, when I noticed that it was closed as duplicate to this one. So putting it here for reference, although everything seems to be analyzed already:

The reload happens, because reg 58 gets no hardreg, because it's live over a call, and it's not worthwhile to put it into a call clobbered reg (which SSE regs are). So reg 58 is placed onto stack (at ebp+16). Now this mem must be initialized with 1.0. If that is done via x87 (fld1 , fst ebp+16), via GENERAL_REGS (mov 1.0 -> (reg:DF ax) , mov (reg:DF ax) -> (ebp+16)), or via SSE_REGS (movsd (mem 1.0) -> xmm0 , mov xmm0 -> (ebp+16)) is actually not that important. You won't get rid of this reload.

Except that _if_ you force it to use SSE_REGS, then the next reload from (ebp+16) for the next insn can be inherited (as it's then the same mode), hence the initial store to ebp+16 is useless and will be removed.

This can be tested with this hack:

--- i386.md 12 Jul 2005 09:20:12 - 1.645
+++ i386.md 13 Jul 2005 13:47:06 -
@@ -2417,9 +2417,9 @@
 (define_insn "*movdf_nointeger"
   [(set (match_operand:DF 0 "nonimmediate_operand"
-       "=f#Y,m ,f#Y,*r ,o ,Y*x#f,Y*x#f,Y*x#f ,m")
+       "=?f#Y,m ,f#Y,*?r ,o ,Y*x#f,Y*x#f,Y*x#f ,m")
        (match_operand:DF 1 "general_operand"
-       "fm#Y,f#Y,G ,*roF,F*r,C,Y*x#f,HmY*x#f,Y*x#f"))]
+       "?fm#Y,f#Y,G ,*?roF,F*r,C,Y*x#f,HmY*x#f,Y*x#f"))]
   "(GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM)
    && ((optimize_size || !TARGET_INTEGER_DFMODE_MOVES) && !TARGET_64BIT)
    && (reload_in_progress || reload_completed

But I don't see immediately how reload could be convinced to do so automatically, as the choice of the reload class for one insn is independent from the choices of reloads for the same reg but in other insns.

--
           What    |Removed                     |Added
                 CC|                            |matz at suse dot de

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19653
[Bug target/19653] x87 reg allocated for constants for -mfpmath=sse
--- Additional Comments From bonzini at gcc dot gnu dot org 2005-07-13 11:51 ---
Smaller testcase from PR22453:

extern double f(double, double);

void g (double x)
{
  double z, y;
  z = 0.0;
  y = 1.0 - x;
again:
  z = y - z;
  f(z, 1.0);
  if (z == 0.0)
    goto again;
}

has a fld1 instruction when compiled with "-mfpmath=sse -msse2 -msseregparm -mtune=pentiumpro -O2". This instruction is caused by a reload into a FLOAT_REGS register, and moving the value to a SSE register needs secondary memory.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19653
[Bug target/19653] x87 reg allocated for constants for -mfpmath=sse
--- Additional Comments From pinskia at gcc dot gnu dot org 2005-07-13 11:30 ---
*** Bug 22453 has been marked as a duplicate of this bug. ***

--
           What    |Removed                     |Added
                 CC|                            |bonzini at gcc dot gnu dot
                   |                            |org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19653
[Bug target/19653] x87 reg allocated for constants for -mfpmath=sse
--- Additional Comments From pinskia at gcc dot gnu dot org 2005-06-19 14:58 ---
Confirmed.

--
           What    |Removed                     |Added
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|                            |1
  GCC build triplet|i686-pc-linux-gnu           |
   GCC host triplet|i686-pc-linux-gnu           |
   Last reconfirmed|-00-00 00:00:00             |2005-06-19 14:58:01
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19653
[Bug target/19653] x87 reg allocated for constants for -mfpmath=sse
--- Additional Comments From uros at kss-loka dot si 2005-02-09 07:30 ---
(In reply to comment #4)
> rth hacked the constraints recently to have better ra for some fp cases. Can
> you see if the bug is still there today on mainline?

gcc version 4.0.0 20050209 (experimental)
'-O2 -march=pentium4 -mfpmath=sse -ffast-math -D__NO_MATH_INLINES'

        ...
        comisd  .LC2, %xmm0
        jb      .L2
        fld1                            <- load of 1 into SSE reg
        fstpl   -112(%ebp)
        movsd   -112(%ebp), %xmm2
        fldz                            <- load of 0 into SSE reg
        fstl    -112(%ebp)
        movsd   -112(%ebp), %xmm1
        movapd  %xmm1, %xmm0
.L4:
        fldl    32(%ebx)
        movsd   32(%ebx), %xmm3
        fstpl   -96(%ebp)
        movsd   %xmm3, -96(%ebp)
        movsd   -96(%ebp), %xmm3        [ -not needed- ]
        mulsd   %xmm2, %xmm3
        subsd   %xmm0, %xmm3
        fldl    24(%ebx)                <- move to temporary, this is OK
        fstpl   -88(%ebp)
        mulsd   -88(%ebp), %xmm2
        xorpd   .LC4, %xmm2
        movsd   -88(%ebp), %xmm4
        mulsd   %xmm1, %xmm4
        ...

The bug is still there (it manifests itself also in a couple of other places). Could somebody confirm this bug? A testcase is attached.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19653
[Bug target/19653] x87 reg allocated for constants for -mfpmath=sse
--- Additional Comments From steven at gcc dot gnu dot org 2005-02-08 10:13 --- rth hacked the constraints recently to have better ra for some fp cases. Can you see if the bug is still there today on mainline? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19653
[Bug target/19653] x87 reg allocated for constants for -mfpmath=sse
--- Additional Comments From uros at kss-loka dot si 2005-01-28 06:23 ---
BTW: I don't think that x87 should be fully disabled for -mfpmath=sse. st(0) can be used as a temporary storage for memory-to-memory transfers. Also, it can do on-the-fly FP extending and truncating, without touching a SSE reg:

        movsd (%eax),xmm1       # ~7 cycles
        cvtsd2ss xmm1,(%esp)    # 14 cycles

could be implemented by:

        fldl (%eax)             # ~7cycles
        fstps (%esp)            # ~7cycles

There is nothing wrong, if fldl is replaced with fldz or fld1. The performance problems will arise in case when memory location is used in subsequent SSE computations. In this case, it would be better if zero is "generated" in SSE register and stored to memory from SSE reg.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19653
[Bug target/19653] x87 reg allocated for constants for -mfpmath=sse
--
           What    |Removed                     |Added
           Severity|normal                      |minor
           Keywords|                            |missed-optimization, ra

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19653
[Bug target/19653] x87 reg allocated for constants for -mfpmath=sse
--- Additional Comments From uros at kss-loka dot si 2005-01-27 09:14 ---
A patch (RFC actually) that was used to teach allocator which register set to use:

http://gcc.gnu.org/ml/gcc-patches/2005-01/msg01783.html

[This patch probably doesn't work well with xmmintrin.h stuff as pointed by rth in follow-up comment.]

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19653
[Bug target/19653] x87 reg allocated for constants for -mfpmath=sse
--- Additional Comments From uros at kss-loka dot si 2005-01-27 08:35 --- Created an attachment (id=8080) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=8080&action=view) Self-contained example Self-contained example -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19653