[Bug target/41082] [4.5/4.6 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

--- Comment #70 from Jakub Jelinek 2010-12-09 08:33:49 UTC ---
Author: jakub
Date: Thu Dec 9 08:33:45 2010
New Revision: 167629

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=167629
Log:
    PR target/41082
    * config/rs6000/rs6000.c (rs6000_expand_vector_extract): Use stvx
    instead of stve*x.
    (altivec_expand_stv_builtin): For op0 use mode of operand 1 instead
    of operand 0.
    * config/rs6000/altivec.md (VI_scalar): New mode attr.
    (altivec_stvex, *altivec_stvesfx): Use scalar instead of vector mode
    for operand 0, put operand 1 into UNSPEC.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/rs6000/altivec.md
    trunk/gcc/config/rs6000/rs6000.c
[Bug target/41082] [4.5/4.6 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

--- Comment #69 from Dominique d'Humieres 2010-12-09 06:15:02 UTC ---
With gcc46-pr41082.patch, the test passes on darwin with both -mtune=rs64 and -mtune=power4. Thanks.
[Bug target/41082] [4.5/4.6 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

--- Comment #68 from Michael Meissner 2010-12-08 20:29:45 UTC ---
gcc46-pr41082.patch looks correct to me. I did a build on a linux power7 system, and saw no regressions in the make check output.
[Bug target/41082] [4.5/4.6 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

--- Comment #67 from Jakub Jelinek 2010-12-08 08:28:42 UTC ---
Perhaps it would also be good to add a new peephole2 to catch:

(insn 931 415 932 33 (set (reg:CC 19 r19)
        (mem/c:CC (plus:DI (reg/f:DI 1 r1)
                (const_int 272 [0x110])) [5 %sfp+272 S4 A32])) where_2.f90:11 358 {*movcc_internal1}
     (nil))

(insn 932 931 461 33 (set (reg:CC 74 cr6)
        (reg:CC 19 r19)) where_2.f90:11 358 {*movcc_internal1}
     (expr_list:REG_DEAD (reg:CC 19 r19)
        (nil)))

(insn 461 932 422 33 (set (reg:SI 27 r27 [712])
        (gt:SI (reg:CC 74 cr6)
            (const_int 0 [0]))) where_2.f90:11 462 {*rs6000.md:13486}
     (expr_list:REG_DEAD (reg:CC 74 cr6)
        (nil)))

which is expanded to (if -fno-schedule-insns2, but peephole2 is run before second scheduling):

        lwz r19,272(r1)
        rlwinm r19,r19,8,0x
        mtcrf 2,r19
        rlwinm r19,r19,24,0x
        mfcr r27
        rlwinm r27,r27,26,1

while only one lwz and one rlwinm are actually needed (BTW, it would also be nice to avoid the second rlwinm in the movcc_internal1 pattern if the source integer register is dead at the insn). I guess this can happen quite often, any time the register pressure is too high and reload spills CC mode registers and then they are used just once for cr* cond 0 ? 1 : 0 assignments.
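To illustrate why one lwz plus one rlwinm would be enough, here is a small C model of the sequence above. It is only a sketch under stated assumptions: the two rlwinm masks that are cut off in the listing are taken to be all-ones (pure rotates), the spilled CC value is taken to sit in the top four bits of the loaded word, and mtcrf 2 is modelled as writing only the cr6 field. The function names are made up for the illustration and are not GCC code.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (what rlwinm does before masking). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Model of the emitted six-insn sequence.
   ASSUMPTION: the truncated rlwinm masks are full masks, i.e. plain rotates. */
static uint32_t gt_via_cr6(uint32_t spilled)
{
    uint32_t r19 = spilled;              /* lwz r19,272(r1) */
    r19 = rotl32(r19, 8);                /* rlwinm r19,r19,8,...  : top nibble -> cr6 position */
    uint32_t cr = r19 & 0x000000F0u;     /* mtcrf 2,r19 : only cr6 is written; other fields modelled as 0 */
    r19 = rotl32(r19, 24);               /* rlwinm r19,r19,24,... : restore r19 (result unused here) */
    (void) r19;
    uint32_t r27 = cr;                   /* mfcr r27 */
    return rotl32(r27, 26) & 1u;         /* rlwinm r27,r27,26,1 : GT bit of cr6 */
}

/* Hypothetical shortened form: the GT bit is extracted straight from the loaded word. */
static uint32_t gt_direct(uint32_t spilled)
{
    return rotl32(spilled, 2) & 1u;      /* one rotate-and-mask after the lwz */
}
```

Under these assumptions both functions return the same value for every spilled CC nibble, which is the redundancy a peephole2 could remove.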
[Bug target/41082] [4.5/4.6 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #22678|0                           |1
        is obsolete|                            |
  Attachment #22679|0                           |1
        is obsolete|                            |

--- Comment #66 from Jakub Jelinek 2010-12-08 07:35:43 UTC ---
Created attachment 22680
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22680
gcc46-pr41082.patch

Another untested fix, which this time should fix both rs6000_expand_vector_extract patterns and __builtin_altivec_stve*x. For altivec-4.c it generates identical code before/after the patch for both -O0 and -O2.
[Bug target/41082] [4.5/4.6 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

--- Comment #65 from Jakub Jelinek 2010-12-08 00:32:47 UTC ---
Created attachment 22679
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22679
gcc46-pr41082.patch

Found that now too. Anyway, I believe (if there is no performance issue) I can just tweak rs6000_expand_vector_extract this way. The stve*x patterns would need to be fixed anyway, though. Because the pattern can hardly have the extra argument, it couldn't be a VEC_SELECT, but I guess it could be a scalar store with =Z or some similar constraint that forces reg or reg+reg, with the source being just the unspec UNSPEC_STVE with the vector as its argument.
[Bug target/41082] [4.5/4.6 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

--- Comment #64 from Andrew Pinski 2010-12-08 00:15:44 UTC ---
> IMHO we should just get rid of UNSPEC_STVE stuff and store the whole vector,

No, you cannot, because there are builtins which create the UNSPEC_STVE.
[Bug target/41082] [4.5/4.6 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

--- Comment #63 from Jakub Jelinek 2010-12-08 00:12:52 UTC ---
Created attachment 22678
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22678
gcc46-pr41082.patch

Totally untested proof of concept patch. The disadvantage is that, as the MEM mode is not altivec-ish, it isn't forced into reg+reg addressing early.

On the other hand, when rs6000_expand_vector_extract always creates a new stack local (shouldn't it try to share just one such slot for each mode in each function, btw?), is there any reason why a normal stvx insn can't be used instead of these stve*x insns? Is it a performance issue? The difference between stvx and stve*x, as I understand it, is just that stve*x doesn't clobber the other bytes in memory, while stvx stores everything in the 16 byte slot. But we don't care about those other bytes anyway, so if it is not a performance issue, IMHO we should just get rid of UNSPEC_STVE stuff and store the whole vector, then just read the bytes we want.
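The stvx versus stve*x argument can be sketched with a toy C model. This is not how the hardware or GCC implements anything: the v4si type, the slot array and all function names are invented for the illustration, element numbering is kept abstract (no endianness), and the stvewx model assumes the element is selected by the byte offset within the 16-byte slot.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for a V4SImode vector register. */
typedef struct { uint32_t w[4]; } v4si;

/* stvx-like store: writes all 16 bytes of the slot. */
static void model_stvx(uint32_t slot[4], const v4si *v)
{
    memcpy(slot, v->w, sizeof v->w);
}

/* stvewx-like store: writes only the word selected by the byte offset,
   leaving the other 12 bytes of the slot untouched. */
static void model_stvewx(uint32_t slot[4], const v4si *v, unsigned byte_off)
{
    assert(byte_off < 16);
    unsigned elt = byte_off / 4;         /* low two offset bits are ignored */
    slot[elt] = v->w[elt];
}

/* Extract element elt through a fresh temporary slot, stvx style. */
static uint32_t extract_with_stvx(const v4si *v, unsigned elt)
{
    uint32_t slot[4] = {0, 0, 0, 0};
    assert(elt < 4);
    model_stvx(slot, v);
    return slot[elt];
}

/* Extract element elt through a fresh temporary slot, stvewx style. */
static uint32_t extract_with_stvewx(const v4si *v, unsigned elt)
{
    uint32_t slot[4] = {0, 0, 0, 0};
    assert(elt < 4);
    model_stvewx(slot, v, elt * 4);
    return slot[elt];
}
```

In this model the two extract functions always agree, because the temporary slot is private and the bytes that only stvx overwrites are never read.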
[Bug target/41082] [4.5/4.6 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|middle-end                  |target

--- Comment #62 from Jakub Jelinek 2010-12-07 23:18:05 UTC ---
On a closer look, the reason why DSE deletes it is that there is a stvewx insn in between, and the pattern of the insn is just plain wrong. rs6000_expand_vector_extract has called assign_stack_temp, which for V4SImode gave a 16-byte slot at r1+256. But rs6000_expand_vector_extract adjusts the address by elt * 4 bytes, here for elt 3, so it is r1+268. The pattern wrongly says that a V4SImode register is stored into (mem:V4SI (r1 + 268)), which is not true (the insn stores just 4 bytes, i.e. SImode, into r1 + 268). DSE then considers the r1+272 address, which was given to one of the spilled CCmode pseudos, to be clobbered, because (mem:V4SI (r1 + 268)) overlaps it, and so it removes the (mem:CC (r1 + 272)) store. I think the stve* insns need to be represented as what they really do, i.e. (set (mem:SI ...) (either vec_select or perhaps unspec with the V4SImode reg inside of it)).
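The overlap arithmetic can be checked with a few lines of C. This is just a byte-range model, not GCC's actual DSE or alias code; the mem_ref struct and refs_overlap are hypothetical names, and offsets are taken relative to r1.

```c
#include <assert.h>
#include <stdbool.h>

/* A memory reference as a half-open byte range [offset, offset + size)
   relative to r1.  Hypothetical type, for illustration only. */
struct mem_ref {
    long offset;
    long size;
};

/* Two byte ranges overlap if each one starts before the other ends. */
static bool refs_overlap(struct mem_ref a, struct mem_ref b)
{
    assert(a.size > 0 && b.size > 0);
    return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}
```

With the wrong pattern, the store is described as 16 bytes at offset 268, i.e. bytes 268..283, which covers the 4-byte CC slot at 272..275. Described as the 4 bytes it really writes (268..271), it no longer touches the CC slot, so the CC store would not be seen as clobbered.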