[Bug target/41082] [4.5/4.6 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3

2010-12-09 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

--- Comment #70 from Jakub Jelinek 2010-12-09 08:33:49 UTC ---
Author: jakub
Date: Thu Dec  9 08:33:45 2010
New Revision: 167629

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=167629
Log:
PR target/41082
* config/rs6000/rs6000.c (rs6000_expand_vector_extract): Use stvx
instead of stve*x.
(altivec_expand_stv_builtin): For op0 use mode of operand 1 instead
of operand 0.
* config/rs6000/altivec.md (VI_scalar): New mode attr.
(altivec_stve<VI_char>x, *altivec_stvesfx): Use scalar instead of
vector mode for operand 0, put operand 1 into UNSPEC.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/rs6000/altivec.md
trunk/gcc/config/rs6000/rs6000.c


[Bug target/41082] [4.5/4.6 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3

2010-12-08 Thread dominiq at lps dot ens.fr
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

--- Comment #69 from Dominique d'Humieres 2010-12-09 06:15:02 UTC ---
With gcc46-pr41082.patch, the test passes on darwin with both -mtune=rs64 and
-mtune=power4.

Thanks.


[Bug target/41082] [4.5/4.6 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3

2010-12-08 Thread meissner at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

--- Comment #68 from Michael Meissner 2010-12-08 20:29:45 UTC ---
gcc46-pr41082.patch looks correct to me.  I did a build on a linux power7
system, and saw no regressions in the make check output.


[Bug target/41082] [4.5/4.6 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3

2010-12-08 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

--- Comment #67 from Jakub Jelinek 2010-12-08 08:28:42 UTC ---
Perhaps it would also be good to add a new peephole2 to catch:
(insn 931 415 932 33 (set (reg:CC 19 r19)
        (mem/c:CC (plus:DI (reg/f:DI 1 r1)
                (const_int 272 [0x110])) [5 %sfp+272 S4 A32])) where_2.f90:11 358 {*movcc_internal1}
     (nil))

(insn 932 931 461 33 (set (reg:CC 74 cr6)
        (reg:CC 19 r19)) where_2.f90:11 358 {*movcc_internal1}
     (expr_list:REG_DEAD (reg:CC 19 r19)
        (nil)))

(insn 461 932 422 33 (set (reg:SI 27 r27 [712])
        (gt:SI (reg:CC 74 cr6)
            (const_int 0 [0]))) where_2.f90:11 462 {*rs6000.md:13486}
     (expr_list:REG_DEAD (reg:CC 74 cr6)
        (nil)))

which is expanded to (if -fno-schedule-insns2, but peephole2 is run before
second scheduling):
lwz r19,272(r1)
rlwinm r19,r19,8,0x
mtcrf 2,r19
rlwinm r19,r19,24,0x
mfcr r27
rlwinm r27,r27,26,1
while only one lwz and one rlwinm are actually needed (BTW, it would also be
nice to avoid the second rlwinm in the movcc_internal1 pattern if the source
integer register is dead at the insn).

I guess this can happen quite often, any time the register pressure is too high
and reload spills CC mode registers and then they are used just once for cr*
cond 0 ? 1 : 0 assignments.
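
The claim that one load and one rlwinm suffice can be checked with a small
bit-level sketch. It is only an illustration under stated assumptions, not the
proposed peephole2 itself: the rlwinm masks are truncated in the listing above
and are not reconstructed here, so the sketch assumes the first two rlwinm are
pure rotates, and it assumes the spilled CC word keeps its 4-bit CR field
(LT, GT, EQ, SO) in the most significant nibble. All function names are
hypothetical.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def rlwinm(rs, sh, mask):
    """rlwinm: rotate left by sh, then AND with mask."""
    return rotl32(rs, sh) & mask


def mtcrf(cr, fxm, rs):
    """mtcrf: copy the nibbles of rs selected by the 8-bit FXM into CR."""
    for field in range(8):
        if fxm & (0x80 >> field):
            nibble = 0xF0000000 >> (4 * field)
            cr = (cr & ~nibble & MASK32) | (rs & nibble)
    return cr


def long_sequence(spilled, cr=0x12345678):
    """The six-instruction sequence from the listing (masks assumed)."""
    r19 = spilled                   # lwz    r19,272(r1)
    r19 = rlwinm(r19, 8, MASK32)    # rlwinm r19,r19,8,<assumed all ones>
    cr = mtcrf(cr, 2, r19)          # mtcrf  2,r19  (FXM 2 selects cr6)
    r19 = rlwinm(r19, 24, MASK32)   # rlwinm r19,r19,24,<assumed all ones>
    r27 = cr                        # mfcr   r27
    return rlwinm(r27, 26, 1)       # rlwinm r27,r27,26,1  (GT bit of cr6)


def short_sequence(spilled):
    """One load and one rlwinm: take the GT bit straight from the word."""
    r27 = spilled                   # lwz    r27,272(r1)
    return rlwinm(r27, 2, 1)        # rlwinm r27,r27,2,1


# Compare both sequences for every possible 4-bit CR field value.
for field_bits in range(16):
    word = field_bits << 28
    assert long_sequence(word) == short_sequence(word)
```

Under these assumptions both sequences produce the same 0/1 result for every
CR field value, which is what makes the shorter form a candidate replacement
when r19 and cr6 are dead afterwards.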


[Bug target/41082] [4.5/4.6 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3

2010-12-07 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #22678|0                           |1
        is obsolete|                            |
  Attachment #22679|0                           |1
        is obsolete|                            |

--- Comment #66 from Jakub Jelinek 2010-12-08 07:35:43 UTC ---
Created attachment 22680
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22680
gcc46-pr41082.patch

Another untested fix, which this time should fix both
rs6000_expand_vector_extract patterns and __builtin_altivec_stve*x.
For altivec-4.c it generates identical code before/after the patch for both -O0
and -O2.


[Bug target/41082] [4.5/4.6 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3

2010-12-07 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

--- Comment #65 from Jakub Jelinek 2010-12-08 00:32:47 UTC ---
Created attachment 22679
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22679
gcc46-pr41082.patch

Found that now too.  Anyway, I believe (if there is no performance issue) I can
just tweak rs6000_expand_vector_extract this way.  The stve*x patterns would
need to be fixed anyway, though.  Because the pattern can hardly have the extra
argument, it couldn't be a VEC_SELECT; I guess it should be a scalar store with
=Z or some similar constraint that forces reg or reg+reg, with the source being
just the unspec UNSPEC_STVE with the vector as its argument.


[Bug target/41082] [4.5/4.6 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3

2010-12-07 Thread pinskia at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

--- Comment #64 from Andrew Pinski 2010-12-08 00:15:44 UTC ---
> IMHO we should just get rid of UNSPEC_STVE stuff and store the whole vector, 

No, you cannot, because there are builtins which create the UNSPEC_STVE.


[Bug target/41082] [4.5/4.6 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3

2010-12-07 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

--- Comment #63 from Jakub Jelinek 2010-12-08 00:12:52 UTC ---
Created attachment 22678
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22678
gcc46-pr41082.patch

Totally untested proof of concept patch.
The disadvantage is that as the MEM mode is not altivec-ish, it isn't forced
into reg+reg addressing early.

On the other hand, when rs6000_expand_vector_extract always creates a new stack
local (shouldn't it try to share just one such slot for each mode in each
function btw?), is there any reason why a normal stvx insn can't be used
instead of these stve*x insns?  Is it a performance issue?  The difference
between stvx and stve*x, as I understand it, is just that stve*x doesn't
clobber the other bytes in memory, while stvx stores everything in the 16 byte
slot.  But we don't care about those other bytes anyway, so if it is not a
performance issue, IMHO we should just get rid of UNSPEC_STVE stuff and store
the whole vector, then just read the bytes we want.
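
The difference between the two stores can be sketched on a plain byte array.
This is a simplified model, not the ISA definition: it assumes big-endian
element numbering, treats the effective address as an index into a bytearray,
and the helper names stvx and stvewx are just labels for the emulation.

```python
import struct


def stvx(mem, vec, ea):
    """Model of stvx: store all 16 bytes at the 16-byte-aligned address."""
    ea &= ~0xF
    mem[ea:ea + 16] = vec


def stvewx(mem, vec, ea):
    """Model of stvewx: store only the word element selected by the address."""
    ea &= ~0x3
    elt = (ea & 0xF) >> 2          # element index comes from the address bits
    mem[ea:ea + 4] = vec[4 * elt:4 * elt + 4]


vec = struct.pack(">4I", 10, 11, 12, 13)   # a V4SI value, elements 0..3
slot = 16                                  # 16-byte-aligned stack temp
elt = 3                                    # element to extract

# Variant A: stvewx into the slot, then load the element back.
mem_a = bytearray(b"\xAA" * 48)
stvewx(mem_a, vec, slot + 4 * elt)
value_a = struct.unpack_from(">I", mem_a, slot + 4 * elt)[0]

# Variant B: stvx of the whole vector, then load the same element.
mem_b = bytearray(b"\xAA" * 48)
stvx(mem_b, vec, slot)
value_b = struct.unpack_from(">I", mem_b, slot + 4 * elt)[0]

print(value_a, value_b)
```

Both variants read back the same element. They differ only in what happens to
the other 12 bytes of the slot, which the extract code never looks at.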


[Bug target/41082] [4.5/4.6 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3

2010-12-07 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|middle-end                  |target

--- Comment #62 from Jakub Jelinek 2010-12-07 23:18:05 UTC ---
On a closer look, the reason why DSE deletes it is because there is a stvewx
insn in between, and the pattern of the insn is just plain wrong.
rs6000_expand_vector_extract has called assign_stack_temp, which gave for a
V4SImode something that is at r1+256, 16 bytes.
But rs6000_expand_vector_extract adjusts the address by elt * 4 bytes, here for
elt 3, so it is r1+268.  The pattern wrongly says that a V4SImode register is
stored into (mem:V4SI (r1 + 268)), which is not true (the insn stores just
4 bytes, i.e. SImode, into r1 + 268).  The r1+272 address which was given for
one of the spilled CCmode pseudos is then considered to be clobbered by DSE,
because (mem:V4SI (r1 + 268)) overlaps it, thus the removed
(mem:CC (r1 + 272)) store by DSE.

I think the stve* insns need to be represented as what they really do, i.e.
(set (mem:SI ...) (either vec_select or perhaps unspec with the V4SImode reg
inside of it)).
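
The overlap described above comes down to byte-range arithmetic. The following
is a minimal sketch of that arithmetic only, not of how GCC's DSE pass or alias
analysis is implemented. MemRef and overlaps are hypothetical names, and the
4-byte size of the CC spill is taken from the "S4" in the RTL dump of
comment #67.

```python
from typing import NamedTuple


class MemRef(NamedTuple):
    """A stack reference reduced to an offset from r1 and a size in bytes."""
    offset: int
    size: int


def overlaps(a, b):
    """True if the half-open byte ranges of a and b intersect."""
    return a.offset < b.offset + b.size and b.offset < a.offset + a.size


SLOT_BASE = 256            # V4SI stack temp from assign_stack_temp
ELT, ELT_SIZE = 3, 4       # element 3 of a V4SI
addr = SLOT_BASE + ELT * ELT_SIZE

cc_spill = MemRef(272, 4)  # (mem:CC (r1 + 272)), the spilled CCmode pseudo
claimed = MemRef(addr, 16) # what the old pattern says: (mem:V4SI (r1 + 268))
actual = MemRef(addr, 4)   # what stvewx really writes: (mem:SI (r1 + 268))

print(addr)                         # 268
print(overlaps(claimed, cc_spill))  # True: bytes 268..283 cover 272..275
print(overlaps(actual, cc_spill))   # False: bytes 268..271 stop before 272
```

With the store described as SImode, the range ends exactly where the CC spill
slot begins, so nothing in this model would treat the spill as clobbered.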