[Bug tree-optimization/68775] [6 Regression] spec2006 test case 465.tonto fails with the gcc 6.0 fortran compiler
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68775

Bill Seurer changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED

--- Comment #15 from Bill Seurer ---
This works fine now. Thanks!

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68775

--- Comment #12 from Richard Biener ---
Ok, it became latent only. Testing the following fix:

Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c	(revision 230855)
+++ gcc/tree-vect-slp.c	(working copy)
@@ -1078,6 +1078,20 @@ vect_build_slp_tree (vec_info *vinfo,
                                        tem, npermutes, _tree_size, max_tree_size))
                     {
+                      /* ... so if successful we can apply the operand swapping
+                         to the GIMPLE IL.  This is necessary because for example
+                         vect_get_slp_defs uses operand indexes and thus expects
+                         canonical operand order.  This is also necessary even
+                         if we end up building the operand from scalars as
+                         we'll continue to process swapped operand two.  */
+                      for (j = 0; j < group_size; ++j)
+                        if (!matches[j])
+                          {
+                            gimple *stmt = SLP_TREE_SCALAR_STMTS (*node)[j];
+                            swap_ssa_operands (stmt, gimple_assign_rhs1_ptr (stmt),
+                                               gimple_assign_rhs2_ptr (stmt));
+                          }
+
                       /* If we have all children of child built up from scalars then
                          just throw that away and build it up this node from scalars.  */
                       if (!SLP_TREE_CHILDREN (child).is_empty ())
@@ -1107,17 +1121,6 @@ vect_build_slp_tree (vec_info *vinfo,
                         }
                     }
-                  /* ... so if successful we can apply the operand swapping
-                     to the GIMPLE IL.  This is necessary because for example
-                     vect_get_slp_defs uses operand indexes and thus expects
-                     canonical operand order.  */
-                  for (j = 0; j < group_size; ++j)
-                    if (!matches[j])
-                      {
-                        gimple *stmt = SLP_TREE_SCALAR_STMTS (*node)[j];
-                        swap_ssa_operands (stmt, gimple_assign_rhs1_ptr (stmt),
-                                           gimple_assign_rhs2_ptr (stmt));
-                      }
                   oprnd_info->def_stmts = vNULL;
                   SLP_TREE_CHILDREN (*node).quick_push (child);
                   continue;
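
To make the ordering issue the patch addresses concrete, here is a toy model in C. It is not GCC code: toy_stmt, apply_operand_swaps, get_operand and commit_child are hypothetical names, and the control flow is heavily simplified. It assumes, as the patch comment says, that an index-based consumer (standing in for vect_get_slp_defs) expects the swaps recorded in matches[] to be applied to the statements themselves, including on the path where the child is thrown away and built from scalars.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a commutative binary assignment
   lhs = rhs1 OP rhs2; operands are plain integer "SSA names". */
typedef struct {
  int rhs1;
  int rhs2;
} toy_stmt;

/* Swap rhs1/rhs2 of every group member that only matched with its
   operands swapped (matches[j] == false), like the loop in the patch. */
static void apply_operand_swaps(toy_stmt *group, const bool *matches,
                                size_t group_size) {
  for (size_t j = 0; j < group_size; ++j)
    if (!matches[j]) {
      int tmp = group[j].rhs1;
      group[j].rhs1 = group[j].rhs2;
      group[j].rhs2 = tmp;
    }
}

/* Index-based operand access, standing in for a consumer that
   assumes canonical operand order. */
static int get_operand(const toy_stmt *stmt, int opno) {
  assert(opno == 0 || opno == 1);
  return opno == 0 ? stmt->rhs1 : stmt->rhs2;
}

/* Simplified model of the two placements of the swap loop.
   fixed == false: old placement, swap only if the child is kept as a
   vector node.  fixed == true: patched placement, swap first, then
   possibly fall back to building the child from scalars. */
static void commit_child(toy_stmt *group, const bool *matches,
                         size_t group_size, bool child_from_scalars,
                         bool fixed) {
  if (fixed)
    apply_operand_swaps(group, matches, group_size);
  if (child_from_scalars)
    return; /* child thrown away and built from scalars instead */
  if (!fixed)
    apply_operand_swaps(group, matches, group_size);
}
```

With lane 0 = a0*b0 stored as {a0, b0} and lane 1 = b1*a1 stored as {b1, a1} (so matches = {true, false}), operand 0 of lane 1 becomes a1 under the patched placement, but stays b1 under the old placement whenever the child is built from scalars, so an index-based reader would pair a0 with b1.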

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68775

--- Comment #11 from Richard Biener ---
Note that I can't reproduce the bogus dump with current trunk.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68775

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #14 from Richard Biener ---
Fixed.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68775

--- Comment #13 from Richard Biener ---
Author: rguenth
Date: Mon Dec 14 13:42:03 2015
New Revision: 231617

URL: https://gcc.gnu.org/viewcvs?rev=231617=gcc=rev

Log:
2015-12-14  Richard Biener

        PR tree-optimization/68775
        * tree-vect-slp.c (vect_build_slp_tree): Make sure to apply
        a operand swapping even if replacing the op with scalars.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-vect-slp.c

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68775

--- Comment #8 from Richard Biener ---
Ok, it looks like a ppc64le cross happily (eh) accepts sources preprocessed on x86_64-linux and even the required built modules. So I have the dump files myself, and the -fopt-info-vec difference (for BB vectorization only) is empty. It looks like the only code difference is in the vectorization of the BB in the loop at shell2.fppized.f90:971, which is

    do k = 1, k_max
      k1 = k_x(k);k2 = k_y(k);k3 = k_z(k)
      dot1 = k1*P1+k2*P2+k3*P3
      dot2 = g4 * (k1*k1+k2*k2+k3*k3)
      res_ij(k) = res_ij(k) + therm(k) * (fac1 * exp(cmplx(dot2,dot1,kind=kind((1.0d0,1.0d0)))))
    end do

and which now has one fewer vector operand. If you can confirm this by "bisecting" the file with -fdbg-cnt=vect_slp:N, that would be nice. The vectorized code looks ok to me, so I suspect a target issue here. Note that we do both a vector load from the real part of a complex and a scalar load of the imaginary part, and then use that scalar to construct another vector:

  _1371 = REALPART_EXPR <[shell2.fppized.f90:975:0] [shell2.fppized.f90:975:0] MEM[(complex(kind=8)[0:] *)res.0_420][_960]>;
  vectp.6451_10558 = _EXPR <[shell2.fppized.f90:975:0] [shell2.fppized.f90:975:0] MEM[(complex(kind=8)[0:] *)res.0_420][_960]>;
  vect__1371.6452_10556 = MEM[(real(kind=8) *)vectp.6451_10558];
  _395 = IMAGPART_EXPR <[shell2.fppized.f90:975:0] [shell2.fppized.f90:975:0] MEM[(complex(kind=8)[0:] *)res.0_420][_960]>;
  [shell2.fppized.f90:975:0] _177 = _964 * _3980;
  [shell2.fppized.f90:975:0] vect_cst__10554 = {_177, _395};
  [shell2.fppized.f90:975:0] vect__455.6453_5427 = vect_cst__10554 + vect__1371.6452_10556;
  [shell2.fppized.f90:975:0] _389 = _395 + _3549;
  vectp.6455_5409 = _EXPR <[shell2.fppized.f90:975:0] [shell2.fppized.f90:975:0] MEM[(complex(kind=8)[0:] *)res.0_420][_960]>;
  [shell2.fppized.f90:975:0] MEM[(real(kind=8) *)vectp.6455_5409] = vect__455.6453_5427;

In .optimized the above looks like:

  vect__1371.6452_10556 = MEM[base: _9159, offset: 0B];
  _395 = MEM[base: _9159, offset: 8B];
  _9158 = (void *) ivtmp.7110_9170;
  [shell2.fppized.f90:975:0] _964 = MEM[base: _9158, offset: 0B];
  [shell2.fppized.f90:975:0] _486 = __builtin_exp (dot2_958);
  [shell2.fppized.f90:975:0] _508 = REALPART_EXPR ;
  _1815 = _486 * fac1$real_1370;
  [shell2.fppized.f90:975:0] _518 = IMAGPART_EXPR ;
  [shell2.fppized.f90:975:0] _178 = _508 * _1815;
  [shell2.fppized.f90:975:0] _201 = _518 * _1815;
  [shell2.fppized.f90:975:0] _967 = COMPLEX_EXPR <_178, _201>;
  [shell2.fppized.f90:975:0] _968 = ((_967));
  _3980 = REALPART_EXPR <_968>;
  [shell2.fppized.f90:975:0] _177 = _964 * _3980;
  [shell2.fppized.f90:975:0] vect_cst__10554 = {_177, _395};
  [shell2.fppized.f90:975:0] vect__455.6453_5427 = vect_cst__10554 + vect__1371.6452_10556;
  [shell2.fppized.f90:975:0] MEM[base: _9159, offset: 0B] = vect__455.6453_5427;

which might be enough to trigger later RTL optimizer confusion. I can only guess at something CSEing the scalar load with the vector load and getting the lane ordering (endianness) wrong. Maybe you can extract a small testcase from the above info that reproduces the difference.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68775

--- Comment #7 from Richard Biener ---
Created attachment 36993
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36993=edit
candidate patch

Can you check whether the fix for PR68852 (attached) fixes this? If not, can you attach the -fdump-tree-slp-details dumps for that file (two dump files), from before and after the revision?

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68775

--- Comment #10 from William Seurer ---
It fails with -fdbg-cnt=vect_slp:31 and succeeds with -fdbg-cnt=vect_slp:30.
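
A result like this comes from a plain binary search over the debug-counter limit N. A minimal sketch, assuming the outcome is monotone in N (every limit below some threshold passes, every limit at or above it fails). The callback type and both function names are hypothetical; in practice the callback would rebuild shell2.fppized.f90 with -fdbg-cnt=vect_slp:N and rerun the benchmark.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if the program built with limit n passes (hypothetical). */
typedef bool (*passes_fn)(int n, void *ctx);

/* Smallest n in (lo, hi] for which passes(n) is false, given that
   passes(lo) is true, passes(hi) is false and the outcome is monotone. */
static int first_failing_count(int lo, int hi, passes_fn passes, void *ctx) {
  assert(lo < hi);
  while (hi - lo > 1) {
    int mid = lo + (hi - lo) / 2;
    if (passes(mid, ctx))
      lo = mid; /* still good: the culprit is above mid */
    else
      hi = mid; /* already bad: the culprit is at or below mid */
  }
  return hi;
}

/* Stand-in predicate for trying the search without a real build:
   the "program" passes while n is below the threshold in ctx. */
static bool passes_below_threshold(int n, void *ctx) {
  return n < *(const int *)ctx;
}
```

With a threshold of 31 the search returns 31, matching the reading that the 31st SLP vectorization is the one that introduces the failure.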

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68775

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |6.0
            Summary|spec2006 test case          |[6 Regression] spec2006
                   |465.tonto fails with the    |test case 465.tonto fails
                   |gcc 6.0 fortran compiler    |with the gcc 6.0 fortran
                   |                            |compiler

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68775

Richard Biener changed:

           What    |Removed                       |Added
------------------------------------------------------------------------------
             Status|UNCONFIRMED                   |ASSIGNED
   Last reconfirmed|                              |2015-12-11
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
     Ever confirmed|0                             |1

--- Comment #9 from Richard Biener ---
So before the patch:

  _9157 = (void *) ivtmp.7110_9173;
  vect__1371.6453_5409 = MEM[base: _9157, offset: 0B];
  _9156 = (void *) ivtmp.7111_9168;
  [shell2.fppized.f90:975:0] _964 = MEM[base: _9156, offset: 0B];
  ...
  _3980 = REALPART_EXPR <_968>;
  _390 = IMAGPART_EXPR <_968>;
  [shell2.fppized.f90:975:0] vect_cst__10558 = {_964, _964};
  [shell2.fppized.f90:975:0] vect_cst__10556 = {_3980, _390};
  [shell2.fppized.f90:975:0] vect__455.6454_10315 = vect_cst__10556 * vect_cst__10558 + vect__1371.6453_5409;
  [shell2.fppized.f90:975:0] MEM[base: _9157, offset: 0B] = vect__455.6454_10315;

after it:

  _9159 = (void *) ivtmp.7109_9175;
  vect__1371.6452_10556 = MEM[base: _9159, offset: 0B];
  _395 = MEM[base: _9159, offset: 8B];
  _9158 = (void *) ivtmp.7110_9170;
  [shell2.fppized.f90:975:0] _964 = MEM[base: _9158, offset: 0B];
  ...
  _3980 = REALPART_EXPR <_968>;
  [shell2.fppized.f90:975:0] _177 = _964 * _3980;
  [shell2.fppized.f90:975:0] vect_cst__10554 = {_177, _395};
  [shell2.fppized.f90:975:0] vect__455.6453_5427 = vect_cst__10554 + vect__1371.6452_10556;
  [shell2.fppized.f90:975:0] MEM[base: _9159, offset: 0B] = vect__455.6453_5427;

Somehow the IMAGPART <_968> * _964 product got "lost" and was replaced by _395, the scalar load of the imaginary part of the res element. So it is a bug in the vectorizer after all. More next week ...
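
The numeric effect of the lost product can be worked out by hand. Before the change, the lanes compute {_3980, _390} * {_964, _964} + {re, im}; after it, lane 1 adds _395, which is the imaginary part just loaded from res, so the imaginary part is doubled instead of updated. A small C model of both lane computations, assuming _964 is the real scalar therm(k) and _968 the complex factor it multiplies; v2df, lane, update_correct and update_buggy are hypothetical names.

```c
#include <assert.h>

/* Two-lane vector of doubles: lane 0 = real part, lane 1 = imaginary part. */
typedef struct {
  double lane[2];
} v2df;

static double lane(v2df v, int i) {
  assert(i == 0 || i == 1);
  return v.lane[i];
}

/* Dump before the change: vect_cst__10556 * vect_cst__10558 + res,
   i.e. {t_re, t_im} * {therm, therm} + {res_re, res_im}. */
static v2df update_correct(v2df res, double therm, double t_re, double t_im) {
  v2df out = {{res.lane[0] + t_re * therm, res.lane[1] + t_im * therm}};
  return out;
}

/* Dump after the change: vect_cst__10554 = {_177, _395} + res, with
   _177 = therm * t_re and _395 = res_im (reloaded from memory). */
static v2df update_buggy(v2df res, double therm, double t_re, double t_im) {
  (void)t_im; /* the IMAGPART <_968> * _964 term is lost */
  v2df cst = {{therm * t_re, res.lane[1]}};
  v2df out = {{res.lane[0] + cst.lane[0], res.lane[1] + cst.lane[1]}};
  return out;
}
```

For res = 1+2i, therm = 3 and a factor of 4+5i, the correct update gives 13+17i, while the miscompiled one gives 13+4i: the real lane still agrees, which fits a failure that only shows up in the final results.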