[Bug tree-optimization/68775] [6 Regression] spec2006 test case 465.tonto fails with the gcc 6.0 fortran compiler

2015-12-15 Thread seurer at linux dot vnet.ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68775

Bill Seurer  changed:

   What|Removed |Added

 Status|RESOLVED|CLOSED

--- Comment #15 from Bill Seurer  ---
This works fine now.  Thanks!

[Bug tree-optimization/68775] [6 Regression] spec2006 test case 465.tonto fails with the gcc 6.0 fortran compiler

2015-12-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68775

--- Comment #12 from Richard Biener  ---
Ok, it became latent only.  Testing the following fix:

Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c (revision 230855)
+++ gcc/tree-vect-slp.c (working copy)
@@ -1078,6 +1078,20 @@ vect_build_slp_tree (vec_info *vinfo,
   tem, npermutes, &this_tree_size,
   max_tree_size))
{
+ /* ... so if successful we can apply the operand swapping
+to the GIMPLE IL.  This is necessary because for example
+vect_get_slp_defs uses operand indexes and thus expects
+canonical operand order.  This is also necessary even
+if we end up building the operand from scalars as
+we'll continue to process swapped operand two.  */
+ for (j = 0; j < group_size; ++j)
+   if (!matches[j])
+ {
+   gimple *stmt = SLP_TREE_SCALAR_STMTS (*node)[j];
+   swap_ssa_operands (stmt, gimple_assign_rhs1_ptr (stmt),
+  gimple_assign_rhs2_ptr (stmt));
+ }
+
  /* If we have all children of child built up from scalars then
 just throw that away and build it up this node from scalars.  */
  if (!SLP_TREE_CHILDREN (child).is_empty ())
@@ -1107,17 +1121,6 @@ vect_build_slp_tree (vec_info *vinfo,
}
}

- /* ... so if successful we can apply the operand swapping
-to the GIMPLE IL.  This is necessary because for example
-vect_get_slp_defs uses operand indexes and thus expects
-canonical operand order.  */
- for (j = 0; j < group_size; ++j)
-   if (!matches[j])
- {
-   gimple *stmt = SLP_TREE_SCALAR_STMTS (*node)[j];
-   swap_ssa_operands (stmt, gimple_assign_rhs1_ptr (stmt),
-  gimple_assign_rhs2_ptr (stmt));
- }
  oprnd_info->def_stmts = vNULL;
  SLP_TREE_CHILDREN (*node).quick_push (child);
  continue;

[Bug tree-optimization/68775] [6 Regression] spec2006 test case 465.tonto fails with the gcc 6.0 fortran compiler

2015-12-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68775

--- Comment #11 from Richard Biener  ---
Note that I can't reproduce the bogus dump with current trunk.

[Bug tree-optimization/68775] [6 Regression] spec2006 test case 465.tonto fails with the gcc 6.0 fortran compiler

2015-12-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68775

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #14 from Richard Biener  ---
Fixed.

[Bug tree-optimization/68775] [6 Regression] spec2006 test case 465.tonto fails with the gcc 6.0 fortran compiler

2015-12-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68775

--- Comment #13 from Richard Biener  ---
Author: rguenth
Date: Mon Dec 14 13:42:03 2015
New Revision: 231617

URL: https://gcc.gnu.org/viewcvs?rev=231617&root=gcc&view=rev
Log:
2015-12-14  Richard Biener  

PR tree-optimization/68775
* tree-vect-slp.c (vect_build_slp_tree): Make sure to apply
a operand swapping even if replacing the op with scalars.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-vect-slp.c

[Bug tree-optimization/68775] [6 Regression] spec2006 test case 465.tonto fails with the gcc 6.0 fortran compiler

2015-12-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68775

--- Comment #8 from Richard Biener  ---
Ok, it looks like a ppc64le cross happily (eh) accepts sources preprocessed on
x86_64-linux and even required built modules.

So I have the dump files myself and the -fopt-info-vec difference (for BB
vectorization only) is empty.  It looks like the only code difference is
for the vectorization of the BB in loop

shell2.fppized.f90:971

that is

  do k = 1, k_max
    k1 = k_x(k); k2 = k_y(k); k3 = k_z(k)
    dot1 = k1*P1+k2*P2+k3*P3
    dot2 = g4 * (k1*k1+k2*k2+k3*k3)
    res_ij(k) = res_ij(k) + therm(k) * (fac1 * &
                exp(cmplx(dot2,dot1,kind=kind((1.0d0,1.0d0)))))
  end do

which has now one less vector operand.

If you can confirm this by "bisecting" the file with -fdbg-cnt=vect_slp:N
that would be nice.

The vectorized code looks ok to me so I suspect a target issue here.

Note that we do both a vector load from the realpart of a complex and
a scalar load of the imaginary part and then use that to construct
another vector:

  _1371 = REALPART_EXPR <[shell2.fppized.f90:975:0] [shell2.fppized.f90:975:0]
MEM[(complex(kind=8)[0:] *)res.0_420][_960]>;
  vectp.6451_10558 = &REALPART_EXPR <[shell2.fppized.f90:975:0]
[shell2.fppized.f90:975:0] MEM[(complex(kind=8)[0:] *)res.0_420][_960]>;
  vect__1371.6452_10556 = MEM[(real(kind=8) *)vectp.6451_10558];
  _395 = IMAGPART_EXPR <[shell2.fppized.f90:975:0] [shell2.fppized.f90:975:0]
MEM[(complex(kind=8)[0:] *)res.0_420][_960]>;

  [shell2.fppized.f90:975:0] _177 = _964 * _3980;
  [shell2.fppized.f90:975:0] vect_cst__10554 = {_177, _395};
  [shell2.fppized.f90:975:0] vect__455.6453_5427 = vect_cst__10554 +
vect__1371.6452_10556;
  [shell2.fppized.f90:975:0] _389 = _395 + _3549;
  vectp.6455_5409 = &REALPART_EXPR <[shell2.fppized.f90:975:0]
[shell2.fppized.f90:975:0] MEM[(complex(kind=8)[0:] *)res.0_420][_960]>;
  [shell2.fppized.f90:975:0] MEM[(real(kind=8) *)vectp.6455_5409] =
vect__455.6453_5427;

in .optimized the above looks like

  vect__1371.6452_10556 = MEM[base: _9159, offset: 0B];
  _395 = MEM[base: _9159, offset: 8B];
  _9158 = (void *) ivtmp.7110_9170;
  [shell2.fppized.f90:975:0] _964 = MEM[base: _9158, offset: 0B];
  [shell2.fppized.f90:975:0] _486 = __builtin_exp (dot2_958);
  [shell2.fppized.f90:975:0] _508 = REALPART_EXPR ;
  _1815 = _486 * fac1$real_1370;
  [shell2.fppized.f90:975:0] _518 = IMAGPART_EXPR ;
  [shell2.fppized.f90:975:0] _178 = _508 * _1815;
  [shell2.fppized.f90:975:0] _201 = _518 * _1815;
  [shell2.fppized.f90:975:0] _967 = COMPLEX_EXPR <_178, _201>;
  [shell2.fppized.f90:975:0] _968 = ((_967));
  _3980 = REALPART_EXPR <_968>;
  [shell2.fppized.f90:975:0] _177 = _964 * _3980;
  [shell2.fppized.f90:975:0] vect_cst__10554 = {_177, _395};
  [shell2.fppized.f90:975:0] vect__455.6453_5427 = vect_cst__10554 +
vect__1371.6452_10556;
  [shell2.fppized.f90:975:0] MEM[base: _9159, offset: 0B] =
vect__455.6453_5427;

which might be enough to trigger later RTL opt confusion.  I can only guess
at something CSEing the scalar load with the vector load and getting the
lane ordering (endianness) wrong.

Maybe you can extract a small testcase from the above info that reproduces
the difference.

[Bug tree-optimization/68775] [6 Regression] spec2006 test case 465.tonto fails with the gcc 6.0 fortran compiler

2015-12-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68775

--- Comment #7 from Richard Biener  ---
Created attachment 36993
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36993&action=edit
candidate patch

Can you try if the fix for PR68852 (attached) fixes this?

If not, can you attach -fdump-tree-slp-details (two dump files) before and
after the rev. for that file?

[Bug tree-optimization/68775] [6 Regression] spec2006 test case 465.tonto fails with the gcc 6.0 fortran compiler

2015-12-11 Thread seurer at linux dot vnet.ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68775

--- Comment #10 from William Seurer  ---
It fails with -fdbg-cnt=vect_slp:31 and succeeds with -fdbg-cnt=vect_slp:30

[Bug tree-optimization/68775] [6 Regression] spec2006 test case 465.tonto fails with the gcc 6.0 fortran compiler

2015-12-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68775

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |6.0
Summary|spec2006 test case  |[6 Regression] spec2006
   |465.tonto fails with the|test case 465.tonto fails
   |gcc 6.0 fortran compiler|with the gcc 6.0 fortran
   ||compiler

[Bug tree-optimization/68775] [6 Regression] spec2006 test case 465.tonto fails with the gcc 6.0 fortran compiler

2015-12-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68775

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2015-12-11
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #9 from Richard Biener  ---
So before the patch:

  _9157 = (void *) ivtmp.7110_9173;
  vect__1371.6453_5409 = MEM[base: _9157, offset: 0B];
  _9156 = (void *) ivtmp.7111_9168;
  [shell2.fppized.f90:975:0] _964 = MEM[base: _9156, offset: 0B];
...
  _3980 = REALPART_EXPR <_968>;
  _390 = IMAGPART_EXPR <_968>;
  [shell2.fppized.f90:975:0] vect_cst__10558 = {_964, _964};
  [shell2.fppized.f90:975:0] vect_cst__10556 = {_3980, _390};
  [shell2.fppized.f90:975:0] vect__455.6454_10315 = vect_cst__10556 *
vect_cst__10558 + vect__1371.6453_5409;
  [shell2.fppized.f90:975:0] MEM[base: _9157, offset: 0B] =
vect__455.6454_10315;

after it:

  _9159 = (void *) ivtmp.7109_9175;
  vect__1371.6452_10556 = MEM[base: _9159, offset: 0B];
  _395 = MEM[base: _9159, offset: 8B];
  _9158 = (void *) ivtmp.7110_9170;
  [shell2.fppized.f90:975:0] _964 = MEM[base: _9158, offset: 0B];
...
  _3980 = REALPART_EXPR <_968>;
  [shell2.fppized.f90:975:0] _177 = _964 * _3980;
  [shell2.fppized.f90:975:0] vect_cst__10554 = {_177, _395};
  [shell2.fppized.f90:975:0] vect__455.6453_5427 = vect_cst__10554 +
vect__1371.6452_10556;
  [shell2.fppized.f90:975:0] MEM[base: _9159, offset: 0B] =
vect__455.6453_5427;

Somehow the IMAGPART <_968> * _964 got "lost" and was replaced by _395, the
scalar load from offset 8B.

So a bug in the vectorizer after all.  More next week ...