[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2021-05-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142
Bug 66142 depends on bug 66379, which changed state.

Bug 66379 Summary: SCCVN doesn't handle aggregate array element accesses very 
well
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66379

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2016-03-11 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #27 from Yuri Rumyantsev  ---
Created attachment 37940
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37940=edit
test-case to reproduce

Need to be compiled with -Ofast -mavx2 -fopenmp options.

[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2016-03-11 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #26 from Yuri Rumyantsev  ---
If we convert copy structures to copy structure fields test will be vectorized
and all mentions of GOMP_SIMD_LANE will be deleted. But if we slightly modify
test by introducing new function vdot and insert its call:
   b = r.x * ray->dir.x + r.y * ray->dir.y;
 |
 v
   b = vdot (r, ray->dir);
test won't be vectorized:
test2.cpp:70:9: note: not vectorized: not suitable for scatter store
D.6062[_9].org.x = 1.0e+0;

test2.cpp is attached.

[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-12-08 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #23 from Yuri Rumyantsev  ---
Richard,

Do we have any chance to vectorize attached test-case using GCC6 compiler?

[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-12-08 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #24 from Yuri Rumyantsev  ---
Richard,

Do we have any chance to vectorize attached test-case using GCC6 compiler?

[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-12-08 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #25 from rguenther at suse dot de  ---
On Tue, 8 Dec 2015, ysrumyan at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142
> 
> --- Comment #23 from Yuri Rumyantsev  ---
> Richard,
> 
> Do we have any chance to vectorize attached test-case using GCC6 compiler?

No, I don't see any.

[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-09-18 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #22 from Richard Biener  ---
Author: rguenth
Date: Fri Sep 18 07:57:00 2015
New Revision: 227896

URL: https://gcc.gnu.org/viewcvs?rev=227896=gcc=rev
Log:
2015-09-18  Richard Biener  

PR tree-optimization/66142
* fold-const.c (operand_equal_p): When OEP_ADDRESS_OF
treat MEM[] and x the same.
* tree-ssa-sccvn.h (vn_reference_fold_indirect): Remove.
* tree-ssa-sccvn.c (vn_reference_fold_indirect): Return true
when we simplified sth.
(vn_reference_maybe_forwprop_address): Likewise.
(valueize_refs_1): When we simplified through
vn_reference_fold_indirect or vn_reference_maybe_forwprop_address
set valueized_anything to true.
(vn_reference_lookup_3): Use stmt_kills_ref_p to see whether
one ref kills the other instead of just a offset-based test.
* tree-ssa-alias.c (stmt_kills_ref_p): Use OEP_ADDRESS_OF
for the operand_equal_p test to compare bases and also compare
sizes.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/fold-const.c
trunk/gcc/tree-ssa-alias.c
trunk/gcc/tree-ssa-sccvn.c
trunk/gcc/tree-ssa-sccvn.h


[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-09-17 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #21 from Richard Biener  ---
Ok, so if we fix SCCVN further we hit its (designed) limitation with stmt
walking not following backedges (but the use is in a loop and the def outside).
We avoid doing the work to determine if sth is loop invariant here.
This was trying to improve things with SRA disabled - I'm going to still push
those changes if testing works out fine.

Now with SRA enabled we have a better situation in this regard but face

  _9 = [_12].org;
  MEM[(struct vec_ *)_9] = 1.0e+0;
  MEM[(struct vec_ *)_9 + 4B] = _8;
  D.4309[_12].dir.x = x_13;
  D.4309[_12].dir.y = y_14;
  D.4308[_12].t = 1.0e+10;
  D.4308[_12].h = 0;
  _20 = [_12];
  _26 = MEM[(const struct Ray *)][_12].org.x;

which as said isn't friendly enough for the simplistic address forward
propagation we have.  The issue here is that we can't disambiguate
against

  MEM[(struct vec_ *)_9 + 4B] = _8;

when looking up MEM[(const struct Ray *)][_12].org.x; because
vn_reference_maybe_forwprop_address doesn't forward into MEM[(struct vec_ *)_9
+ 4B] as its offset is not zero:

  /* If that didn't work because the address isn't invariant propagate
 the reference tree from the address operation in case the current
 dereference isn't offsetted.  */
  if (!addr_base
  && *i_p == ops->length () - 1
  && off == 0
  /* This makes us disable this transform for PRE where the
 reference ops might be also used for code insertion which
 is invalid.  */
  && default_vn_walk_kind == VN_WALKREWRITE)
{
  auto_vec tem;
  copy_reference_ops_from_ref (TREE_OPERAND (addr, 0), );
  ops->pop ();
  ops->pop ();
  ops->safe_splice (tem);
  --*i_p;
  return true;

it also likely wouldn't help vn_reference_lookup_3 to do the disambiguation
as that performs just

  lhs_ops = valueize_refs_1 (lhs_ops, _anything);
  if (valueized_anything)
{
  lhs_ref_ok = ao_ref_init_from_vn_reference (_ref,
  get_alias_set (lhs),
  TREE_TYPE (lhs),
lhs_ops);
  if (lhs_ref_ok
  && !refs_may_alias_p_1 (ref, _ref, true))
return NULL;

but there is still the variable offset to cope with.  We'd need to perform
some disambiguation on lhs_ref vs. vr.

But it's not clear how to represent [_12].org forwarded into
MEM[(struct vec_ *)_9 + 4B].  It's kind-of a COMPONENT_REF with just
a offset added.

Let's collect the improvements I have sofar in my local tree and see if they
otherwise work.


[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-09-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #20 from Richard Biener  ---
Yes, I plan to come back to this during stage3.


[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-09-07 Thread izamyatin at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

Igor Zamyatin  changed:

   What|Removed |Added

 CC||enkovich.gnu at gmail dot com

--- Comment #19 from Igor Zamyatin  ---
Richard, would you be able to look at this in some time?


[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-06-03 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142
Bug 66142 depends on bug 63916, which changed state.

Bug 63916 Summary: value-numbering fails to forward variable addresses
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63916

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED


[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-06-02 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

Richard Biener rguenth at gcc dot gnu.org changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|FIXED   |---

--- Comment #18 from Richard Biener rguenth at gcc dot gnu.org ---
Let's re-open this for the original testcase.


[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-06-02 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #17 from Richard Biener rguenth at gcc dot gnu.org ---
Ok, so even with PR63916 rudimentary fixed we hit the issue that in

  _9 = D.3665[_11].org;
  MEM[(struct vec_ *)_9] = 1.0e+0;
  MEM[(struct vec_ *)_9 + 4B] = _8;
...
  _24 = MEM[(const struct Ray *)D.3665][_11].org.x;

the store to MEM[(struct vec_ *)_9 + 4B] aliases MEM[(const struct Ray
*)D.3665][_11].org.x.

And the rudimentary fix for PR63916 doesn't allow for the forwarding to succeed
into MEM[(struct vec_ *)_9 + 4B] either.

Without SRA we get

  D.3665[_11].org = p;
  D.3665[_11].dir.x = x_12;
  D.3665[_11].dir.y = y_13;
  D.3664[_11].t = 1.0e+10;
  D.3664[_11].h = 0;
  _25 = MEM[(const struct Ray *)D.3665][_11].org.x;

and the look-through-aggregate-copy code bails out at

  /* See if the assignment kills REF.  */
  base2 = ao_ref_base (lhs_ref);
  offset2 = lhs_ref.offset;
  size2 = lhs_ref.size;
  maxsize2 = lhs_ref.max_size;
  if (maxsize2 == -1
  || (base != base2
   (TREE_CODE (base) != MEM_REF
  || TREE_CODE (base2) != MEM_REF
  || TREE_OPERAND (base, 0) != TREE_OPERAND (base2, 0)
  || !tree_int_cst_equal (TREE_OPERAND (base, 1),
  TREE_OPERAND (base2, 1
  || offset2  offset
  || offset2 + size2  offset + maxsize)
return (void *)-1;

stmt_kills_ref_p fails to compare
MEM[(const struct Ray *)D.3665][_15].org and D.3665[_15].org
equal via operand_equal_p.


[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-05-29 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #16 from Jakub Jelinek jakub at gcc dot gnu.org ---
Author: jakub
Date: Fri May 29 13:06:23 2015
New Revision: 223863

URL: https://gcc.gnu.org/viewcvs?rev=223863root=gccview=rev
Log:
PR tree-optimization/66142
* tree-if-conv.c (if_convertible_phi_p): Don't give up on
virtual phis that feed themselves.

* gcc.dg/vect/pr66142.c: New test.

Added:
trunk/gcc/testsuite/gcc.dg/vect/pr66142.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-if-conv.c


[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-05-28 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #15 from Richard Biener rguenth at gcc dot gnu.org ---
Author: rguenth
Date: Thu May 28 13:24:53 2015
New Revision: 223816

URL: https://gcc.gnu.org/viewcvs?rev=223816root=gccview=rev
Log:
2015-05-28  Richard Biener  rguent...@suse.de

PR tree-optimization/66142
* tree-ssa-sccvn.c (vn_reference_lookup_3): Handle non-GIMPLE
values better in memcpy destination handling.  Handle non-aliasing
we discover here.

* gcc.dg/tree-ssa/ssa-fre-44.c: Fixup.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c
trunk/gcc/tree-ssa-sccvn.c


[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-05-27 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #14 from Richard Biener rguenth at gcc dot gnu.org ---
As said in comment #9, this is PR63916 I think.  We have

  _9 = D.3665[_11].org;
  MEM[(struct vec_ *)_9] = 1.0e+0;
  MEM[(struct vec_ *)_9 + 4B] = _8;
...
  _24 = MEM[(const struct Ray *)D.3665][_11].org.x;
  _27 = MEM[(const struct Ray *)D.3665][_11].org.y;

which ultimately is from inlining of

  D.3665[D.3663].org = p;
  D.3665[D.3663].dir.x = x;
  D.3665[D.3663].dir.y = y;
  D.3664[D.3663].t = 1.0e+10;
  D.3664[D.3663].h = 0;
  D.3667 = D.3665[D.3663];
  D.3668 = D.3664[D.3663];
  bar (D.3668, D.3667, spheres[0]);

to

  p.x = 1.0e+0;  
  p.y = _9;
...
  D.3665[_15].org = p;
  D.3665[_15].dir.x = x_16;
  D.3665[_15].dir.y = y_17;
  D.3664[_15].t = 1.0e+10;
  D.3664[_15].h = 0;
  _23 = D.3665[_15];
  _24 = D.3664[_15];
  _32 = MEM[(const struct Ray *)_23].org.x;

forwprop cannot forward D.3665[_15] into the dereference (it would create
a bogus access paths even if types were matching and TBAA would be fine).
And SCCVN isn't able to canonicalize the accesses (because it doesn't
look at stmt defs when building the canonical address of an access).
Note that PRE uses the canonical form for expression insertion so we have
to be careful what we put in the canonical form (or separate the address
compute from the actual dereference).

Yes, lowering the access more in the SCCVN machinery would be possible
(say, use the affine combination machinery with the powers of
affine-comb-expand (and making that use the VN lattice...)), but the
issue with this is always PRE insertion and us _not_ wanting to have
all PRE inserted loads be fully lowered to address-compute plus indirection.

I hope to get to PR63916 this stage1.


[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-05-26 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

Richard Biener rguenth at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2015-05-26
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot 
gnu.org
 Ever confirmed|0   |1

--- Comment #10 from Richard Biener rguenth at gcc dot gnu.org ---
Testing

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 223574)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -1894,7 +1894,12 @@ vn_reference_lookup_3 (ao_ref *ref, tree
   size2 = lhs_ref.size;
   maxsize2 = lhs_ref.max_size;
   if (maxsize2 == -1
- || (base != base2  !operand_equal_p (base, base2, 0))
+ || (base != base2
+  (TREE_CODE (base) != MEM_REF
+ || TREE_CODE (base2) != MEM_REF
+ || TREE_OPERAND (base, 0) != TREE_OPERAND (base2, 0)
+ || !tree_int_cst_equal (TREE_OPERAND (base, 1),
+ TREE_OPERAND (base2, 1
  || offset2  offset
  || offset2 + size2  offset + maxsize)
return (void *)-1;

which fixes the comment#7 testcase.


[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-05-26 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #9 from rguenther at suse dot de rguenther at suse dot de ---
On Mon, 25 May 2015, jakub at gcc dot gnu.org wrote:

 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142
 
 Jakub Jelinek jakub at gcc dot gnu.org changed:
 
What|Removed |Added
 
  CC||rguenth at gcc dot gnu.org
 
 --- Comment #8 from Jakub Jelinek jakub at gcc dot gnu.org --- I guess 
 it depends on what exactly SCCVN uses the operand vectors 
 created/optimized by copy_reference_ops_from_ref/valueize_refs_1 for. If 
 it is used only for tracking what is the value of the reference (the 
 value stored into that piece of memory), or also for canonicalization of 
 references to certain more canonical form, or both.  If only the value 

It's only for tracking the value (and to get a canonical form for
its internal hash-tables and compare function).  I think this bug
may be related to PR63916.

 of the reference, then supposedly it would be nice to get far more 
 canonicalization in the references (e.g. turn (perhaps multiple nested) 
 COMPONENT_REFs into MEM_REFs with offset (except perhaps for bitfields, 
 perhaps even do something about ARRAY_REFs, so that say even: struct A { 
 float x, y; }; struct B { struct A u; }; void bar (struct A *);
 
 float
 f2 (struct B *x, int y)
 {
   struct A p;
   p.x = 1.0f;
   p.y = 2.0f;
   char *z = (char *) x;
   z += y * sizeof (struct B);
   z += __builtin_offsetof (struct B, u.y);
   x[y].u = p;
   float f = *(float *) z;
   bar (p);
   return f;
 }
 is handled).  But, supposedly it would be undesirable to canonicalize all the
 COMPONENT_REFs to the MEM_REFs (at least early on, e.g. for
 __builtin_object_size (, 1) purposes, or aliasing etc.).
 Richi, any thoughts on this?

On the f1 testcase it probably fails to look through the aggregate
copy in vn_reference_lookup_3.  I'll have a short look later.


[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-05-26 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #13 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Original test-case is not vectorized yet with Richard patch for sccvn.


[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-05-26 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

Richard Biener rguenth at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #12 from Richard Biener rguenth at gcc dot gnu.org ---
Fixed(?).


[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-05-26 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #11 from Richard Biener rguenth at gcc dot gnu.org ---
Author: rguenth
Date: Tue May 26 13:55:40 2015
New Revision: 223697

URL: https://gcc.gnu.org/viewcvs?rev=223697root=gccview=rev
Log:
2015-05-26  Richard Biener  rguent...@suse.de

PR tree-optimization/66142
* tree-ssa-sccvn.c (vn_reference_lookup_3): Manually compare
MEM_REFs for the same base address.

* gcc.dg/tree-ssa/ssa-fre-44.c: New testcase.

Added:
trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-ssa-sccvn.c


[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-05-25 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #7 from Jakub Jelinek jakub at gcc dot gnu.org ---
Slightly improved variant of #c6 testcase:
struct A { float x, y; };
struct B { struct A u; };
void bar (struct A *);

float
f1 (struct B *x, int y)
{
  struct A p;
  p.x = 1.0f;
  p.y = 2.0f;
  struct A *q = x[y].u;
  *q = p;
  float f = x[y].u.x + x[y].u.y;
  bar (p);
  return f;
}

float
f2 (struct B *x, int y)
{
  struct A p;
  p.x = 1.0f;
  p.y = 2.0f;
  x[y].u = p;
  float f = x[y].u.x + x[y].u.y;
  bar (p);
  return f;
}

float
f3 (struct B *x, int y)
{
  struct A p;
  p.x = 1.0f;
  p.y = 2.0f;
  struct A *q = x[y].u;
  __builtin_memcpy (q-x, p.x, sizeof (float));
  __builtin_memcpy (q-y, p.y, sizeof (float));
  *q = p;
  float f = x[y].u.x + x[y].u.y;
  bar (p);
  return f;
}

float
f4 (struct B *x, int y)
{
  struct A p;
  p.x = 1.0f;
  p.y = 2.0f;
  __builtin_memcpy (x[y].u.x, p.x, sizeof (float));
  __builtin_memcpy (x[y].u.y, p.y, sizeof (float));
  float f = x[y].u.x + x[y].u.y;
  bar (p);
  return f;
}


[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-05-25 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org

--- Comment #8 from Jakub Jelinek jakub at gcc dot gnu.org ---
I guess it depends on what exactly SCCVN uses the operand vectors
created/optimized by copy_reference_ops_from_ref/valueize_refs_1 for.
If it is used only for tracking what is the value of the reference (the value
stored into that piece of memory), or also for canonicalization of references
to certain more canonical form, or both.  If only the value of the reference,
then
supposedly it would be nice to get far more canonicalization in the references
(e.g. turn (perhaps multiple nested) COMPONENT_REFs into MEM_REFs with offset
(except perhaps for bitfields, perhaps even do something about ARRAY_REFs, so
that say even:
struct A { float x, y; };
struct B { struct A u; };
void bar (struct A *);

float
f2 (struct B *x, int y)
{
  struct A p;
  p.x = 1.0f;
  p.y = 2.0f;
  char *z = (char *) x;
  z += y * sizeof (struct B);
  z += __builtin_offsetof (struct B, u.y);
  x[y].u = p;
  float f = *(float *) z;
  bar (p);
  return f;
}
is handled).  But, supposedly it would be undesirable to canonicalize all the
COMPONENT_REFs to the MEM_REFs (at least early on, e.g. for
__builtin_object_size (, 1) purposes, or aliasing etc.).
Richi, any thoughts on this?


[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-05-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #3 from Jakub Jelinek jakub at gcc dot gnu.org ---
Ah, but fre2 is able to optimize those:
  D.2881[_12].dir.x = x_13;
...
  _32 = MEM[(const struct Ray *)D.2881][_12].dir.x;
cases to
  _32 = x_13;
The problem in FRE2 is only with the MEM_REFs.
If in the testcase
-  ray.org = p;
+  ray.org.x = p.x;
+  ray.org.y = p.y;
then FRE2 handles it completely and the problem is the same between -fopenmp
and -fno-openmp - ifcvt not ifconverting the loop.


[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-05-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #4 from Jakub Jelinek jakub at gcc dot gnu.org ---
Ah, and the reason for the ifcvt failure is that we have an unoptimized
virtual PHI in the loop that ifcvt gives up on:
  # .MEM_55 = PHI .MEM_55(7), .MEM_5(D)(2)
Only phicprop2 pass is able to optimize that away (and replace all uses
of .MEM_55 with .MEM_5(D)).


[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-05-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #5 from Jakub Jelinek jakub at gcc dot gnu.org ---
Created attachment 35596
  -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=35596action=edit
gcc6-pr66142.patch

Fix for the ifcvt issue.  With this the modified testcase is vectorized.


[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-05-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek jakub at gcc dot gnu.org ---
On this testcase, I believe it is most important to change FRE and SRA.
Without -fopenmp, we have before fre2:
  MEM[(struct ray_ *)ray] = 1.0e+0;
  MEM[(struct ray_ *)ray + 4B] = _8;
  ray.dir.x = x_10;
  ray.dir.y = y_11;
...
  _23 = ray.org.x;
  _26 = ray.org.y;
  _29 = ray.dir.x;
  _31 = ray.dir.y;
and fre2 optimizes this into the values stored in there.  With -fopenmp, we
have the SIMD_LANE based array references:
  _12 = GOMP_SIMD_LANE (simduid.0_11(D));
  _9 = D.2881[_12].org;
  MEM[(struct vec_ *)_9] = 1.0e+0;
  MEM[(struct vec_ *)_9 + 4B] = _8;
  D.2881[_12].dir.x = x_13;
  D.2881[_12].dir.y = y_14;
...
  _26 = MEM[(const struct Ray *)D.2881][_12].org.x;
  _29 = MEM[(const struct Ray *)D.2881][_12].org.y;
  _32 = MEM[(const struct Ray *)D.2881][_12].dir.x;
  _34 = MEM[(const struct Ray *)D.2881][_12].dir.y;
that fre2 isn't able to optimize.  Doing something here on the vectorizer side
is too late, we really need to scalarize those or remove completely (through
fre2).


[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-05-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #6 from Jakub Jelinek jakub at gcc dot gnu.org ---
As for the FRE (actually SCCVN) issue, seems this affects more than
GOMP_SIMD_LANE, seems there is insufficient canonicalization of the references
performed by SCCVN, so the same references, but in slightly different form,
aren't considered the same.
Please see e.g.
struct A { float x, y; };
struct B { struct A u, v; };
void bar (struct A *);

float
f1 (struct B *x, int y)
{
  struct A p;
  p.x = 1.0f;
  p.y = 2.0f;
  struct A *q = x[y].u;
  *q = p;
  x[y].v.x = 3.0f;
  x[y].v.y = 4.0f;
  float f = x[y].u.x + x[y].u.y + x[y].v.x + x[y].v.y;
  bar (p);
  return f;
}

float
f2 (struct B *x, int y)
{
  struct A p;
  p.x = 1.0f;
  p.y = 2.0f;
  x[y].u = p;
  x[y].v.x = 3.0f;
  x[y].v.y = 4.0f;
  float f = x[y].u.x + x[y].u.y + x[y].v.x + x[y].v.y;
  bar (p);
  return f;
}

float
f3 (struct B *x, int y)
{
  struct A p;
  p.x = 1.0f;
  p.y = 2.0f;
  struct A *q = x[y].u;
  __builtin_memcpy (q, p.x, sizeof (float));
  __builtin_memcpy (q-y, p.y, sizeof (float));
  *q = p;
  x[y].v.x = 3.0f;
  x[y].v.y = 4.0f;
  float f = x[y].u.x + x[y].u.y + x[y].v.x + x[y].v.y;
  bar (p);
  return f;
}

float
f4 (struct B *x, int y)
{
  struct A p;
  p.x = 1.0f;
  p.y = 2.0f;
  __builtin_memcpy (x[y].u.x, p.x, sizeof (float));
  __builtin_memcpy (x[y].u.y, p.y, sizeof (float));
  x[y].v.x = 3.0f;
  x[y].v.y = 4.0f;
  float f = x[y].u.x + x[y].u.y + x[y].v.x + x[y].v.y;
  bar (p);
  return f;
}

at -O2, FRE is able to optimize f2 and f4, but not f1 and f3, despite all 4
functions doing the same thing.


[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-05-14 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142

--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 35541
  -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=35541action=edit
test-case to reproduce

Must be compiled with -Ofast -fopenmp -march=core-avx2 options.