[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 Bug 66142 depends on bug 66379, which changed state. Bug 66379 Summary: SCCVN doesn't handle aggregate array element accesses very well https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66379 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #27 from Yuri Rumyantsev --- Created attachment 37940 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37940=edit test-case to reproduce Need to be compiled with -Ofast -mavx2 -fopenmp options.
[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #26 from Yuri Rumyantsev --- If we convert copy structures to copy structure fields test will be vectorized and all mentions of GOMP_SIMD_LANE will be deleted. But if we slightly modify test by introducing new function vdot and insert its call: b = r.x * ray->dir.x + r.y * ray->dir.y; | v b = vdot (r, ray->dir); test won't be vectorized: test2.cpp:70:9: note: not vectorized: not suitable for scatter store D.6062[_9].org.x = 1.0e+0; test2.cpp is attached.
[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #23 from Yuri Rumyantsev --- Richard, Do we have any chance to vectorize attached test-case using GCC6 compiler?
[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #24 from Yuri Rumyantsev --- Richard, Do we have any chance to vectorize attached test-case using GCC6 compiler?
[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #25 from rguenther at suse dot de --- On Tue, 8 Dec 2015, ysrumyan at gmail dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 > > --- Comment #23 from Yuri Rumyantsev --- > Richard, > > Do we have any chance to vectorize attached test-case using GCC6 compiler? No, I don't see any.
[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #22 from Richard Biener --- Author: rguenth Date: Fri Sep 18 07:57:00 2015 New Revision: 227896 URL: https://gcc.gnu.org/viewcvs?rev=227896=gcc=rev Log: 2015-09-18 Richard BienerPR tree-optimization/66142 * fold-const.c (operand_equal_p): When OEP_ADDRESS_OF treat MEM[] and x the same. * tree-ssa-sccvn.h (vn_reference_fold_indirect): Remove. * tree-ssa-sccvn.c (vn_reference_fold_indirect): Return true when we simplified sth. (vn_reference_maybe_forwprop_address): Likewise. (valueize_refs_1): When we simplified through vn_reference_fold_indirect or vn_reference_maybe_forwprop_address set valueized_anything to true. (vn_reference_lookup_3): Use stmt_kills_ref_p to see whether one ref kills the other instead of just a offset-based test. * tree-ssa-alias.c (stmt_kills_ref_p): Use OEP_ADDRESS_OF for the operand_equal_p test to compare bases and also compare sizes. Modified: trunk/gcc/ChangeLog trunk/gcc/fold-const.c trunk/gcc/tree-ssa-alias.c trunk/gcc/tree-ssa-sccvn.c trunk/gcc/tree-ssa-sccvn.h
[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #21 from Richard Biener --- Ok, so if we fix SCCVN further we hit its (designed) limitation with stmt walking not following backedges (but the use is in a loop and the def outside). We avoid doing the work to determine if sth is loop invariant here. This was trying to improve things with SRA disabled - I'm going to still push those changes if testing works out fine. Now with SRA enabled we have a better situation in this regard but face _9 = [_12].org; MEM[(struct vec_ *)_9] = 1.0e+0; MEM[(struct vec_ *)_9 + 4B] = _8; D.4309[_12].dir.x = x_13; D.4309[_12].dir.y = y_14; D.4308[_12].t = 1.0e+10; D.4308[_12].h = 0; _20 = [_12]; _26 = MEM[(const struct Ray *)][_12].org.x; which as said isn't friendly enough for the simplistic address forward propagation we have. The issue here is that we can't disambiguate against MEM[(struct vec_ *)_9 + 4B] = _8; when looking up MEM[(const struct Ray *)][_12].org.x; because vn_reference_maybe_forwprop_address doesn't forward into MEM[(struct vec_ *)_9 + 4B] as its offset is not zero: /* If that didn't work because the address isn't invariant propagate the reference tree from the address operation in case the current dereference isn't offsetted. */ if (!addr_base && *i_p == ops->length () - 1 && off == 0 /* This makes us disable this transform for PRE where the reference ops might be also used for code insertion which is invalid. */ && default_vn_walk_kind == VN_WALKREWRITE) { auto_vectem; copy_reference_ops_from_ref (TREE_OPERAND (addr, 0), ); ops->pop (); ops->pop (); ops->safe_splice (tem); --*i_p; return true; it also likely wouldn't help vn_reference_lookup_3 to do the disambiguation as that performs just lhs_ops = valueize_refs_1 (lhs_ops, _anything); if (valueized_anything) { lhs_ref_ok = ao_ref_init_from_vn_reference (_ref, get_alias_set (lhs), TREE_TYPE (lhs), lhs_ops); if (lhs_ref_ok && !refs_may_alias_p_1 (ref, _ref, true)) return NULL; but there is still the variable offset to cope with. We'd need to perform some disambiguation on lhs_ref vs. vr. But it's not clear how to represent [_12].org forwarded into MEM[(struct vec_ *)_9 + 4B]. It's kind-of a COMPONENT_REF with just a offset added. Let's collect the improvements I have sofar in my local tree and see if they otherwise work.
[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #20 from Richard Biener --- Yes, I plan to come back to this during stage3.
[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 Igor Zamyatin changed: What|Removed |Added CC||enkovich.gnu at gmail dot com --- Comment #19 from Igor Zamyatin --- Richard, would you be able to look at this in some time?
[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 Bug 66142 depends on bug 63916, which changed state. Bug 63916 Summary: value-numbering fails to forward variable addresses https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63916 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Status|RESOLVED|REOPENED Resolution|FIXED |--- --- Comment #18 from Richard Biener rguenth at gcc dot gnu.org --- Let's re-open this for the original testcase.
[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #17 from Richard Biener rguenth at gcc dot gnu.org --- Ok, so even with PR63916 rudimentary fixed we hit the issue that in _9 = D.3665[_11].org; MEM[(struct vec_ *)_9] = 1.0e+0; MEM[(struct vec_ *)_9 + 4B] = _8; ... _24 = MEM[(const struct Ray *)D.3665][_11].org.x; the store to MEM[(struct vec_ *)_9 + 4B] aliases MEM[(const struct Ray *)D.3665][_11].org.x. And the rudimentary fix for PR63916 doesn't allow for the forwarding to succeed into MEM[(struct vec_ *)_9 + 4B] either. Without SRA we get D.3665[_11].org = p; D.3665[_11].dir.x = x_12; D.3665[_11].dir.y = y_13; D.3664[_11].t = 1.0e+10; D.3664[_11].h = 0; _25 = MEM[(const struct Ray *)D.3665][_11].org.x; and the look-through-aggregate-copy code bails out at /* See if the assignment kills REF. */ base2 = ao_ref_base (lhs_ref); offset2 = lhs_ref.offset; size2 = lhs_ref.size; maxsize2 = lhs_ref.max_size; if (maxsize2 == -1 || (base != base2 (TREE_CODE (base) != MEM_REF || TREE_CODE (base2) != MEM_REF || TREE_OPERAND (base, 0) != TREE_OPERAND (base2, 0) || !tree_int_cst_equal (TREE_OPERAND (base, 1), TREE_OPERAND (base2, 1 || offset2 offset || offset2 + size2 offset + maxsize) return (void *)-1; stmt_kills_ref_p fails to compare MEM[(const struct Ray *)D.3665][_15].org and D.3665[_15].org equal via operand_equal_p.
[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #16 from Jakub Jelinek jakub at gcc dot gnu.org --- Author: jakub Date: Fri May 29 13:06:23 2015 New Revision: 223863 URL: https://gcc.gnu.org/viewcvs?rev=223863root=gccview=rev Log: PR tree-optimization/66142 * tree-if-conv.c (if_convertible_phi_p): Don't give up on virtual phis that feed themselves. * gcc.dg/vect/pr66142.c: New test. Added: trunk/gcc/testsuite/gcc.dg/vect/pr66142.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-if-conv.c
[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #15 from Richard Biener rguenth at gcc dot gnu.org --- Author: rguenth Date: Thu May 28 13:24:53 2015 New Revision: 223816 URL: https://gcc.gnu.org/viewcvs?rev=223816root=gccview=rev Log: 2015-05-28 Richard Biener rguent...@suse.de PR tree-optimization/66142 * tree-ssa-sccvn.c (vn_reference_lookup_3): Handle non-GIMPLE values better in memcpy destination handling. Handle non-aliasing we discover here. * gcc.dg/tree-ssa/ssa-fre-44.c: Fixup. Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c trunk/gcc/tree-ssa-sccvn.c
[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #14 from Richard Biener rguenth at gcc dot gnu.org --- As said in comment #9, this is PR63916 I think. We have _9 = D.3665[_11].org; MEM[(struct vec_ *)_9] = 1.0e+0; MEM[(struct vec_ *)_9 + 4B] = _8; ... _24 = MEM[(const struct Ray *)D.3665][_11].org.x; _27 = MEM[(const struct Ray *)D.3665][_11].org.y; which ultimately is from inlining of D.3665[D.3663].org = p; D.3665[D.3663].dir.x = x; D.3665[D.3663].dir.y = y; D.3664[D.3663].t = 1.0e+10; D.3664[D.3663].h = 0; D.3667 = D.3665[D.3663]; D.3668 = D.3664[D.3663]; bar (D.3668, D.3667, spheres[0]); to p.x = 1.0e+0; p.y = _9; ... D.3665[_15].org = p; D.3665[_15].dir.x = x_16; D.3665[_15].dir.y = y_17; D.3664[_15].t = 1.0e+10; D.3664[_15].h = 0; _23 = D.3665[_15]; _24 = D.3664[_15]; _32 = MEM[(const struct Ray *)_23].org.x; forwprop cannot forward D.3665[_15] into the dereference (it would create a bogus access paths even if types were matching and TBAA would be fine). And SCCVN isn't able to canonicalize the accesses (because it doesn't look at stmt defs when building the canonical address of an access). Note that PRE uses the canonical form for expression insertion so we have to be careful what we put in the canonical form (or separate the address compute from the actual dereference). Yes, lowering the access more in the SCCVN machinery would be possible (say, use the affine combination machinery with the powers of affine-comb-expand (and making that use the VN lattice...)), but the issue with this is always PRE insertion and us _not_ wanting to have all PRE inserted loads be fully lowered to address-compute plus indirection. I hope to get to PR63916 this stage1.
[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2015-05-26 Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #10 from Richard Biener rguenth at gcc dot gnu.org --- Testing Index: gcc/tree-ssa-sccvn.c === --- gcc/tree-ssa-sccvn.c(revision 223574) +++ gcc/tree-ssa-sccvn.c(working copy) @@ -1894,7 +1894,12 @@ vn_reference_lookup_3 (ao_ref *ref, tree size2 = lhs_ref.size; maxsize2 = lhs_ref.max_size; if (maxsize2 == -1 - || (base != base2 !operand_equal_p (base, base2, 0)) + || (base != base2 + (TREE_CODE (base) != MEM_REF + || TREE_CODE (base2) != MEM_REF + || TREE_OPERAND (base, 0) != TREE_OPERAND (base2, 0) + || !tree_int_cst_equal (TREE_OPERAND (base, 1), + TREE_OPERAND (base2, 1 || offset2 offset || offset2 + size2 offset + maxsize) return (void *)-1; which fixes the comment#7 testcase.
[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #9 from rguenther at suse dot de rguenther at suse dot de --- On Mon, 25 May 2015, jakub at gcc dot gnu.org wrote: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added CC||rguenth at gcc dot gnu.org --- Comment #8 from Jakub Jelinek jakub at gcc dot gnu.org --- I guess it depends on what exactly SCCVN uses the operand vectors created/optimized by copy_reference_ops_from_ref/valueize_refs_1 for. If it is used only for tracking what is the value of the reference (the value stored into that piece of memory), or also for canonicalization of references to certain more canonical form, or both. If only the value It's only for tracking the value (and to get a canonical form for its internal hash-tables and compare function). I think this bug may be related to PR63916. of the reference, then supposedly it would be nice to get far more canonicalization in the references (e.g. turn (perhaps multiple nested) COMPONENT_REFs into MEM_REFs with offset (except perhaps for bitfields, perhaps even do something about ARRAY_REFs, so that say even: struct A { float x, y; }; struct B { struct A u; }; void bar (struct A *); float f2 (struct B *x, int y) { struct A p; p.x = 1.0f; p.y = 2.0f; char *z = (char *) x; z += y * sizeof (struct B); z += __builtin_offsetof (struct B, u.y); x[y].u = p; float f = *(float *) z; bar (p); return f; } is handled). But, supposedly it would be undesirable to canonicalize all the COMPONENT_REFs to the MEM_REFs (at least early on, e.g. for __builtin_object_size (, 1) purposes, or aliasing etc.). Richi, any thoughts on this? On the f1 testcase it probably fails to look through the aggregate copy in vn_reference_lookup_3. I'll have a short look later.
[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #13 from Yuri Rumyantsev ysrumyan at gmail dot com --- Original test-case is not vectorized yet with Richard patch for sccvn.
[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #12 from Richard Biener rguenth at gcc dot gnu.org --- Fixed(?).
[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #11 from Richard Biener rguenth at gcc dot gnu.org --- Author: rguenth Date: Tue May 26 13:55:40 2015 New Revision: 223697 URL: https://gcc.gnu.org/viewcvs?rev=223697root=gccview=rev Log: 2015-05-26 Richard Biener rguent...@suse.de PR tree-optimization/66142 * tree-ssa-sccvn.c (vn_reference_lookup_3): Manually compare MEM_REFs for the same base address. * gcc.dg/tree-ssa/ssa-fre-44.c: New testcase. Added: trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-ssa-sccvn.c
[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #7 from Jakub Jelinek jakub at gcc dot gnu.org --- Slightly improved variant of #c6 testcase: struct A { float x, y; }; struct B { struct A u; }; void bar (struct A *); float f1 (struct B *x, int y) { struct A p; p.x = 1.0f; p.y = 2.0f; struct A *q = x[y].u; *q = p; float f = x[y].u.x + x[y].u.y; bar (p); return f; } float f2 (struct B *x, int y) { struct A p; p.x = 1.0f; p.y = 2.0f; x[y].u = p; float f = x[y].u.x + x[y].u.y; bar (p); return f; } float f3 (struct B *x, int y) { struct A p; p.x = 1.0f; p.y = 2.0f; struct A *q = x[y].u; __builtin_memcpy (q-x, p.x, sizeof (float)); __builtin_memcpy (q-y, p.y, sizeof (float)); *q = p; float f = x[y].u.x + x[y].u.y; bar (p); return f; } float f4 (struct B *x, int y) { struct A p; p.x = 1.0f; p.y = 2.0f; __builtin_memcpy (x[y].u.x, p.x, sizeof (float)); __builtin_memcpy (x[y].u.y, p.y, sizeof (float)); float f = x[y].u.x + x[y].u.y; bar (p); return f; }
[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added CC||rguenth at gcc dot gnu.org --- Comment #8 from Jakub Jelinek jakub at gcc dot gnu.org --- I guess it depends on what exactly SCCVN uses the operand vectors created/optimized by copy_reference_ops_from_ref/valueize_refs_1 for. If it is used only for tracking what is the value of the reference (the value stored into that piece of memory), or also for canonicalization of references to certain more canonical form, or both. If only the value of the reference, then supposedly it would be nice to get far more canonicalization in the references (e.g. turn (perhaps multiple nested) COMPONENT_REFs into MEM_REFs with offset (except perhaps for bitfields, perhaps even do something about ARRAY_REFs, so that say even: struct A { float x, y; }; struct B { struct A u; }; void bar (struct A *); float f2 (struct B *x, int y) { struct A p; p.x = 1.0f; p.y = 2.0f; char *z = (char *) x; z += y * sizeof (struct B); z += __builtin_offsetof (struct B, u.y); x[y].u = p; float f = *(float *) z; bar (p); return f; } is handled). But, supposedly it would be undesirable to canonicalize all the COMPONENT_REFs to the MEM_REFs (at least early on, e.g. for __builtin_object_size (, 1) purposes, or aliasing etc.). Richi, any thoughts on this?
[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #3 from Jakub Jelinek jakub at gcc dot gnu.org --- Ah, but fre2 is able to optimize those: D.2881[_12].dir.x = x_13; ... _32 = MEM[(const struct Ray *)D.2881][_12].dir.x; cases to _32 = x_13; The problem in FRE2 is only with the MEM_REFs. If in the testcase - ray.org = p; + ray.org.x = p.x; + ray.org.y = p.y; then FRE2 handles it completely and the problem is the same between -fopenmp and -fno-openmp - ifcvt not ifconverting the loop.
[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #4 from Jakub Jelinek jakub at gcc dot gnu.org --- Ah, and the reason for the ifcvt failure is that we have an unoptimized virtual PHI in the loop that ifcvt gives up on: # .MEM_55 = PHI .MEM_55(7), .MEM_5(D)(2) Only phicprop2 pass is able to optimize that away (and replace all uses of .MEM_55 with .MEM_5(D)).
[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #5 from Jakub Jelinek jakub at gcc dot gnu.org --- Created attachment 35596 -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=35596action=edit gcc6-pr66142.patch Fix for the ifcvt issue. With this the modified testcase is vectorized.
[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #2 from Jakub Jelinek jakub at gcc dot gnu.org --- On this testcase, I believe it is most important to change FRE and SRA. Without -fopenmp, we have before fre2: MEM[(struct ray_ *)ray] = 1.0e+0; MEM[(struct ray_ *)ray + 4B] = _8; ray.dir.x = x_10; ray.dir.y = y_11; ... _23 = ray.org.x; _26 = ray.org.y; _29 = ray.dir.x; _31 = ray.dir.y; and fre2 optimizes this into the values stored in there. With -fopenmp, we have the SIMD_LANE based array references: _12 = GOMP_SIMD_LANE (simduid.0_11(D)); _9 = D.2881[_12].org; MEM[(struct vec_ *)_9] = 1.0e+0; MEM[(struct vec_ *)_9 + 4B] = _8; D.2881[_12].dir.x = x_13; D.2881[_12].dir.y = y_14; ... _26 = MEM[(const struct Ray *)D.2881][_12].org.x; _29 = MEM[(const struct Ray *)D.2881][_12].org.y; _32 = MEM[(const struct Ray *)D.2881][_12].dir.x; _34 = MEM[(const struct Ray *)D.2881][_12].dir.y; that fre2 isn't able to optimize. Doing something here on the vectorizer side is too late, we really need to scalarize those or remove completely (through fre2).
[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #6 from Jakub Jelinek jakub at gcc dot gnu.org --- As for the FRE (actually SCCVN) issue, seems this affects more than GOMP_SIMD_LANE, seems there is insufficient canonicalization of the references performed by SCCVN, so the same references, but in slightly different form, aren't considered the same. Please see e.g. struct A { float x, y; }; struct B { struct A u, v; }; void bar (struct A *); float f1 (struct B *x, int y) { struct A p; p.x = 1.0f; p.y = 2.0f; struct A *q = x[y].u; *q = p; x[y].v.x = 3.0f; x[y].v.y = 4.0f; float f = x[y].u.x + x[y].u.y + x[y].v.x + x[y].v.y; bar (p); return f; } float f2 (struct B *x, int y) { struct A p; p.x = 1.0f; p.y = 2.0f; x[y].u = p; x[y].v.x = 3.0f; x[y].v.y = 4.0f; float f = x[y].u.x + x[y].u.y + x[y].v.x + x[y].v.y; bar (p); return f; } float f3 (struct B *x, int y) { struct A p; p.x = 1.0f; p.y = 2.0f; struct A *q = x[y].u; __builtin_memcpy (q, p.x, sizeof (float)); __builtin_memcpy (q-y, p.y, sizeof (float)); *q = p; x[y].v.x = 3.0f; x[y].v.y = 4.0f; float f = x[y].u.x + x[y].u.y + x[y].v.x + x[y].v.y; bar (p); return f; } float f4 (struct B *x, int y) { struct A p; p.x = 1.0f; p.y = 2.0f; __builtin_memcpy (x[y].u.x, p.x, sizeof (float)); __builtin_memcpy (x[y].u.y, p.y, sizeof (float)); x[y].v.x = 3.0f; x[y].v.y = 4.0f; float f = x[y].u.x + x[y].u.y + x[y].v.x + x[y].v.y; bar (p); return f; } at -O2, FRE is able to optimize f2 and f4, but not f1 and f3, despite all 4 functions doing the same thing.
[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 35541 -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=35541action=edit test-case to reproduce Must be compiled with -Ofast -fopenmp -march=core-avx2 options.