[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984 Richard Biener changed: What|Removed |Added Target Milestone|4.9.4 |---
[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984 Jakub Jelinek changed: What|Removed |Added Target Milestone|4.9.3 |4.9.4
[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984 --- Comment #16 from Jakub Jelinek --- GCC 4.9.3 has been released.
[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984 --- Comment #15 from Richard Biener --- Ok, so SCCVN only can forward constant addresses (because variable ones wouldn't fit in the same refs vector without "inserting" ops). See vn_reference_maybe_forwprop_address. It's also required so that we can re-materialize the expression in PRE for example and not wind up with invalid access paths there (similar to the reason why forwprop doesn't do this transform). I think it might be possible to teach SCCVN to do this, but it might be not trivial.
[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984 --- Comment #14 from rguenther at suse dot de --- On Fri, 14 Nov 2014, jakub at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984 > > Jakub Jelinek changed: > >What|Removed |Added > > Status|ASSIGNED|NEW > CC||jamborm at gcc dot gnu.org, >||rguenth at gcc dot gnu.org >Assignee|jakub at gcc dot gnu.org |unassigned at gcc dot > gnu.org > > --- Comment #13 from Jakub Jelinek --- > (In reply to Stupachenko Evgeny from comment #12) > > Created attachment 33963 [details] > > test case where pragma simd disable vectorization > > > > The following test case compiled with "-Ofast" vectorize the loop in the > > GetXsum function. > > Adding "-fopenmp" leads to failed vectorization due to: > > > > simd_issue.cpp:26:18: note: not vectorized: data ref analysis failed > > D.2329[_7].x = _12; > > > > It looks like before the patch in this Bug loop was vectorized with > > -fopenmp. > > The testcase is invalid, you need reduction(+:sim) clause, otherwise the loop > has invalid inter-iteration dependencies. > > That said, even with that, with C it vectorizes fine, while with C++ it > doesn't. > > In *.einline the C -> C++ difference is (before that I don't see such): > - D.1856[_19].x = _24; > - _26 = &D.1856[_19]; > - _27 = MEM[(const struct XY *)_26].x; > + D.2352[_19].x = _24; > + _26 = &D.2352[_19]; > + _40 = MEM[(float *)_26]; > > In *.ealias the C -> C++ difference is: > - D.1856[_19].x = _24; > - _27 = MEM[(const struct XY *)&D.1856][_19].x; > + D.2352[_19].x = _24; > + _26 = &D.2352[_19]; > + _40 = MEM[(float *)_26]; > > and apparently FRE1 handles the former but not the latter. Richard? > As the struct contains float at that offset, I don't see why FRE1 shouldn't > optimize that to _40 = _24. > > Shorter testcase for the FRE1 missed-optimization: > struct S { float a, b; }; > > float > foo (int x, float y) > { > struct S z[1024]; > z[x].a = y; > struct S *p = &z[x]; > float *q = (float *) p; > return *q; > } I will have a look - it's designed to handle that fine. > (dunno why the inliner handles things differently between C and C++ on the > #c12 > testcase). Now, as for vectorizing it even if FRE isn't able to optimize it, > we currently don't support interleaved accesses to the "omp simd array" > attributed arrays, perhaps we could at least some easy cases thereof, and > supposedly we should teach SRA about those too (like, if the arrays aren't > addressable and aren't accesses as whole, but just individual fields, split it > into separate "omp simd array" accesses instead. In this particular case due > to the FRE missed optimization it is addressable though. > Or perhaps teach fold to gimple folding to fold that: > q_5 = &z[x_2(D)]; > _6 = *q_5; > back into: > _6 = z[x_2(D)].x; > ? No, that's generally invalid (forwprop does that if types match closely enough which appearantly they don't?)
[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984 Jakub Jelinek changed: What|Removed |Added Status|ASSIGNED|NEW CC||jamborm at gcc dot gnu.org, ||rguenth at gcc dot gnu.org Assignee|jakub at gcc dot gnu.org |unassigned at gcc dot gnu.org --- Comment #13 from Jakub Jelinek --- (In reply to Stupachenko Evgeny from comment #12) > Created attachment 33963 [details] > test case where pragma simd disable vectorization > > The following test case compiled with "-Ofast" vectorize the loop in the > GetXsum function. > Adding "-fopenmp" leads to failed vectorization due to: > > simd_issue.cpp:26:18: note: not vectorized: data ref analysis failed > D.2329[_7].x = _12; > > It looks like before the patch in this Bug loop was vectorized with -fopenmp. The testcase is invalid, you need reduction(+:sim) clause, otherwise the loop has invalid inter-iteration dependencies. That said, even with that, with C it vectorizes fine, while with C++ it doesn't. In *.einline the C -> C++ difference is (before that I don't see such): - D.1856[_19].x = _24; - _26 = &D.1856[_19]; - _27 = MEM[(const struct XY *)_26].x; + D.2352[_19].x = _24; + _26 = &D.2352[_19]; + _40 = MEM[(float *)_26]; In *.ealias the C -> C++ difference is: - D.1856[_19].x = _24; - _27 = MEM[(const struct XY *)&D.1856][_19].x; + D.2352[_19].x = _24; + _26 = &D.2352[_19]; + _40 = MEM[(float *)_26]; and apparently FRE1 handles the former but not the latter. Richard? As the struct contains float at that offset, I don't see why FRE1 shouldn't optimize that to _40 = _24. Shorter testcase for the FRE1 missed-optimization: struct S { float a, b; }; float foo (int x, float y) { struct S z[1024]; z[x].a = y; struct S *p = &z[x]; float *q = (float *) p; return *q; } (dunno why the inliner handles things differently between C and C++ on the #c12 testcase). Now, as for vectorizing it even if FRE isn't able to optimize it, we currently don't support interleaved accesses to the "omp simd array" attributed arrays, perhaps we could at least some easy cases thereof, and supposedly we should teach SRA about those too (like, if the arrays aren't addressable and aren't accesses as whole, but just individual fields, split it into separate "omp simd array" accesses instead. In this particular case due to the FRE missed optimization it is addressable though. Or perhaps teach fold to gimple folding to fold that: q_5 = &z[x_2(D)]; _6 = *q_5; back into: _6 = z[x_2(D)].x; ?
[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984 --- Comment #12 from Stupachenko Evgeny --- Created attachment 33963 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33963&action=edit test case where pragma simd disable vectorization The following test case compiled with "-Ofast" vectorize the loop in the GetXsum function. Adding "-fopenmp" leads to failed vectorization due to: simd_issue.cpp:26:18: note: not vectorized: data ref analysis failed D.2329[_7].x = _12; It looks like before the patch in this Bug loop was vectorized with -fopenmp.
[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984 Jakub Jelinek changed: What|Removed |Added Target Milestone|4.9.2 |4.9.3 --- Comment #11 from Jakub Jelinek --- GCC 4.9.2 has been released.
[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984 Jakub Jelinek changed: What|Removed |Added Target Milestone|4.9.1 |4.9.2 --- Comment #10 from Jakub Jelinek --- GCC 4.9.1 has been released.
[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984 Jakub Jelinek changed: What|Removed |Added Target Milestone|4.9.0 |4.9.1 --- Comment #9 from Jakub Jelinek --- GCC 4.9.0 has been released
[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984 Jakub Jelinek changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org
[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984 --- Comment #8 from Jakub Jelinek --- The wrong-code issue is now fixed, but we still don't vectorize this. See http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00514.html for details. Note that without making the two pointer parameters linear on the elemental function it actually will not be successfully vectorized unless you have CPU that can do scatter (Skylake, anything else?). Keeping this PR open so that I don't forget to address this.
[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984 --- Comment #7 from Jakub Jelinek --- Author: jakub Date: Sat Feb 8 09:10:14 2014 New Revision: 207629 URL: http://gcc.gnu.org/viewcvs?rev=207629&root=gcc&view=rev Log: PR c/59984 * gimplify.c (gimplify_bind_expr): In ORT_SIMD region mark local addressable non-static vars as GOVD_PRIVATE instead of GOVD_LOCAL. * omp-low.c (lower_omp_for): Move gimple_bind_vars and BLOCK_VARS of gimple_bind_block to new_stmt rather than copying them. * gcc.dg/vect/pr59984.c: New test. Added: trunk/gcc/testsuite/gcc.dg/vect/pr59984.c Modified: trunk/gcc/ChangeLog trunk/gcc/gimplify.c trunk/gcc/omp-low.c trunk/gcc/testsuite/ChangeLog
[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984 --- Comment #6 from Jakub Jelinek --- Created attachment 32077 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32077&action=edit Untested fix Untested fix. Likely I'll need to do something more, so that we don't emit extra decls for the debug info outside of the #pragma omp simd (flag on the private clause?) and add a testcase. And, the reason why with this patch (or the adjusted testcase with explicit private clause) isn't vectorized is that we'd need to teach vectorizer how to vectorize SSA_NAME = &array[SSA_NAME]; where the second SSA_NAME is GOMP_SIMD_LANE () result and array is an "omp simd array" array, as a vector of addresses (initialized before the loop), containing { &array[0], &array[1], ..., &array[vf-1] }.
[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984 --- Comment #5 from Jakub Jelinek --- I guess during gimplification when gimplifying #pragma omp simd body we should add all the local variables (declared in the body) that aren't static and are address taken to private clause on the #pragma omp simd, so that they will be handled like if you in your testcase do instead: int v1, v2; #pragma omp simd private (v1, v2) for(i = 0; i < 1000; i++) { foo(a[i], &v1, &v2); a[i] = v1 * v2; } We don't vectorize that right now, but that is something that is likely fixable.
[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984 --- Comment #4 from Igor Zamyatin --- vect details show that v1.0_14 = v1 and v2.1_15 = v2 are treated as invariants: test.c:24:14: note: -->vectorizing statement: v1.0_14 = v1; test.c:24:14: note: transform statement. test.c:24:14: note: transform load. ncopies = 1 test.c:24:14: note: create vector_type-pointer variable to type: vector(4) int vectorizing a pointer ref: v1 test.c:24:14: note: created vectp_v1.11_40 test.c:24:14: note: add new stmt: vect_v1.12_37 = MEM[(int *)vectp_v1.10_39]; test.c:24:14: note: hoisting out of the vectorized loop: v1.0_14 = v1; test.c:24:14: note: created new init_stmt: vect_cst_.13_7 = {v1.0_8, v1.0_8, v1.0_8, v1.0_8}; test.c:24:14: note: -->vectorizing statement: v2.1_15 = v2; test.c:24:14: note: transform statement. test.c:24:14: note: transform load. ncopies = 1 test.c:24:14: note: create vector_type-pointer variable to type: vector(4) int vectorizing a pointer ref: v2 test.c:24:14: note: created vectp_v2.15_1 test.c:24:14: note: add new stmt: vect_v2.16_60 = MEM[(int *)vectp_v2.14_58]; test.c:24:14: note: hoisting out of the vectorized loop: v2.1_15 = v2; test.c:24:14: note: created new init_stmt: vect_cst_.17_62 = {v2.1_61, v2.1_61, v2.1_61, v2.1_61}; Step for both loads determined as 0. Seems support for such case should be explicitly added
[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984 Jakub Jelinek changed: What|Removed |Added Keywords||openmp CC||jakub at gcc dot gnu.org Target Milestone|--- |4.9.0
[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984 --- Comment #3 from Igor Zamyatin --- Vectorizer dump snippet for main: foo.simdclone.0 (vect__12.7_3, vect_cst_.8_53, vect_cst_.8_53, vect_cst_.9_51, vect_cst_.9_51); GIMPLE_NOP vect_v1.12_37 = MEM[(int *)vectp_v1.10_39]; (1) v1.0_14 = v1; vect_v2.16_60 = MEM[(int *)vectp_v2.14_58]; (2) v2.1_15 = v2; vect__16.18_63 = vect_cst_.13_7 * vect_cst_.17_62; <--- constants instead of _16 = v1.0_14 * v2.1_15; vect_v1.12_37 and MEM[(int *)vectp_a.19_65] = vect__16.18_63; vect_v2.16_60 Then DCE destroys (1) and (2) and later LIM hoists the multiplication away from the loop.
[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984 Marek Polacek changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2014-02-05 CC||mpolacek at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #2 from Marek Polacek --- Confirmed.
[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984 --- Comment #1 from Stupachenko Evgeny --- Created attachment 31981 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31981&action=edit test case