https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984
--- Comment #14 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 14 Nov 2014, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984
>
> Jakub Jelinek <jakub at gcc dot gnu.org> changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|ASSIGNED                    |NEW
>                  CC|                            |jamborm at gcc dot gnu.org,
>                    |                            |rguenth at gcc dot gnu.org
>            Assignee|jakub at gcc dot gnu.org     |unassigned at gcc dot gnu.org
>
> --- Comment #13 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> (In reply to Stupachenko Evgeny from comment #12)
> > Created attachment 33963 [details]
> > test case where pragma simd disable vectorization
> >
> > The following test case compiled with "-Ofast" vectorize the loop in the
> > GetXsum function.
> > Adding "-fopenmp" leads to failed vectorization due to:
> >
> > simd_issue.cpp:26:18: note: not vectorized: data ref analysis failed
> > D.2329[_7].x = _12;
> >
> > It looks like before the patch in this Bug loop was vectorized with
> > -fopenmp.
>
> The testcase is invalid, you need reduction(+:sim) clause, otherwise the loop
> has invalid inter-iteration dependencies.
>
> That said, even with that, with C it vectorizes fine, while with C++ it
> doesn't.
>
> In *.einline the C -> C++ difference is (before that I don't see such):
> -  D.1856[_19].x = _24;
> -  _26 = &D.1856[_19];
> -  _27 = MEM[(const struct XY *)_26].x;
> +  D.2352[_19].x = _24;
> +  _26 = &D.2352[_19];
> +  _40 = MEM[(float *)_26];
>
> In *.ealias the C -> C++ difference is:
> -  D.1856[_19].x = _24;
> -  _27 = MEM[(const struct XY *)&D.1856][_19].x;
> +  D.2352[_19].x = _24;
> +  _26 = &D.2352[_19];
> +  _40 = MEM[(float *)_26];
>
> and apparently FRE1 handles the former but not the latter.  Richard?
> As the struct contains float at that offset, I don't see why FRE1 shouldn't
> optimize that to _40 = _24.
>
> Shorter testcase for the FRE1 missed-optimization:
> struct S { float a, b; };
>
> float
> foo (int x, float y)
> {
>   struct S z[1024];
>   z[x].a = y;
>   struct S *p = &z[x];
>   float *q = (float *) p;
>   return *q;
> }

I will have a look - it's designed to handle that fine.

> (dunno why the inliner handles things differently between C and C++ on the
> #c12
> testcase).  Now, as for vectorizing it even if FRE isn't able to optimize it,
> we currently don't support interleaved accesses to the "omp simd array"
> attributed arrays, perhaps we could at least some easy cases thereof, and
> supposedly we should teach SRA about those too (like, if the arrays aren't
> addressable and aren't accesses as whole, but just individual fields, split it
> into separate "omp simd array" accesses instead.  In this particular case due
> to the FRE missed optimization it is addressable though.
> Or perhaps teach fold to gimple folding to fold that:
>   q_5 = &z[x_2(D)];
>   _6 = *q_5;
> back into:
>   _6 = z[x_2(D)].x;
> ?

No, that's generally invalid (forwprop does that if types match closely
enough, which apparently they don't?)
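
For anyone wanting to reproduce this, here is a minimal sketch of the quoted foo testcase in a form that can be checked at run time. The function foo_forwards_store is a hypothetical helper added only for illustration; it is not part of the bug report. It only confirms that the load through the float pointer observes the value just stored; it says nothing about whether FRE1 actually performed the forwarding.

```c
struct S { float a, b; };

/* The reduced testcase from comment #13: store to z[x].a, then read the
   same location back through a float pointer derived from &z[x]. */
float
foo (int x, float y)
{
  struct S z[1024];
  z[x].a = y;
  struct S *p = &z[x];
  /* A pointer to a struct, suitably converted, points to its first
     member, so this read is well defined and must yield y. */
  float *q = (float *) p;
  return *q;
}

/* Hypothetical helper (not in the report): returns 1 if foo returns the
   stored value for every valid index, 0 otherwise. */
int
foo_forwards_store (void)
{
  for (int i = 0; i < 1024; i++)
    {
      float y = (float) i + 0.5f;   /* exactly representable in float */
      if (foo (i, y) != y)
        return 0;
    }
  return 1;
}
```

To see whether the redundant load is removed, one option (assuming a GCC build where the pass dump is named fre1) is to compile with -O2 -fdump-tree-fre1 and check whether foo still contains the MEM[(float *)...] load or simply returns y.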