[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect

rguenther at suse dot de Mon, 17 Nov 2014 01:04:00 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984


--- Comment #14 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 14 Nov 2014, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984
> 
> Jakub Jelinek <jakub at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|ASSIGNED                    |NEW
>                  CC|                            |jamborm at gcc dot gnu.org,
>                    |                            |rguenth at gcc dot gnu.org
>            Assignee|jakub at gcc dot gnu.org           |unassigned at gcc dot gnu.org
> 
> --- Comment #13 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> (In reply to Stupachenko Evgeny from comment #12)
> > Created attachment 33963 [details]
> > test case where pragma simd disable vectorization
> > 
> > When the following test case is compiled with "-Ofast", the loop in the
> > GetXsum function is vectorized.
> > Adding "-fopenmp" makes vectorization fail due to:
> > 
> > simd_issue.cpp:26:18: note: not vectorized: data ref analysis failed
> > D.2329[_7].x = _12;
> > 
> > It looks like the loop was vectorized with -fopenmp before the patch in
> > this bug.
> 
> The testcase is invalid, you need reduction(+:sim) clause, otherwise the loop
> has invalid inter-iteration dependencies.
> 
> That said, even with that, with C it vectorizes fine, while with C++ it
> doesn't.
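
For reference, a minimal sketch of what a loop with the requested reduction
clause could look like. The attachment is not reproduced in this message, so
the function body, the name get_x_sum and the accumulator name sum are
assumptions for illustration; only struct XY and its field x appear in the
dumps quoted here.

```c
/* Layout taken from the dumps in this thread (struct XY with a float x);
   the second field is an assumption.  */
struct XY { float x, y; };

/* Hypothetical stand-in for the GetXsum loop from the attachment.
   Without the reduction clause, every iteration reads and writes the
   same accumulator, which is an inter-iteration dependence that
   "omp simd" does not allow.  */
float
get_x_sum (const struct XY *v, int n)
{
  float sum = 0.0f;
#pragma omp simd reduction(+:sum)
  for (int i = 0; i < n; i++)
    sum += v[i].x;
  return sum;
}
```

When built without -fopenmp or -fopenmp-simd the pragma is simply ignored and
the function still computes the same sum.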
> 
> In *.einline the C -> C++ difference is (before that I don't see such):
> -  D.1856[_19].x = _24;
> -  _26 = &D.1856[_19];
> -  _27 = MEM[(const struct XY *)_26].x;
> +  D.2352[_19].x = _24;
> +  _26 = &D.2352[_19];
> +  _40 = MEM[(float *)_26];
> 
> In *.ealias the C -> C++ difference is:
> -  D.1856[_19].x = _24;
> -  _27 = MEM[(const struct XY *)&D.1856][_19].x;
> +  D.2352[_19].x = _24;
> +  _26 = &D.2352[_19];
> +  _40 = MEM[(float *)_26];
> 
> and apparently FRE1 handles the former but not the latter.  Richard?
> As the struct contains float at that offset, I don't see why FRE1 shouldn't
> optimize that to _40 = _24.
> 
> Shorter testcase for the FRE1 missed-optimization:
> struct S { float a, b; };
> 
> float
> foo (int x, float y)
> {
>   struct S z[1024];
>   z[x].a = y;
>   struct S *p = &z[x];
>   float *q = (float *) p;
>   return *q;
> }

I will have a look - it's designed to handle that fine.
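
A self-checking variant of the reduced testcase may help when poking at
this. It is only a sketch: check_foo is a hypothetical helper that is not
part of the bug report, and it checks only the run-time result, not whether
FRE1 actually forwards the store. To see the latter one would still have to
inspect the dump produced by -fdump-tree-fre1.

```c
#include <assert.h>

struct S { float a, b; };

/* Reduced testcase from comment #13: store through z[x].a, then read
   the same location back through a float pointer.  A pointer to a
   struct, suitably converted, points to its first member, so the load
   must observe the stored value.  */
float
foo (int x, float y)
{
  struct S z[1024];
  z[x].a = y;
  struct S *p = &z[x];
  float *q = (float *) p;
  return *q;
}

/* Hypothetical helper: x must stay within 0..1023.  */
void
check_foo (void)
{
  assert (foo (0, 1.5f) == 1.5f);
  assert (foo (1023, -2.0f) == -2.0f);
}
```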

> (dunno why the inliner handles things differently between C and C++ on the
> #c12 testcase).  Now, as for vectorizing it even if FRE isn't able to
> optimize it: we currently don't support interleaved accesses to the
> "omp simd array" attributed arrays; perhaps we could at least handle some
> easy cases thereof, and supposedly we should teach SRA about those too
> (like, if the arrays aren't addressable and aren't accessed as a whole, but
> just as individual fields, split them into separate "omp simd array"
> accesses instead).  In this particular case, due to the FRE missed
> optimization, it is addressable though.
> Or perhaps teach gimple folding to fold that:
>   q_5 = &z[x_2(D)];
>   _6 = *q_5;
> back into:
>   _6 = z[x_2(D)].a;
> ?

No, that's generally invalid (forwprop does that if types match
closely enough, which apparently they don't?)
