https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117709
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think it also shows that the OpenMP SIMD handling of the SIMD variables
lacks optimization:
int D.2004[64];
int D.2003[64];
int D.2001[64];
...
Those were supposed to become vector registers in the end, but we end up
with
__builtin_memset (&D.2001, 0, 256); // loop distribution
.MASK_STORE (&D.2004, 32B, { -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, { 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4 });
...
vect__26.37_78 = .MASK_LOAD (&D.2001, 32B, { -1, -1, -1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0 }, _52(D));
etc.
In particular, a .MASK_LOAD with an UNDEF else value could be turned into an
unmasked load when it cannot trap. But in general, for loop- or len-masking
of OpenMP SIMD loops, we may want to special-case the handling of those
lowering-introduced arrays.
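To illustrate the .MASK_LOAD point, here is a minimal scalar sketch in C. It is only a model, not GCC code: the function names are made up for illustration, and the 64 lanes and three active lanes are taken from the dump above. The assumption is that the whole array is accessible, as it is for a local like D.2001, so a full-width load cannot trap, and that nothing depends on the inactive lanes because their else value is undefined.

```c
#include <stdbool.h>
#include <stddef.h>

#define NLANES 64  /* lanes in the dump above: int[64], 256 bytes */

/* Scalar model of a masked load: active lanes read memory, inactive
   lanes take the caller-supplied "else" value.  */
static void
model_mask_load (int *dst, const int *src, const bool *mask, int else_val)
{
  for (size_t i = 0; i < NLANES; i++)
    dst[i] = mask[i] ? src[i] : else_val;
}

/* Scalar model of an ordinary full-width load.  Only valid when all
   NLANES elements of SRC are accessible, i.e. the load cannot trap.  */
static void
model_plain_load (int *dst, const int *src)
{
  for (size_t i = 0; i < NLANES; i++)
    dst[i] = src[i];
}

/* Return true if A and B agree on every lane that MASK marks active.  */
static bool
active_lanes_agree (const int *a, const int *b, const bool *mask)
{
  for (size_t i = 0; i < NLANES; i++)
    if (mask[i] && a[i] != b[i])
      return false;
  return true;
}

/* Hypothetical check: a fully accessible local array and a mask with
   the first three lanes active, mirroring { -1, -1, -1, 0, ... }.  */
static bool
plain_load_refines_masked_load (void)
{
  int array[NLANES], masked[NLANES], plain[NLANES];
  bool mask[NLANES];

  for (size_t i = 0; i < NLANES; i++)
    {
      array[i] = 100 + (int) i;
      mask[i] = i < 3;
    }

  /* The else value 0 is an arbitrary stand-in for "undefined".  */
  model_mask_load (masked, array, mask, 0);
  model_plain_load (plain, array);
  return active_lanes_agree (masked, plain, mask);
}
```

plain_load_refines_masked_load () returns true: the two loads differ only in lanes whose value was undefined to begin with, which is why the replacement would be valid under the stated assumption.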
In the GIMPLE I see in .optimized:
vect__23.34_51 = .MASK_GATHER_LOAD (&MEM <int[11][101]> [(void *)&k + -88B],
{ 0, -15, -30, -45, -60, -75, -90, -105, -120, -135, -150, -165, -180, -195,
-210, -225, -240, -255, -270, -285, -300, -315, -330, -345, -360, -375, -390,
-405, -420, -435, -450, -465, -480, -495, -510, -525, -540, -555, -570, -585,
-600, -615, -630, -645, -660, -675, -690, -705, -720, -735, -750, -765, -780,
-795, -810, -825, -840, -855, -870, -885, -900, -915, -930, -945 }, 4, { 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, { -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
_53(D));
That looks odd: we load from before 'k' here. The DR says
#(Data Ref:
# bb: 7
# stmt: _23 = k[0][_22];
# ref: k[0][_22];
# base_object: k;
# Access function 0: {41, +, -15}_4
# Access function 1: 0
which looks correct to me, but the initial value we choose is odd. We do
if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
vect_get_gather_scatter_ops (loop_vinfo, loop, stmt_info,
slp_node, &gs_info, &dataref_ptr,
&vec_offsets);
else
dataref_ptr
= vect_create_data_ref_ptr (vinfo, first_stmt_info,
aggr_type,
at_loop, offset, &dummy, gsi,
&ptr_incr, false, bump);
and pass offset == -252.
I think this offset is initialized by get_negative_load_store_type but
not reset when we divert to VMAT_GATHER_SCATTER.
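The numbers in the dump are consistent with that reading. The following C sketch only redoes the arithmetic; the helper names are made up, and it assumes the stale offset is the usual -(nunits - 1) * element-size adjustment for a reversed contiguous access, with 64 lanes of 4-byte int.

```c
#include <stddef.h>

/* Byte offset that points a reversed contiguous vector access at its
   lowest-addressed element (assumed formula, see the lead-in).  */
static long
negative_step_offset (long nunits, long elt_size)
{
  return -(nunits - 1) * elt_size;
}

/* Byte position of the gather base relative to &k: the initial index
   of the access function times the element size, plus any extra
   offset that is passed along.  */
static long
gather_base_bytes (long init_index, long elt_size, long extra_offset)
{
  return init_index * elt_size + extra_offset;
}

/* Index into k[0][] that lane LANE reads, for a gather whose per-lane
   offsets are STEP * LANE elements from the base.  */
static long
lane_index (long base_bytes, long elt_size, long step, long lane)
{
  return base_bytes / elt_size + step * lane;
}
```

With the values from the dump, negative_step_offset (64, 4) is -252 and gather_base_bytes (41, 4, -252) is -88, matching the &k + -88B base in the .MASK_GATHER_LOAD. With the offset reset to 0 the base would be &k + 164B, and the three active lanes would read k[0][41], k[0][26] and k[0][11], as the access function {41, +, -15}_4 says. With the stale offset they read indices -22, -37 and -52 instead.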
This is all in need of serious TLC ..
Testing a patch (I'll attach in a moment).