https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117557
Tamar Christina <tnfchris at gcc dot gnu.org> changed:
What    |Removed                      |Added
----------------------------------------------------------------------------
Keywords|needs-reduction              |
Status  |NEW                          |ASSIGNED
Assignee|unassigned at gcc dot gnu.org|tnfchris at gcc dot gnu.org
--- Comment #8 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Testcase:
#include <stdint.h>
#include <string.h>

#define N 8
#define L 8

void f(const uint8_t * restrict seq1,
       const uint8_t *idx, uint8_t *seq_out) {
  for (int i = 0; i < L; ++i) {
    uint8_t h = idx[i];
    memcpy((void *)&seq_out[i * N], (const void *)&seq1[h * N / 2], N / 2);
  }
}
When compiled at -O3 -mcpu=neoverse-n1+sve, the testcase miscompiles to:
  vect_patt_26.9_89 = [vec_unpack_lo_expr] vect_patt_27.8_88;
  vect_patt_26.9_90 = [vec_unpack_hi_expr] vect_patt_27.8_88;
  vect_patt_25.10_94 = .MASK_GATHER_LOAD (_91, vect_patt_26.9_89, 1, { 0, ... }, loop_mask_92, { 0, ... });
  vect_patt_25.11_95 = .MASK_GATHER_LOAD (_91, vect_patt_26.9_90, 1, { 0, ... }, loop_mask_93, { 0, ... });
  .MASK_SCATTER_STORE (seq_out_15(D), { 0, 8, 16, ... }, 1, vect_patt_25.10_94, loop_mask_92);
  .MASK_SCATTER_STORE (seq_out_15(D), { 0, 8, 16, ... }, 1, vect_patt_25.11_95, loop_mask_92);
rather than
  vect_patt_26.9_90 = [vec_unpack_lo_expr] vect_patt_27.8_89;
  vect_patt_26.9_91 = [vec_unpack_hi_expr] vect_patt_27.8_89;
  vect_patt_25.10_95 = .MASK_GATHER_LOAD (_92, vect_patt_26.9_90, 1, { 0, ... }, loop_mask_93);
  vect_patt_25.11_96 = .MASK_GATHER_LOAD (_92, vect_patt_26.9_91, 1, { 0, ... }, loop_mask_94);
  .MASK_SCATTER_STORE (seq_out_15(D), { 0, 8, 16, ... }, 1, vect_patt_25.10_95, loop_mask_93);
  vectp_seq_out.12_100 = seq_out_15(D) + POLY_INT_CST [32, 32];
  .MASK_SCATTER_STORE (vectp_seq_out.12_100, { 0, 8, 16, ... }, 1, vect_patt_25.11_96, loop_mask_94);
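Since the bad dump sends both scatter stores through seq_out_15(D) with the same mask, the difference should be visible in the output buffer at run time. A minimal sketch of a checker follows. The names f_ref, matches_reference and copy_fn are hypothetical, the fill patterns are arbitrary, and none of this is taken from the GCC testsuite.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N 8
#define L 8

/* Hypothetical signature matching the testcase's f.  */
typedef void (*copy_fn) (const uint8_t *, const uint8_t *, uint8_t *);

/* Byte-by-byte reference: same semantics as the testcase, but with no
   memcpy for the vectorizer to turn into a gather/scatter pair.  */
static void
f_ref (const uint8_t *seq1, const uint8_t *idx, uint8_t *seq_out)
{
  for (int i = 0; i < L; ++i)
    for (int b = 0; b < N / 2; ++b)
      seq_out[i * N + b] = seq1[idx[i] * N / 2 + b];
}

/* Run FN and f_ref on identical inputs and return 1 if the output
   buffers agree, 0 otherwise.  */
static int
matches_reference (copy_fn fn)
{
  uint8_t seq1[256 * N / 2];   /* large enough for any uint8_t index */
  uint8_t idx[L];
  uint8_t got[L * N], want[L * N];

  for (size_t k = 0; k < sizeof seq1; ++k)
    seq1[k] = (uint8_t) (k * 7 + 3);
  for (int i = 0; i < L; ++i)
    idx[i] = (uint8_t) (i * 37 + 5);

  /* Pre-fill both outputs so that bytes the loop never writes compare
     equal, and stores that never happen show up as stale 0xAA bytes.  */
  memset (got, 0xAA, sizeof got);
  memset (want, 0xAA, sizeof want);

  fn (seq1, idx, got);
  f_ref (seq1, idx, want);
  return memcmp (got, want, sizeof got) == 0;
}
```

Linking this with the testcase and calling matches_reference (f) would be expected to return 0 on an affected compiler and target, and 1 otherwise. I have not run it against the miscompiled binary.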
The miscompile happens because, as Richi suspected, the index passed to
vect_get_loop_mask is wrong for SLP, and because dataref_ptr is treated as a
constant inside the vec_num loop, i.e. the code assumes that for SLP every
store goes to the same location.
Either the bump_vector_ptr call needs to move inside the inner loop as well, or
the inner loop needs to be flattened into the outer one, which then iterates
over ncopies * vec_num.
Testing a patch. So mine.
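For illustration, the pointer problem can be pictured with a toy model that records the byte offset of each vector store. This is a sketch only and not GCC's store code: the function names are hypothetical, and vf_bytes merely stands in for the POLY_INT_CST [32, 32] bump in the expected dump.

```c
#include <stddef.h>

/* Toy model, not GCC code.  Each function fills OUT with the byte offset
   used by each of the ncopies * vec_num vector stores.  */

/* Buggy shape: the pointer is bumped only once per outer iteration, so
   all vec_num stores of one copy share the same offset.  */
static void
offsets_bumped_in_outer_loop (size_t ncopies, size_t vec_num,
                              size_t vf_bytes, size_t *out)
{
  size_t dataref_off = 0;
  for (size_t j = 0; j < ncopies; ++j)
    {
      for (size_t i = 0; i < vec_num; ++i)
        out[j * vec_num + i] = dataref_off;
      dataref_off += vf_bytes;
    }
}

/* Flattened shape: a single loop over ncopies * vec_num that bumps the
   pointer after every store.  */
static void
offsets_bumped_per_store (size_t ncopies, size_t vec_num,
                          size_t vf_bytes, size_t *out)
{
  size_t dataref_off = 0;
  for (size_t k = 0; k < ncopies * vec_num; ++k)
    {
      out[k] = dataref_off;
      dataref_off += vf_bytes;
    }
}
```

Assuming ncopies = 1, vec_num = 2 and a 32-byte bump, the first function yields offsets 0 and 0 (both stores at seq_out, as in the bad dump), while the second yields 0 and 32.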