On 08/17/2017 11:04 AM, Alex Bennée wrote:
> +    int32_t *rd = (int32_t *) d;
> +    int16_t *rn = (int16_t *) n;
> +    int16_t rm = (int16_t) m;
> +    int i;
> +
> +    #pragma GCC ivdep
> +    for (i = 0; i < opr_elt; ++i) {
> +        rd[i] = rn[i + doff_elt] * rm;
> +    }

You need to run this loop backward to avoid clobbering data when rd == rn.

I thought you'd put m into ADVSIMD_DATA.

> +    if (is_q) {
> +        simd_info = deposit32(simd_info,
> +                              ADVSIMD_DOFF_ELT_SHIFT, ADVSIMD_DOFF_ELT_BITS, 4);
> +    }

It'd probably be useful to have a macro to clean this up:

#define PUT_SIMD_DATA(t, d) \
    deposit32(0, ADVSIMD_ ## t ## _SHIFT, ADVSIMD_ ## t ## _BITS, (d))

    simd_info |= PUT_SIMD_DATA(DOFF_ELT, 4)

that said, folding DOFF into the pointer that gets passed in the first place
seems a better solution to me.


r~
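
For illustration only, here is a minimal standalone sketch of the rd == rn clobbering point. It is not the helper from the patch: the function names, the byte-buffer interface and the memcpy-based loads and stores are all assumptions made so the example compiles on its own, and it only models the case with no source offset (doff_elt is not represented).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Load the idx-th int16_t from buf; memcpy keeps the overlap well defined. */
static int16_t load_s16(const unsigned char *buf, int idx)
{
    int16_t v;
    memcpy(&v, buf + idx * sizeof(v), sizeof(v));
    return v;
}

/* Load the idx-th int32_t from buf. */
static int32_t load_s32(const unsigned char *buf, int idx)
{
    int32_t v;
    memcpy(&v, buf + idx * sizeof(v), sizeof(v));
    return v;
}

/* Store v as the idx-th int32_t of buf. */
static void store_s32(unsigned char *buf, int idx, int32_t v)
{
    memcpy(buf + idx * sizeof(v), &v, sizeof(v));
}

/*
 * Hypothetical stand-in for the widening multiply: buf starts with count
 * int16_t inputs and receives count int32_t products in place (the
 * rd == rn case, no source offset). buf must hold count * 4 bytes.
 */
static void widen_mul_backward(unsigned char *buf, int count, int16_t m)
{
    int i;

    assert(buf != NULL && count >= 0);
    /*
     * Highest index first: the store for element i covers input slots
     * 2i and 2i+1, none of which lie below i, so inputs 0..i-1 are
     * still intact when the loop reaches them.
     */
    for (i = count - 1; i >= 0; --i) {
        store_s32(buf, i, (int32_t)load_s16(buf, i) * m);
    }
}

/* The same loop run forward: the store for element 0 overwrites input 1. */
static void widen_mul_forward(unsigned char *buf, int count, int16_t m)
{
    int i;

    assert(buf != NULL && count >= 0);
    for (i = 0; i < count; ++i) {
        store_s32(buf, i, (int32_t)load_s16(buf, i) * m);
    }
}
```

With inputs {1, 2, 3, 4} and m = 10, the backward version leaves {10, 20, 30, 40} in the buffer, while the forward version has already overwritten input 1 by the time it reads it, so element 1 does not come out as 20.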