On 4/13/20 4:42 PM, Stephen Long wrote:
> +#define DO_ZPZZ_CHAR_MATCH(NAME, TYPE, H, EQUALS)                          \
> +void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc)   \
> +{                                                                          \
> +    intptr_t i, opr_sz = simd_oprsz(desc);                                 \
> +    for (i = 0; i < opr_sz; i += sizeof(TYPE)) {                           \
> +        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));                    \
> +        uint16_t *pd = (uint16_t *)(vd + H1_2(i >> 3));                    \
> +        *pd = (*pd & ~1) | ((0 & EQUALS) | (1 & !EQUALS));                 \
> +        if (pg & 1) {                                                      \
The important error here is that the predicate is not always the low bit.
When operating on bytes, every bit of the predicate is significant.  When
operating on halfwords, every even bit of the predicate is significant.
In addition, when operating on halfwords, every odd bit of the result
predicate must be zero.

This is why, in general, I construct the output predicate as we go.
See, for instance, DO_CMP_PPZZ.

> +            TYPE nn = *(TYPE *)(vn + H(i));                                \
> +            for (intptr_t j = 0; j < 16; j += sizeof(TYPE)) {              \
> +                TYPE mm = *(TYPE *)(vm + H(i * 16 + j));                   \

mm needs to start at the beginning of the segment, which in this case is
(i & -16).  You don't need the elements of mm in any particular order (all
of them are significant), so you can drop the use of H() here.  Therefore
the indexing for mm should be vm + (i & -16) + j.

> +                bool eq = nn == mm;                                        \
> +                if ((eq && EQUALS) || (!eq && !EQUALS)) {                  \
> +                    *pd = (*pd & ~1) | ((1 & EQUALS) | (0 & !EQUALS));     \
> +                }                                                          \

It might be handy to split the inner loop out into a helper function:
while the basic loop is ok, there are tricks that can improve it so that
we compare 8 bytes at a time.

> +static bool do_sve2_zpzz_char_match(DisasContext *s, arg_rprr_esz *a,
> +                                    gen_helper_gvec_4 *fn)
> +{
> +    if (!dc_isar_feature(aa64_sve2, s)) {
> +        return false;
> +    }
> +    if (fn == NULL) {
> +        return false;
> +    }
> +    if (sve_access_check(s)) {
> +        unsigned vsz = vec_full_reg_size(s);
> +        unsigned psz = pred_full_reg_size(s);
> +        int dofs = pred_full_reg_offset(s, a->rd);
> +        int nofs = vec_full_reg_offset(s, a->rn);
> +        int mofs = vec_full_reg_offset(s, a->rm);
> +        int gofs = pred_full_reg_offset(s, a->pg);
> +
> +        /* Save a copy if the destination overwrites the guarding predicate */
> +        int tofs = gofs;
> +        if (a->rd == a->pg) {
> +            tofs = offsetof(CPUARMState, vfp.preg_tmp);
> +            tcg_gen_gvec_mov(0, tofs, gofs, psz, psz);
> +        }
> +
> +        tcg_gen_gvec_4_ool(dofs, nofs, mofs, gofs, vsz, vsz, 0, fn);
> +        do_predtest(s, dofs, tofs, psz / 8);

You can avoid the copy and the predtest by using the iter_predtest_*
functions and returning the flags result directly from the helper.
Again, see DO_CMP_PPZZ.
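Putting those pieces together, here is a rough sketch of the sort of thing
I mean; it is untested, the helper name is only illustrative, and it shows
just the byte/MATCH case without the macro-ization.  NMATCH inverts the
sense of "match" below, and the halfword case steps j and k by 2, which
leaves the odd result bits zero because they are never set in the first
place.

uint32_t HELPER(sve2_match_ppzz_b)(void *vd, void *vn, void *vm,
                                   void *vg, uint32_t desc)
{
    intptr_t i, j, k, opr_sz = simd_oprsz(desc);
    uint32_t flags = PREDTEST_INIT;

    for (i = 0; i < opr_sz; i += 16) {
        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
        uint16_t out = 0;

        for (j = 0; j < 16; j++) {
            if (pg & (1 << j)) {
                uint8_t nn = *(uint8_t *)(vn + H1(i + j));
                bool match = false;

                /* Search the whole 16-byte segment of vm; element order
                 * within the segment does not matter, so no H() here.  */
                for (k = 0; k < 16; k++) {
                    if (nn == *(uint8_t *)(vm + i + k)) {
                        match = true;
                        break;
                    }
                }
                out |= match << j;
            }
        }

        /* Build the output predicate as we go and fold it into the flags. */
        *(uint16_t *)(vd + H1_2(i >> 3)) = out;
        flags = iter_predtest_fwd(out, pg, flags);
    }
    return flags;
}

The translate side can then consume the returned flags the same way the
existing compares do (see do_ppzz_flags), with no copy of the guarding
predicate and no separate predtest over the destination.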
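As for comparing 8 bytes at a time: once the search byte is broadcast
across a 64-bit word, the usual zero-byte test does 8 comparisons at once.
Another sketch, with an illustrative name and not necessarily the form we
would want to commit:

/*
 * True if any byte of SEG equals NN: the XOR zeroes exactly the
 * matching bytes, and the standard "does this word contain a zero
 * byte" test then detects them.
 */
static inline bool segment_has_byte(uint64_t seg, uint8_t nn)
{
    uint64_t v = seg ^ (nn * 0x0101010101010101ull);
    return ((v - 0x0101010101010101ull) & ~v & 0x8080808080808080ull) != 0;
}

The halfword variant is the same idea, with 0x0001000100010001ull as the
broadcast/borrow constant and 0x8000800080008000ull as the sign mask.

r~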