On 6/30/23 08:58, Song Gao wrote:
> +#define VEXTH(NAME, BIT, E1, E2)                                  \
> +void HELPER(NAME)(CPULoongArchState *env,                         \
> +                  uint32_t oprsz, uint32_t vd, uint32_t vj)       \
> +{                                                                 \
> +    int i, max;                                                   \
> +    VReg *Vd = &(env->fpr[vd].vreg);                              \
> +    VReg *Vj = &(env->fpr[vj].vreg);                              \
> +                                                                  \
> +    max = LSX_LEN / BIT;                                          \
> +    for (i = 0; i < max; i++) {                                   \
> +        Vd->E1(i) = Vj->E2(i + max);                              \
> +        if (oprsz == 32) {                                        \
> +            Vd->E1(i + max) = Vj->E2(i + max * 3);                \
> +        }                                                         \
> +    }                                                             \
> +}
Better with void * operands and a uint32_t desc, like the other gvec helpers.

Also, this doesn't expand the elements in order: it interleaves the writes for the two 128-bit lanes instead of finishing one lane before the next, which is how similar per-lane operations are handled for x86 AVX and Arm SVE. I believe the way I handled it there was

    ofs = LSX_LEN / BIT;
    for (i = 0; i < oprsz / (BIT / 8); i += ofs) {
        for (j = 0; j < ofs; j++) {
            E1[i + j] = E2[2 * i + j + ofs];
        }
    }

so each 128-bit lane is expanded independently and the oprsz == 32 special case disappears.

r~