On Fri, 8 May 2026 at 00:49, Richard Henderson
<[email protected]> wrote:
>
> Signed-off-by: Richard Henderson <[email protected]>
> ---
> target/arm/tcg/helper-defs.h | 5 ++++
> target/arm/tcg/translate-a64.c | 38 +++++++++++++++++++++++++
> target/arm/tcg/vec_helper.c | 52 ++++++++++++++++++++++++++++++++++
> target/arm/tcg/a64.decode | 6 ++++
> 4 files changed, 101 insertions(+)
> diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
> index 3231bb2100..f0dc11bc8a 100644
> --- a/target/arm/tcg/vec_helper.c
> +++ b/target/arm/tcg/vec_helper.c
> @@ -3345,3 +3345,55 @@ DO_SME2_LUT(4,4,h, 2)
> DO_SME2_LUT(4,4,s, 4)
>
> #undef DO_SME2_LUT
> +
> +void HELPER(gvec_luti2_b)(void *vd, void *vn, void *vm, uint32_t desc)
> +{
> + unsigned part = simd_data(desc);
> + unsigned vl = simd_oprsz(desc);
> + unsigned elements = vl / 8;
Isn't simd_oprsz() the size in bytes? The pseudocode for
these ASIMD insns calculates the element count using sizes
in bits, with "elements = 128 / esize" (esize being 8 or 16),
but since our vl is in bytes I think we want "vl" for the _b
insns and "vl / 2" for the _h ones.
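
To spell out the arithmetic, here is a minimal sketch. lut_elements() is
a hypothetical name used only for illustration; it is not in the patch
or anywhere in QEMU. It assumes the operation size is given in bytes, as
simd_oprsz() returns, and the element size in bits, as the pseudocode's
esize is.

```c
#include <assert.h>

/*
 * Hypothetical helper, for illustration only: number of elements in an
 * operation that is oprsz_bytes long when each element is esize_bits
 * wide. Mirrors the pseudocode's "elements = 128 / esize" after
 * converting the byte count to bits.
 */
static unsigned lut_elements(unsigned oprsz_bytes, unsigned esize_bits)
{
    /* Only the byte and halfword forms are discussed here. */
    assert(esize_bits == 8 || esize_bits == 16);
    return oprsz_bytes * 8 / esize_bits;
}
```

For a 16-byte operation this gives 16 elements for the _b form and 8 for
the _h form, i.e. "vl" and "vl / 2", where the patch's "vl / 8" and
"vl / 16" would give 2 and 1.
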
> + unsigned ibase = elements * part;
> + ARMVectorReg scratch;
> +
> + do_lut_b(&scratch, vm, vn, elements, ibase, 0, 2, 8, 1);
> + memcpy(vd, &scratch, vl);
> + clear_tail(vd, vl, simd_maxsz(desc));
> +void HELPER(gvec_luti4_h)(void *vd, void *vn, void *vm, uint32_t desc)
> +{
> + unsigned part = simd_data(desc);
> + unsigned vl = simd_oprsz(desc);
> + unsigned elements = vl / 16;
> + unsigned ibase = elements * part;
> + ARMVectorReg scratch;
> +
> + do_lut_h(&scratch, vm, vn, elements, ibase, 0, 2, 16, 1);
LUTI4 has 4-bit indexes, so we should be passing "4" as the
isize to do_lut_h(), not "2".
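
As a rough illustration of why the index width matters, the sketch below
pulls the i'th isize-bit index out of a packed byte array.
extract_index() is a hypothetical name and is not how do_lut_h() is
written; it also assumes indexes are packed starting from the least
significant bits of each byte.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: return the i'th index
 * from a packed array of isize-bit indexes. Assumes index 0 occupies
 * the low bits of byte 0 (an assumption, not taken from the patch).
 */
static unsigned extract_index(const uint8_t *packed, unsigned i,
                              unsigned isize)
{
    unsigned bit = i * isize;   /* bit offset of the index */

    /* 2- and 4-bit indexes never straddle a byte boundary. */
    assert(isize == 2 || isize == 4);
    return (packed[bit / 8] >> (bit % 8)) & ((1u << isize) - 1);
}
```

With the byte 0xe4, reading with isize 4 yields the indexes 0x4 and 0xe,
while reading with isize 2 yields 0, 1, 2, 3, so using the wrong width
selects entirely different table entries.
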
> + memcpy(vd, &scratch, vl);
> + clear_tail(vd, vl, simd_maxsz(desc));
> +}
thanks
-- PMM