On Fri, 8 May 2026 at 00:49, Richard Henderson
<[email protected]> wrote:
>
> Signed-off-by: Richard Henderson <[email protected]>
> ---
>  target/arm/tcg/helper-defs.h   |  5 ++++
>  target/arm/tcg/translate-a64.c | 38 +++++++++++++++++++++++++
>  target/arm/tcg/vec_helper.c    | 52 ++++++++++++++++++++++++++++++++++
>  target/arm/tcg/a64.decode      |  6 ++++
>  4 files changed, 101 insertions(+)

> diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
> index 3231bb2100..f0dc11bc8a 100644
> --- a/target/arm/tcg/vec_helper.c
> +++ b/target/arm/tcg/vec_helper.c
> @@ -3345,3 +3345,55 @@ DO_SME2_LUT(4,4,h, 2)
>  DO_SME2_LUT(4,4,s, 4)
>
>  #undef DO_SME2_LUT
> +
> +void HELPER(gvec_luti2_b)(void *vd, void *vn, void *vm, uint32_t desc)
> +{
> +    unsigned part = simd_data(desc);
> +    unsigned vl = simd_oprsz(desc);
> +    unsigned elements = vl / 8;

Isn't simd_oprsz() the size in bytes? The pseudocode for
these ASIMD insns calculates the element count using sizes
in bits, with "elements = 128 / esize" (esize being 8 or 16),
but since our vl is in bytes I think we want "vl" for the
_b insns and "vl / 2" for the _h ones.
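
To spell out the unit conversion, here is a rough sketch. The
helper name elements_from_bytes() is made up for illustration and
is not anything in QEMU; it only assumes the operand size is given
in bytes and the element size in bits, as in the pseudocode.

```c
#include <assert.h>

/*
 * Hypothetical helper, for illustration only: number of elements in
 * a vector that is oprsz_bytes bytes long, with esize_bits-bit
 * elements. This mirrors the pseudocode's "elements = 128 / esize"
 * once the byte count has been converted to bits.
 */
static unsigned elements_from_bytes(unsigned oprsz_bytes, unsigned esize_bits)
{
    assert(esize_bits != 0);
    return oprsz_bytes * 8 / esize_bits;
}
```

For a 16-byte (128-bit) operand this gives 16 elements for esize 8
and 8 elements for esize 16, i.e. "vl" and "vl / 2".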

> +    unsigned ibase = elements * part;
> +    ARMVectorReg scratch;
> +
> +    do_lut_b(&scratch, vm, vn, elements, ibase, 0, 2, 8, 1);
> +    memcpy(vd, &scratch, vl);
> +    clear_tail(vd, vl, simd_maxsz(desc));

> +void HELPER(gvec_luti4_h)(void *vd, void *vn, void *vm, uint32_t desc)
> +{
> +    unsigned part = simd_data(desc);
> +    unsigned vl = simd_oprsz(desc);
> +    unsigned elements = vl / 16;
> +    unsigned ibase = elements * part;
> +    ARMVectorReg scratch;
> +
> +    do_lut_h(&scratch, vm, vn, elements, ibase, 0, 2, 16, 1);

LUTI4 has 4-bit indexes, so we should be passing "4" as the
isize to do_lut_h(), not "2".
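
As an illustration of why the index width matters, here is a
minimal sketch of pulling the Nth index out of a packed byte
array. This is not how do_lut_h() is written; extract_index() is
a made-up name, and the sketch assumes indexes are packed
consecutively starting at the least significant bit of each byte.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: return index number
 * idx from a packed array of isize-bit indexes. Assumes isize is
 * 2 or 4, so an index never straddles a byte boundary.
 */
static unsigned extract_index(const uint8_t *indexes, unsigned idx,
                              unsigned isize)
{
    assert(isize == 2 || isize == 4);

    unsigned bitpos = idx * isize;      /* bit offset of this index */
    unsigned mask = (1u << isize) - 1;  /* 0x3 for 2 bits, 0xf for 4 */

    return (indexes[bitpos / 8] >> (bitpos % 8)) & mask;
}
```

With the bytes { 0xe4, 0x21 }, reading index 1 with isize 2 gives 1,
but with isize 4 it gives 0xe, so the wrong width picks up entirely
different table entries.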

> +    memcpy(vd, &scratch, vl);
> +    clear_tail(vd, vl, simd_maxsz(desc));
> +}

thanks
-- PMM
