Hi Juzhe,

in general looks OK.

> +  /* We need to use precomputed mask for such situation and such mask
> +     can only be computed in compile-time known size modes.  */
> +  if (GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) == 8 && maybe_ge (vec_len, 
> 256)
> +      && !vec_len.is_constant ())
> +    return false;
> +

We could make this a separate variable like:
bool indices_fit_selector = maybe_ge (vec_len, 2 << GET_MODE_BITSIZE 
(GET_MODE_INNER (vmode)));

Also add a comment that the non-constant case is handled by
shuffle_decompress_patterns in case we have a HImode vector twice the
size that can hold our indices.

>    /* MASK = SELECTOR < NUNTIS ? 1 : 0.  */

Comment should go inside the if branch.

> +  if (GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) > 8 || known_lt (vec_len, 
> 256))

> +    }
> +  else
> +    {
> +      /* For EEW8 and NUNITS may be larger than 255, we can't use vmsltu
> +      directly to generate the selector mask, instead, we can only use
> +      precomputed mask.

I find that comment a bit misleading as it's not the vmsltu itself but
rather that the indices cannot be held.

> +
> +      E.g. selector = <0, 257, 2, 259> for EEW8 vector with NUNITS = 256, we
> +      don't have a QImode scalar register to hold larger than 255.  */

We also cannot hold that in a vector QImode register, and, since there
is no larger HI mode vector we cannot create a larger selector.

Also add this as comment:

As the mask is a simple {0, 1, ...} pattern and the length is known we can
store it in a scalar register and broadcast it to a mask register.

> +      gcc_assert (vec_len.is_constant ());
> +      int size = CEIL (GET_MODE_NUNITS (mask_mode).to_constant (), 8);
> +      machine_mode mode = get_vector_mode (QImode, size).require ();
> +      rtx tmp = gen_reg_rtx (mode);
> +      rvv_builder v (mode, 1, size);
> +      for (int i = 0; i < vec_len.to_constant () / 8; i++)
> +     {
> +       uint8_t value = 0;
> +       for (int j = 0; j < 8; j++)
> +         {
> +           int index = i * 8 + j;
> +           if (known_lt (d->perm[index], 256))
> +             value |= 1 << j;
> +         }

I would have hoped that a simple

          v.quick_push (gen_int_mode (0b01010101, QImode));

suffices but that will probably clash if there are more than
two npatterns.

Regards
 Robin

Reply via email to