Tamar Christina <tamar.christ...@arm.com> writes:
> @@ -13936,8 +13937,65 @@ cost_plus:
>                            mode, MULT, 1, speed);
>            return true;
>          }
> +     break;
> +    case PARALLEL:
> +      /* Fall through.  */

Which code paths lead to getting a PARALLEL here?

> +    case CONST_VECTOR:
> +     {
> +       rtx gen_insn = aarch64_simd_make_constant (x, true);
> +       /* Not a valid const vector.  */
> +       if (!gen_insn)
> +         break;
>  
> -      /* Fall through.  */
> +       switch (GET_CODE (gen_insn))
> +       {
> +       case CONST_VECTOR:
> +         /* Load using MOVI/MVNI.  */
> +         if (aarch64_simd_valid_immediate (x, NULL))
> +           *cost += extra_cost->vect.movi;
> +         else /* Load using constant pool.  */
> +           *cost += extra_cost->ldst.load;
> +         break;
> +       /* Load using a DUP.  */
> +       case VEC_DUPLICATE:
> +         *cost += extra_cost->vect.dup;
> +         break;

Does this trigger in practice?  The new check==true path (rightly) stops
the duplicated element from being forced into a register, but then
I would have expected:

rtx
gen_vec_duplicate (machine_mode mode, rtx x)
{
  if (valid_for_const_vector_p (mode, x))
    return gen_const_vec_duplicate (mode, x);
  return gen_rtx_VEC_DUPLICATE (mode, x);
}

to generate the original CONST_VECTOR again.
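
That is, for a constant element valid_for_const_vector_p should hold,
so (minimal sketch, duplicating the constant 1 into V4SI):

  gen_vec_duplicate (V4SImode, const1_rtx)
    => (const_vector:V4SI [(const_int 1) repeated x4])

which would leave the VEC_DUPLICATE case above reachable only for
variable elements.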

> +       default:
> +         *cost += extra_cost->ldst.load;
> +         break;
> +       }
> +       return true;
> +     }
> +    case VEC_CONCAT:
> +     /* Depending on the operation, either a DUP or an INS.
> +        For now, keep the default costing.  */
> +     break;
> +    case VEC_DUPLICATE:
> +     *cost += extra_cost->vect.dup;
> +     return true;
> +    case VEC_SELECT:
> +     {
> +       /* Cost a subreg of element 0 as free, otherwise as a DUP.  */
> +       rtx op1 = XEXP (x, 1);
> +       int nelts;
> +       if ((op1 == const0_rtx && !BYTES_BIG_ENDIAN)
> +           || (BYTES_BIG_ENDIAN
> +               && GET_MODE_NUNITS (mode).is_constant (&nelts)
> +               && INTVAL (op1) == nelts - 1))
> +         ;
> +       else if (vec_series_lowpart_p (mode, GET_MODE (op1), op1))
> +         ;
> +       else if (vec_series_highpart_p (mode, GET_MODE (op1), op1))
> +       /* Selecting the high part is not technically free, but we lack
> +          enough information to decide that here.  For instance, selecting
> +          the high part of a vec_dup *is* free, as is feeding it into any
> +          _high instruction, but we cannot tell either of those from
> +          here.  That said, costing it as free gives us a better chance
> +          of optimizing a dup than multiple constants.  */
> +         ;

Not sure about this.  We already try to detect the latter case
(_high instructions) via aarch64_strip_extend_vec_half.  We might
be missing some cases, but that still feels like the right way
to go IMO.
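
For reference, the shape that path is meant to catch looks roughly
like this (illustrative only, using a hypothetical V8HI input):

  (mult:V4SI
    (sign_extend:V4SI (vec_select:V4HI (reg:V8HI x)
                                       (parallel [4 5 6 7])))
    ...)

where the high-half select is costed as part of the widening multiply
rather than as a separate permute.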

Selecting the high part of a vec_dup should get folded into
another vec_dup.
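
i.e. something like (sketch):

  (vec_select:V2SI (vec_duplicate:V4SI (reg:SI x)) (parallel [2 3]))
    --> (vec_duplicate:V2SI (reg:SI x))

so the costed RTL should not normally contain that form.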

The lowpart bits look OK, but which paths call this function
without first simplifying the select to a subreg?  The subreg
is now the canonical form (thanks to r12-2288).
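
That is, for little-endian (sketch):

  (vec_select:V2SI (reg:V4SI x) (parallel [0 1]))
    --> (subreg:V2SI (reg:V4SI x) 0)

should already have been simplified before we cost it.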

Thanks,
Richard
