Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
> Hi Richard,
> While digging thru aarch64_expand_vector_init, I noticed it gives
> priority to loading a constant first:
>  /* Initialise a vector which is part-variable.  We want to first try
>      to build those lanes which are constant in the most efficient way we
>      can.  */
>
> which results in suboptimal code-gen for following case:
> int16x8_t f_s16(int16_t x)
> {
>   return (int16x8_t) { x, x, x, x, x, x, x, 1 };
> }
>
> code-gen trunk:
> f_s16:
>         movi    v0.8h, 0x1
>         ins     v0.h[0], w0
>         ins     v0.h[1], w0
>         ins     v0.h[2], w0
>         ins     v0.h[3], w0
>         ins     v0.h[4], w0
>         ins     v0.h[5], w0
>         ins     v0.h[6], w0
>         ret
>
> The attached patch tweaks the following condition:
> if (n_var == n_elts && n_elts <= 16)
>   {
>     ...
>   }
>
> to pass if maxv >= 80% of n_elts, with 80% being an
> arbitrary "high enough" threshold. The intent is to dup
> the most repeating variable if it it's repetition
> is "high enough" and insert constants which should be "better" than
> loading constant first and inserting variables like in the above case.

I'm not too keen on the 80%.  Like you say, it seems a bit arbitrary.

The case above can also be handled by relaxing n_var == n_elts to
n_var >= n_elts - 1, so that if there's just one constant element,
we look for duplicated variable elements.  If there are none
(maxv == 1), but there is a constant element, we can duplicate
the constant element into a register.

The case when there's more than one constant element needs more thought
(and testcases :-)).  E.g. after a certain point, it would probably be
better to load the variable and constant parts separately and blend them
using TBL.  It also matters whether the constants are equal or not.

There are also cases that could be handled using EXT.

Plus, if we're inserting many variable elements that are already
in GPRs, we can probably do better by coalescing them into bigger
GPR values and inserting them as wider elements.

Because of things like that, I think we should stick to the
single-constant case for now.

Thanks,
Richard

Reply via email to