On Tue, Nov 29, 2022 at 6:40 AM Prathamesh Kulkarni via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
> For the following test-case:
>
> int16x8_t foo(int16_t x, int16_t y)
> {
>   return (int16x8_t) { x, y, x, y, x, y, x, y };
> }

(This is not meant to block the patch.)
It seems the same trick would also work when the initializer is not a
perfectly alternating pattern, e.g.:
int16x8_t foo(int16_t x, int16_t y)
{
  return (int16x8_t) { x, y, x, y, x, y, x, 0 };
}

This should generate something like:
dup v0.8h, w0
dup v1.8h, w1
zip1 v0.8h, v0.8h, v1.8h
ins v0.h[7], wzr
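
To make the intended lane layout concrete, here is a rough scalar model of that
sequence in portable C. It is only a sketch: the names vec8h, dup_8h, zip1_8h,
ins_h and foo_model are made up for illustration and are neither GCC internals
nor NEON intrinsics. Each helper mimics the lane behaviour of the
corresponding instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for a 128-bit vector of eight 16-bit lanes. */
typedef struct { int16_t h[8]; } vec8h;

/* Models "dup vD.8h, wN": broadcast one scalar to every lane. */
vec8h dup_8h(int16_t s)
{
    vec8h r;
    for (int i = 0; i < 8; i++)
        r.h[i] = s;
    return r;
}

/* Models "zip1 vD.8h, vA.8h, vB.8h": interleave the low four lanes of a and b. */
vec8h zip1_8h(vec8h a, vec8h b)
{
    vec8h r;
    for (int i = 0; i < 4; i++) {
        r.h[2 * i] = a.h[i];
        r.h[2 * i + 1] = b.h[i];
    }
    return r;
}

/* Models "ins vD.h[lane], wN": overwrite a single lane, leave the rest alone. */
vec8h ins_h(vec8h v, int lane, int16_t s)
{
    assert(lane >= 0 && lane < 8);
    v.h[lane] = s;
    return v;
}

/* { x, y, x, y, x, y, x, 0 } built as dup + dup + zip1 + one ins of zero. */
vec8h foo_model(int16_t x, int16_t y)
{
    vec8h v = zip1_8h(dup_8h(x), dup_8h(y));
    return ins_h(v, 7, 0);
}
```

Calling foo_model(x, y) yields x in the even lanes, y in lanes 1, 3 and 5, and
0 in lane 7, so a single ins fixes up the one element that breaks the pattern.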

Thanks,
Andrew Pinski


>
> Code gen at -O3:
> foo:
>         dup     v0.8h, w0
>         ins     v0.h[1], w1
>         ins     v0.h[3], w1
>         ins     v0.h[5], w1
>         ins     v0.h[7], w1
>         ret
>
> For 16 elements, this results in 8 ins instructions, which is probably
> not optimal.
> I guess the above code-gen would be equivalent to the following?
> dup v0.8h, w0
> dup v1.8h, w1
> zip1 v0.8h, v0.8h, v1.8h
>
> I have attached a patch that does the same if the number of elements
> is >= 8, which should possibly be better than the current code-gen.
> The patch passes bootstrap+test on aarch64-linux-gnu.
> Does the patch look OK?
>
> Thanks,
> Prathamesh
