Tamar Christina <tamar.christ...@arm.com> writes:
> Hi,
>
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandif...@arm.com>
>> Sent: Monday, September 30, 2024 6:33 PM
>> To: Tamar Christina <tamar.christ...@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw
>> <richard.earns...@arm.com>; Marcus Shawcroft
>> <marcus.shawcr...@arm.com>; ktkac...@gcc.gnu.org
>> Subject: Re: [PATCH 2/2]AArch64: support encoding integer immediates using
>> floating point moves
>> 
>> Tamar Christina <tamar.christ...@arm.com> writes:
>> > Hi All,
>> >
>> > This patch extends our immediate SIMD generation cases to support
>> > generating integer immediates using floating point operation if the
>> > integer immediate maps to an exact FP value.
>> >
>> > As an example:
>> >
>> > uint32x4_t f1() {
>> >     return vdupq_n_u32(0x3f800000);
>> > }
>> >
>> > currently generates:
>> >
>> > f1:
>> >         adrp    x0, .LC0
>> >         ldr     q0, [x0, #:lo12:.LC0]
>> >         ret
>> >
>> > i.e. a load, but with this change:
>> >
>> > f1:
>> >         fmov    v0.4s, 1.0e+0
>> >         ret
>> >
>> > Such immediates are common in e.g. our Math routines in glibc because
>> > they are created to extract or mark part of an FP immediate as masks.
>> 
>> I agree this is a good thing to do.  The current code is too beholden
>> to the original vector mode.  This patch relaxes it so that it isn't
>> beholden to the original mode's class (integer vs. float), but it would
>> still be beholden to the original mode's element size.
>
> I've implemented this approach and it works but I'm struggling with an
> inconsistency in how zeros are created.
>
> There are about 800 SVE ACLE tests like acge_f16.c that check that a zero is
> created using a mov of the same sized register as the usage.  So I added an
> exception for zero to use the original input element mode.
>
> But then there are about 400 other SVE ACLE tests that actually check that
> zeros are created using byte moves, like dup_128_s16_z even though they're
> used as ints.
>
> So these two are in conflict.  Do you care which way I resolve this?  Since
> it's zero it shouldn't matter how they're created, but perhaps there's a
> reason why some tests check for the specific instruction?

No, I think it was an oversight.  Any element size would be correct.

Using byte moves sounds like a good thing.  It would be good to
share constants at some point (like we do with ptrues) and using
the smallest element size would then be the natural choice.

Sorry for the drudge work in updating all the tests.  Hope that
generalising them to be size-agnostic turns out to be sed-able,
or at least a simple script.

Thanks,
Richard

