Tamar Christina <tamar.christ...@arm.com> writes:
>> -----Original Message-----
>> From: Richard Biener <richard.guent...@gmail.com>
>> Sent: Wednesday, May 15, 2024 12:20 PM
>> To: Tamar Christina <tamar.christ...@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw
>> <richard.earns...@arm.com>; Marcus Shawcroft
>> <marcus.shawcr...@arm.com>; ktkac...@gcc.gnu.org; Richard Sandiford
>> <richard.sandif...@arm.com>
>> Subject: Re: [PATCH 0/4]AArch64: support conditional early clobbers on certain
>> operations.
>> 
>> On Wed, May 15, 2024 at 12:29 PM Tamar Christina
>> <tamar.christ...@arm.com> wrote:
>> >
>> > Hi All,
>> >
>> > Some Neoverse Software Optimization Guides (SWoG) have a clause that states
>> > that for predicated operations that also produce a predicate it is preferred
>> > that the codegen should use a different register for the destination than that
>> > of the input predicate in order to avoid a performance overhead.
>> >
>> > This of course has the problem that it increases register pressure and so should
>> > be done with care.  Additionally not all micro-architectures have this
>> > consideration and so it shouldn't be done as a default thing.
>> >
>> > The patch series adds support for doing conditional early clobbers through a
>> > combination of new alternatives and attributes to control their availability.
>> 
>> You could have two alternatives, one with early clobber and one with
>> a matching constraint where you'd disparage the matching constraint one?
>> 
>
> Yeah, that's what I do, though there's no need to disparage the non-early clobber
> alternative as the early clobber alternative will naturally get a penalty if it needs a
> reload.

But I think Richard's suggestion was to disparage the one with a matching
constraint (not the earlyclobber), to reflect the increased cost of
reusing the register.

We did take that approach for gathers, e.g.:

     [&w, Z,   w, Ui1, Ui1, Upl] ld1<Vesize>\t%0.s, %5/z, [%2.s]
     [?w, Z,   0, Ui1, Ui1, Upl] ^

The (supposed) advantage is that, if register pressure is so tight
that using matching registers is the only alternative, we still
have the opportunity to do that, as a last resort.

Providing only an earlyclobber version means that using the same
register is prohibited outright.  If no other register is free, the RA
would need to spill something else to free up a temporary register.
And it might then do the equivalent of (pseudo-code):

      not p1.b, ..., p0.b
      mov p0.d, p1.d

after spilling what would otherwise have occupied p1.  In that
situation it would be better to use:

      not p0.b, ..., p0.b

and not introduce the spill of p1.
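
To make the trade-off concrete, here is a toy model of the choice described
above.  It is only a sketch: the cost values, the Alternative class and the
choose() helper are all invented for illustration and are not how LRA
actually computes costs.  It just shows that a disparaged tied alternative
stays available as a last resort, whereas an earlyclobber-only pattern has
to pay for a spill when no other predicate register is free.

```python
from dataclasses import dataclass

# Hypothetical cost units, chosen only so that a spill is far more
# expensive than a '?' disparagement.  Not GCC's real numbers.
DISPARAGE_COST = 1
SPILL_COST = 10


@dataclass(frozen=True)
class Alternative:
    name: str
    earlyclobber: bool  # '&': destination must differ from the inputs
    disparaged: bool    # '?': usable, but slightly discouraged


def cost(alt, free_regs):
    """Assumed cost of using `alt` with `free_regs` spare predicate registers."""
    total = DISPARAGE_COST if alt.disparaged else 0
    # An earlyclobber destination needs a register distinct from the inputs;
    # if none is free, something else has to be spilled to make room.
    if alt.earlyclobber and free_regs == 0:
        total += SPILL_COST
    return total


def choose(alternatives, free_regs):
    """Pick the cheapest alternative under the toy cost model."""
    return min(alternatives, key=lambda alt: cost(alt, free_regs))


EARLY = Alternative("earlyclobber", earlyclobber=True, disparaged=False)
TIED = Alternative("tied", earlyclobber=False, disparaged=True)

if __name__ == "__main__":
    for free in (1, 0):
        picked = choose([EARLY, TIED], free)
        print(f"free={free}: {picked.name} (cost {cost(picked, free)})")
```

With both alternatives present, the model picks the earlyclobber form while
a register is free and falls back to the tied form once none is.  With only
EARLY available, the zero-free-register case is charged the spill.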

Another case where using matching registers is natural is for
loop-carried dependencies.  Do we want to keep them in:

   loop:
      ...no other sets of p0....
      not p0.b, ..., p0.b
      ...no other sets of p0....
      bne loop

or should we split it to:

   loop:
      ...no other sets of p0....
      not p1.b, ..., p0.b
      mov p0.d, p1.d
      ...no other sets of p0....
      bne loop

?

Thanks,
Richard

>
> Cheers,
> Tamar
>
>> > On high register pressure we also use LRA's costing to prefer not to use the
>> > alternative and instead just use the tie as this is preferable to a reload.
>> >
>> > Concretely this patch series does:
>> >
>> > > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2
>> >
>> > foo:
>> >         mov     z31.h, w0
>> >         ptrue   p3.b, all
>> >         cmplo   p0.h, p3/z, z0.h, z31.h
>> >         b       use
>> >
>> > > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n1+sve
>> >
>> > foo:
>> >         mov     z31.h, w0
>> >         ptrue   p0.b, all
>> >         cmplo   p0.h, p0/z, z0.h, z31.h
>> >         b       use
>> >
>> > > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2 -ffixed-p[1-15]
>> >
>> > foo:
>> >         mov     z31.h, w0
>> >         ptrue   p0.b, all
>> >         cmplo   p0.h, p0/z, z0.h, z31.h
>> >         b       use
>> >
>> > Testcases for the changes are in the last patch of the series.
>> >
>> > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>> >
>> > Thanks,
>> > Tamar
>> >
>> > ---
>> >
>> > --
