Tamar Christina <tamar.christ...@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandif...@arm.com>
>> Sent: Friday, September 1, 2023 2:36 PM
>> To: Tamar Christina <tamar.christ...@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw
>> <richard.earns...@arm.com>; Marcus Shawcroft
>> <marcus.shawcr...@arm.com>; Kyrylo Tkachov <kyrylo.tkac...@arm.com>
>> Subject: Re: [PATCH]AArch64 xorsign: Fix scalar xorsign lowering
>> 
>> Tamar Christina <tamar.christ...@arm.com> writes:
>> > Hi All,
>> >
>> > In GCC-9 our scalar xorsign pattern broke and we didn't notice it
>> > because the testcase was not strong enough.  With this commit
>> >
>> > 8d2d39587d941a40f25ea0144cceb677df115040 is the first bad commit
>> > commit 8d2d39587d941a40f25ea0144cceb677df115040
>> > Author: Segher Boessenkool <seg...@kernel.crashing.org>
>> > Date:   Mon Oct 22 22:23:39 2018 +0200
>> >
>> >     combine: Do not combine moves from hard registers
>> >
>> > combine started introducing useless moves on hard registers.  When one
>> > of the arguments to our scalar xorsign is a hard register we get an
>> > additional move inserted.
>> >
>> > This leads to combine forming an AND with the immediate inside and
>> > using the superfluous move to do the r->w move, instead of what we
>> > wanted before, which was for the `and` to be a vector AND and have
>> > reload pick the right alternative.
>> 
>> IMO, the xorsign optab ought to go away.  IIRC it was just a stop-gap measure
>> that (like most stop-gap measures) never got cleaned up later.
>> 
>> But that's not important now. :)
>> 
>> > To fix this the patch just forces the use of the vector version
>> > directly and so combine has no chance to mess it up.
>> >
>> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> >
>> > Ok for master?
>> >
>> > Thanks,
>> > Tamar
>> >
>> > gcc/ChangeLog:
>> >
>> >    * config/aarch64/aarch64-simd.md (xorsign<mode>3): Rename to...
>> >    (@xorsign<mode>3): ...This.
>> >    * config/aarch64/aarch64.md (xorsign<mode>3): Rename to...
>> >    (@xorsign<mode>3): ...This and emit vectors directly.
>> >    * config/aarch64/iterators.md (VCONQ): Add SF and DF.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> >    * gcc.target/aarch64/xorsign.c:
>> >
>> > --- inline copy of patch --
>> > diff --git a/gcc/config/aarch64/aarch64-simd.md
>> > b/gcc/config/aarch64/aarch64-simd.md
>> > index f67eb70577d0c2d9911d8c867d38a4d0b390337c..e955691f1be8830efacc237465119764ce2a4942 100644
>> > --- a/gcc/config/aarch64/aarch64-simd.md
>> > +++ b/gcc/config/aarch64/aarch64-simd.md
>> > @@ -500,7 +500,7 @@ (define_expand "ctz<mode>2"
>> >    }
>> >  )
>> >
>> > -(define_expand "xorsign<mode>3"
>> > +(define_expand "@xorsign<mode>3"
>> >    [(match_operand:VHSDF 0 "register_operand")
>> >     (match_operand:VHSDF 1 "register_operand")
>> >     (match_operand:VHSDF 2 "register_operand")]
>> >
>> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
>> > index 01cf989641fce8e6c3828f6cfef62e101c4142df..9db82347bf891f9bc40aedecdc8462c94bf1a769 100644
>> > --- a/gcc/config/aarch64/aarch64.md
>> > +++ b/gcc/config/aarch64/aarch64.md
>> > @@ -6953,31 +6953,20 @@ (define_insn "copysign<GPF:mode>3_insn"
>> >  ;; EOR   v0.8B, v0.8B, v3.8B
>> >  ;;
>> >
>> > -(define_expand "xorsign<mode>3"
>> > +(define_expand "@xorsign<mode>3"
>> >    [(match_operand:GPF 0 "register_operand")
>> >     (match_operand:GPF 1 "register_operand")
>> >     (match_operand:GPF 2 "register_operand")]
>> >    "TARGET_SIMD"
>> >  {
>> > -
>> > -  machine_mode imode = <V_INT_EQUIV>mode;
>> > -  rtx mask = gen_reg_rtx (imode);
>> > -  rtx op1x = gen_reg_rtx (imode);
>> > -  rtx op2x = gen_reg_rtx (imode);
>> > -
>> > -  int bits = GET_MODE_BITSIZE (<MODE>mode) - 1;
>> > -  emit_move_insn (mask, GEN_INT (trunc_int_for_mode (HOST_WIDE_INT_M1U << bits,
>> > -                                                imode)));
>> > -
>> > -  emit_insn (gen_and<v_int_equiv>3 (op2x, mask,
>> > -                              lowpart_subreg (imode, operands[2],
>> > -                                              <MODE>mode)));
>> > -  emit_insn (gen_xor<v_int_equiv>3 (op1x,
>> > -                              lowpart_subreg (imode, operands[1],
>> > -                                              <MODE>mode),
>> > -                              op2x));
>> > +  rtx tmp = gen_reg_rtx (<VCONQ>mode);
>> > +  rtx op1 = gen_reg_rtx (<VCONQ>mode);
>> > +  rtx op2 = gen_reg_rtx (<VCONQ>mode);
>> > +  emit_move_insn (op1, lowpart_subreg (<VCONQ>mode, operands[1], <MODE>mode));
>> > +  emit_move_insn (op2, lowpart_subreg (<VCONQ>mode, operands[2], <MODE>mode));
>> > +  emit_insn (gen_xorsign3 (<VCONQ>mode, tmp, op1, op2));
>> 
>> Do we need the extra moves into op1 and op2?  I would have expected the
>> subregs to be acceptable as direct operands of the xorsign3.  Making them
>> direct operands should be better, since there's then less risk of having the
>> same value live in different registers at the same time.
>> 
>
> That was the first thing I tried, but it doesn't work because
> validate_subreg seems to have the invariant that you can either change
> mode between the same size or make it paradoxical, but not both at the
> same time.
>
> i.e. it rejects (subreg:V2DI (subreg:DI (reg:DF))), and lowpart_subreg
> folds it to NULL_RTX, because the lowering when the input is a subreg
> takes the mode of the original RTX.  i.e. the above is folded to
> (subreg:V2DI (reg:DF)), which is not valid.

Gah, I'd forgotten about that.

I think we should relax that rule.  validate_subreg was originally added
to prevent some pretty egregious uses of subregs, but as the code shows,
it was never possible to lock things down as much as hoped.  And I'm not
sure it makes as much sense to try these days.  I think we have a pretty
settled idea of what subregs mean in a target-independent sense, and we
now have better ways than we did before for targets to say which subregs
they support natively for a given register class.

(subreg:V2DI (reg:DF X)) and (subreg:V2DF (reg:DI X)) are IMO perfectly
well-defined in a target-independent sense, so I think we should relax:

  else if (VECTOR_MODE_P (omode)
           && GET_MODE_INNER (omode) == GET_MODE_INNER (imode))
    ;

to:

  else if (VECTOR_MODE_P (omode)
           && GET_MODE_UNIT_SIZE (omode) == GET_MODE_UNIT_SIZE (imode))
    ;

Preapproved if it works & passes bootstrap on aarch64-linux-gnu,
powerpc64le-linux-gnu and x86_64-linux-gnu, unless someone objects
beforehand.

Thanks,
Richard
