RE: [PATCH 2/2]AArch64 Support new tbranch optab.

Tamar Christina via Gcc-patches Thu, 24 Nov 2022 04:19:24 -0800

Hi,

I had a question and I figured I'd easier to ask before I spend more time 
implementing it 😊


I had noticed that one of the other reasons that cbranch and the other optabs 
like cmov explicitly
emit the compare separately and use combine to match up the final form is for 
ifcvt.

In particular by expanding tbranch directly to the final RTL we lose some ifcvt 
because there are
no patterns that can handle the new zero_extract idiom.

So the three solutions I can think of are:

1. Don't expand tbranch to its final form immediately, but still use 
zero_extract.  This regresses -O0. (but do we care?)
2. Expand tbranch with vec_extract and provide new zero_extract based rtl 
sequences for ifcvt.
     I currently tried this, and while it works, I don't fully trust the RTL.  
In particular unlike say, combine
     Ifcvt doesn't allow me to add an extra clobber to say that CC Is clobbered 
by the pattern.  Now tbranch
     Itself also expands a clobber, so the RTL isn't wrong even after ifcvt, 
but I'm worried that the pattern can
     Be idiom recognized and then no clobber could be present.  I could modify 
the recog code in ifcvt to try to
     Ignore clobbers during matching.
3.  I could expand using AND instead of zero_extract instead.   We have more 
patterns handling AND, but I'm not
     Sure If this will fix the problem entirely, but in principle could expand 
to what ANDS generates and recog that instead.
    This shouldn't regress -O0 as we wouldn't put a zero_extract explicitly in 
RTL (and we already have a pattern for ANDS).

What do you think? I personally favor 3.. 

Thanks,
Tamar

> -----Original Message-----
> From: Richard Sandiford <[email protected]>
> Sent: Tuesday, November 22, 2022 2:00 PM
> To: Tamar Christina <[email protected]>
> Cc: [email protected]; Richard Earnshaw
> <[email protected]>; nd <[email protected]>; Marcus Shawcroft
> <[email protected]>
> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
> 
> Tamar Christina <[email protected]> writes:
> >> -----Original Message-----
> >> From: Richard Sandiford <[email protected]>
> >> Sent: Tuesday, November 15, 2022 11:34 AM
> >> To: Tamar Christina <[email protected]>
> >> Cc: [email protected]; Richard Earnshaw
> >> <[email protected]>; nd <[email protected]>; Marcus Shawcroft
> >> <[email protected]>
> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
> >>
> >> Tamar Christina <[email protected]> writes:
> >> >> -----Original Message-----
> >> >> From: Richard Sandiford <[email protected]>
> >> >> Sent: Tuesday, November 15, 2022 11:15 AM
> >> >> To: Tamar Christina <[email protected]>
> >> >> Cc: [email protected]; Richard Earnshaw
> >> >> <[email protected]>; nd <[email protected]>; Marcus
> Shawcroft
> >> >> <[email protected]>
> >> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
> >> >>
> >> >> Tamar Christina <[email protected]> writes:
> >> >> >> -----Original Message-----
> >> >> >> From: Richard Sandiford <[email protected]>
> >> >> >> Sent: Tuesday, November 15, 2022 10:51 AM
> >> >> >> To: Tamar Christina <[email protected]>
> >> >> >> Cc: [email protected]; Richard Earnshaw
> >> >> >> <[email protected]>; nd <[email protected]>; Marcus
> >> Shawcroft
> >> >> >> <[email protected]>
> >> >> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
> >> >> >>
> >> >> >> Tamar Christina <[email protected]> writes:
> >> >> >> >> -----Original Message-----
> >> >> >> >> From: Richard Sandiford <[email protected]>
> >> >> >> >> Sent: Tuesday, November 15, 2022 10:36 AM
> >> >> >> >> To: Tamar Christina <[email protected]>
> >> >> >> >> Cc: [email protected]; Richard Earnshaw
> >> >> >> >> <[email protected]>; nd <[email protected]>; Marcus
> >> >> Shawcroft
> >> >> >> >> <[email protected]>
> >> >> >> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
> >> >> >> >>
> >> >> >> >> Tamar Christina <[email protected]> writes:
> >> >> >> >> > Hello,
> >> >> >> >> >
> >> >> >> >> > Ping and updated patch.
> >> >> >> >> >
> >> >> >> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no
> >> issues.
> >> >> >> >> >
> >> >> >> >> > Ok for master?
> >> >> >> >> >
> >> >> >> >> > Thanks,
> >> >> >> >> > Tamar
> >> >> >> >> >
> >> >> >> >> > gcc/ChangeLog:
> >> >> >> >> >
> >> >> >> >> >         * config/aarch64/aarch64.md (*tb<optab><mode>1):
> >> >> >> >> > Rename
> >> >> to...
> >> >> >> >> >         (*tb<optab><ALLI:mode><GPI:mode>1): ... this.
> >> >> >> >> >         (tbranch<mode>4): New.
> >> >> >> >> >
> >> >> >> >> > gcc/testsuite/ChangeLog:
> >> >> >> >> >
> >> >> >> >> >         * gcc.target/aarch64/tbz_1.c: New test.
> >> >> >> >> >
> >> >> >> >> > --- inline copy of patch ---
> >> >> >> >> >
> >> >> >> >> > diff --git a/gcc/config/aarch64/aarch64.md
> >> >> >> >> > b/gcc/config/aarch64/aarch64.md index
> >> >> >> >> >
> >> >> >> >>
> >> >> >>
> >> >>
> >>
> 2bc2684b82c35a44e0a2cea6e3aaf32d939f8cdf..d7684c93fba5b717d568e1a4fd
> >> >> >> >> 71
> >> >> >> >> > 2bde55c7c72e 100644
> >> >> >> >> > --- a/gcc/config/aarch64/aarch64.md
> >> >> >> >> > +++ b/gcc/config/aarch64/aarch64.md
> >> >> >> >> > @@ -943,12 +943,29 @@ (define_insn "*cb<optab><mode>1"
> >> >> >> >> >                       (const_int 1)))]
> >> >> >> >> >  )
> >> >> >> >> >
> >> >> >> >> > -(define_insn "*tb<optab><mode>1"
> >> >> >> >> > +(define_expand "tbranch<mode>4"
> >> >> >> >> >    [(set (pc) (if_then_else
> >> >> >> >> > -             (EQL (zero_extract:DI (match_operand:GPI 0
> >> >> "register_operand"
> >> >> >> >> "r")
> >> >> >> >> > -                                   (const_int 1)
> >> >> >> >> > -                                   (match_operand 1
> >> >> >> >> > -                                     
> >> >> >> >> > "aarch64_simd_shift_imm_<mode>" "n"))
> >> >> >> >> > +               (match_operator 0 "aarch64_comparison_operator"
> >> >> >> >> > +                [(match_operand:ALLI 1 "register_operand")
> >> >> >> >> > +                 (match_operand:ALLI 2
> >> >> >> >> "aarch64_simd_shift_imm_<ALLI:mode>")])
> >> >> >> >> > +               (label_ref (match_operand 3 "" ""))
> >> >> >> >> > +               (pc)))]
> >> >> >> >> > +  "optimize > 0"
> >> >> >> >>
> >> >> >> >> Why's the pattern conditional on optimize?  Seems a valid
> >> >> >> >> choice at -O0
> >> >> >> too.
> >> >> >> >>
> >> >> >> >
> >> >> >> > Hi,
> >> >> >> >
> >> >> >> > I had explained the reason why in the original patch, just
> >> >> >> > didn't repeat it in
> >> >> >> the ping:
> >> >> >> >
> >> >> >> > Instead of emitting the instruction directly I've chosen to
> >> >> >> > expand the pattern using a zero extract and generating the
> >> >> >> > existing pattern for comparisons for two
> >> >> >> > reasons:
> >> >> >> >
> >> >> >> >   1. Allows for CSE of the actual comparison.
> >> >> >> >   2. It looks like the code in expand makes the label as
> >> >> >> > unused and removed
> >> >> >> it
> >> >> >> >      if it doesn't see a separate reference to it.
> >> >> >> >
> >> >> >> > Because of this expansion though I disable the pattern at -O0
> >> >> >> > since we
> >> >> >> have no combine in that case so we'd end up with worse code.  I
> >> >> >> did try emitting the pattern directly, but as mentioned in no#2
> >> >> >> expand would then kill the label.
> >> >> >> >
> >> >> >> > Basically I emit the pattern directly, immediately during
> >> >> >> > expand the label is
> >> >> >> marked as dead for some weird reason.
> >> >> >>
> >> >> >> Isn't #2 a bug though?  It seems like something we should fix
> >> >> >> rather than work around.
> >> >> >
> >> >> > Yes it's a bug ☹ ok if I'm going to fix that bug then do I need
> >> >> > to split the optabs still? Isn't the problem atm that I need the 
> >> >> > split?
> >> >> > If I'm emitting the instruction directly then the recog pattern
> >> >> > for it can just be (eq (vec_extract x 1) 0) which is the correct
> semantics?
> >> >>
> >> >> What rtx does the code that uses the optab pass for operand 0?
> >> >
> >> > It gets passed the full comparison:
> >> >
> >> > (eq (reg/v:SI 92 [ x ])
> >> >     (const_int 0 [0]))
> >> >
> >> > of which we only look at the operator.
> >>
> >> OK, that's what I thought.  The problem is then the one I mentioned
> above.
> >> This rtx doesn't describe the operation that the optab is supposed to
> >> perform, so it can never be used in the instruction pattern.  (This
> >> is different from something like cbranch, where operand 0 can be used
> >> directly if the target supports a very general compare-and-branch
> >> instruction.)
> >
> > So I was wrong before about which RTL it gets passed.  Deep in the
> > expansion Code the rtl operation
> >
> > (eq (reg/v:SI 92 [ x ])
> >       (const_int 0 [0]))
> >
> > Gets broken up and passed piecewise.
> >
> > First thing it does it explicitly check that the first argument in RTL is an
> operator:
> >
> > gcc_assert (insn_operand_matches (icode, 0, test));
> >
> > and then the jump is emitted by breaking apart the rtl into it's operands:
> >
> > 4646      insn = emit_jump_insn (GEN_FCN (icode) (test, XEXP (test, 0),
> > 4647                          XEXP (test, 1), label));
> 
> Yeah, but the question was what the code that generates the tbranch optab
> passes for operand 0 ("test" in the call above).  And like you said, it's the 
> EQ
> rtx above, with XEXPs 0 and 1 being passed as operands
> 1 and 2.  I think the point still stands that that EQ rtx doesn't describe the
> correct operation.
> 
> > And so the operands are:
> >
> >>>> p debug (operand0)
> > (reg/v:SI 92 [ xD.4391 ])
> >
> >>>> p debug (operand1)
> > (const_int 0 [0])
> >
> >>>> p debug (operand2)
> > (code_label 0 0 0 2 (nil) [0 uses])
> >
> > And targets never get to see the equality check.
> 
> But the .md pattern was:
> 
> (define_expand "tbranch<mode>4"
>   [(set (pc) (if_then_else
>               (match_operator 0 "aarch64_comparison_operator"
>                [(match_operand:ALLI 1 "register_operand")
>                 (match_operand:ALLI 2
> "aarch64_simd_shift_imm_<ALLI:mode>")])
>               (label_ref (match_operand 3 "" ""))
>               (pc)))]
>   "optimize > 0"
> {
>   rtx bitvalue = gen_reg_rtx (DImode);
>   rtx tmp = simplify_gen_subreg (DImode, operands[1], GET_MODE
> (operands[1]), 0);
>   emit_insn (gen_extzv (bitvalue, tmp, const1_rtx, operands[2]));
>   operands[2] = const0_rtx;
>   operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]),
> bitvalue,
>                                        operands[2]);
> })
> 
> where the EQ/NE rtx is passed and matched as operand 0.
> 
> > If the documentation of the optab is
> > Updated to say that the target operand1 is to be used in a
> > zero_extract with operand0 and compared with 0 then that should be fine
> no?  that's the semantic of the optab itself.
> >
> > Based on that I don't think we need to split this optab do we?  Just
> > update the docs to clarify the zero extract semantics?
> 
> Well, the point of...
> 
> >> If we want to use a single optab, the code that generates the optab
> >> should pass something like:
> >>
> >>   (eq/ne (zero_extract op0 (const_int 1) op1) (const_int 0))
> >>
> >> as operand 0, so that operand 0 specifies the real test condition.
> >>
> >> Thanks,
> >> Richard
> 
> ...was that we should either (a) split the optab or (b) keep the single optab
> and pass a "proper" description of the operation as operand 0.
> 
> Thanks,
> Richard

RE: [PATCH 2/2]AArch64 Support new tbranch optab.

Reply via email to