On Wed, 9 Jul 2025, Tamar Christina wrote:

> (on mobile so doing a top reply)
> 
> > So it isn't as efficient to use cbranch_any (g != 0) here?  I think it
> > should be practically equivalent...
> 
> Ah yeah, it can expand to what we currently expand vector booleans to.
> 
> I was initially confused because for SVE what we want here is an ORRS
> (flag-setting inclusive ORR).
> 
> Using this optab we can get to that in an easier way too.
> 
> So yeah I agree, cbranch for vectors can be deprecated.
> 
> Note that in my patch I named the new ones vec_cbranch_any/all to
> implicitly say they are vector-only.
> 
> Do you want to fully deprecate cbranch for vectors?

Yes, I think that would remove confusion.

> This would mean though that all target checks need to be updated, unless we 
> update the support checks with a helper?
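> 
> Something like this, say (purely a hypothetical sketch: the optab names
> follow this thread's naming and don't exist yet; direct_optab_handler
> and CODE_FOR_nothing are the existing GCC internals):
> 
>   /* Return true if the target can expand a vector compare-and-branch
>      for MODE; ALL_P selects the "all" form instead of "any".  */
>   static bool
>   can_vec_cbranch_p (machine_mode mode, bool all_p)
>   {
>     optab tab = all_p ? vec_cbranch_all_optab : vec_cbranch_any_optab;
>     return direct_optab_handler (tab, mode) != CODE_FOR_nothing;
>   }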
> 
> Thanks,
> Tamar
> 
> ________________________________
> From: Richard Biener <rguent...@suse.de>
> Sent: Wednesday, July 9, 2025 1:24 PM
> To: Tamar Christina <tamar.christ...@arm.com>
> Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; nd <n...@arm.com>
> Subject: RE: [PATCH 1/3]middle-end: support vec_cbranch_any and 
> vec_cbranch_all [PR118974]
> 
> On Wed, 9 Jul 2025, Tamar Christina wrote:
> 
> > > -----Original Message-----
> > > From: Richard Biener <rguent...@suse.de>
> > > Sent: Wednesday, July 9, 2025 12:36 PM
> > > To: Tamar Christina <tamar.christ...@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>
> > > Subject: RE: [PATCH 1/3]middle-end: support vec_cbranch_any and
> > > vec_cbranch_all [PR118974]
> > >
> > > On Wed, 9 Jul 2025, Tamar Christina wrote:
> > >
> > > > > > +Operand 0 is a comparison operator.  Operand 1 and operand 2 are the
> > > > > > +first and second operands of the comparison, respectively.  Operand 3
> > > > > > +is the @code{code_label} to jump to.
> > > > > > +
> > > > > > +@cindex @code{cbranch_all@var{mode}4} instruction pattern
> > > > > > +@item @samp{cbranch_all@var{mode}4}
> > > > > > +Conditional branch instruction combined with a compare instruction on vectors
> > > > > > +where it is required that all of the elementwise comparisons of the
> > > > > > +two input vectors are true.
> > > > >
> > > > > See above.
> > > > >
> > > > > When I look at the RTL for aarch64 I wonder whether the middle-end
> > > > > can still invert a jump (for BB reorder, for example)?  Without
> > > > > a cbranch_none expander we have to invert during RTL expansion?
> > > > >
> > > >
> > > > Isn't cbranch_none just cbranch_all x 0? i.e. all values must be zero.
> > > > I think all states are expressible with any and all and flipping the 
> > > > branches
> > > > so it shouldn't be any more restrictive than cbranch itself is today.
> > > >
> > > > cbranch also only supports eq and ne, so none would be cbranch (eq x 0)
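> > > >
> > > > For illustration, a scalar C model of that equivalence (the function
> > > > names are made up here; any () models cbranch_any and none () models
> > > > the missing cbranch_none):
> > > >
> > > >   #include <stdbool.h>
> > > >   #include <stddef.h>
> > > >
> > > >   /* Model of cbranch_any: true if some element of mask M is set.  */
> > > >   static bool
> > > >   any (const bool *m, size_t n)
> > > >   {
> > > >     for (size_t i = 0; i < n; i++)
> > > >       if (m[i])
> > > >         return true;
> > > >     return false;
> > > >   }
> > > >
> > > >   /* "none" needs no optab of its own: it is cbranch_all (m == 0),
> > > >      or equivalently cbranch_any (m != 0) with the branch inverted.  */
> > > >   static bool
> > > >   none (const bool *m, size_t n)
> > > >   {
> > > >     return !any (m, n);
> > > >   }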
> > > >
> > > > and FTR the RTL generated for AArch64 (both SVE and Adv.SIMD) will be
> > > > simplified to:
> > > >
> > > > (insn 23 22 24 5 (parallel [
> > > >             (set (reg:VNx4BI 128 [ mask_patt_14.15_57 ])
> > > >                 (unspec:VNx4BI [
> > > >                         (reg:VNx4BI 129)
> > > >                         (const_int 0 [0x0])
> > > >                         (gt:VNx4BI (reg:VNx4SI 114 [ vect__2.11 ])
> > > >                             (const_vector:VNx4SI repeat [
> > > >                                     (const_int 0 [0])
> > > >                                 ]))
> > > >                     ] UNSPEC_PRED_Z))
> > > >             (clobber (reg:CC_NZC 66 cc))
> > > >         ]) "cbranch.c":25:10 -1
> > > >
> > > > (jump_insn 27 26 28 5 (set (pc)
> > > >         (if_then_else (eq (reg:CC_Z 66 cc)
> > > >                 (const_int 0 [0]))
> > > >             (label_ref 33)
> > > >             (pc))) "cbranch.c":25:10 -1
> > > >      (int_list:REG_BR_PROB 1014686025 (nil))
> > > >
> > > > The thing is we can't get rid of the unspecs as there's no concept of
> > > > masking in RTL compares.  We could technically do an AND (and we do in
> > > > some cases), but then you lose the predicate hint constant in the RTL
> > > > which tells you whether the mask is known to be all true or not.  This
> > > > hint is crucial to allow for further optimizations.
> > > >
> > > > That said the condition code, branch and compares are fully exposed.
> > > >
> > > > We expand to a larger sequence than I'd like, mostly because there's no
> > > > support for conditional cbranch optabs, or even conditional vector
> > > > comparisons.  So the comparisons must be generated unpredicated, using
> > > > an all-true mask, and later patterns merge in the AND.
> > > >
> > > > The new patterns allow us to clean up codegen for Adv.SIMD + SVE (in a
> > > > single loop), but not for pure SVE, for which I take a different
> > > > approach to try to avoid requiring a predicated version of these optabs.
> > > >
> > > > I don't want to push my luck, but would you be OK with a conditional
> > > > version of these optabs too, i.e. cond_cbranch_any and cond_cbranch_all?
> > > > This would allow us to immediately expand to the correct representation
> > > > for both SVE and Adv.SIMD without having to rely on various combine
> > > > patterns and cc-fusion to optimize the sequences later on (which has
> > > > historically been a bit hit or miss if someone adds a new CC pattern).
> > >
> > > Can you explain?  A conditional conditional branch makes my head hurt.
> > > It's really a cbranch_{all,any} where the (vector) compare has an
> > > additional mask applied?  So cbranch_cond_{all,any} would be a better
> > > fit?
> >
> > Yeah, so cbranch itself in GIMPLE is:
> >
> > c = vec_a `cmp` vec_b
> > if (c {any,all} 0)
> >   ...
> >
> > where cbranch_{all, any} represents the GIMPLE
> >
> > if (vec_a `cmp` vec_b)
> >   ...
> >
> > cbranch_cond_{all, any} would represent
> >
> > if ((vec_a `cmp` vec_b) & loop_mask)
> >   ...
> >
> > In GIMPLE we mask most operations by ANDing the mask with the result
> > of the operation.  But cbranch doesn't have an LHS, so we can't wrap
> > the & around anything.  And because of this we rely on backend patterns
> > to later push the mask from the & into the operation such that we can
> > generate the predicated compare.
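> >
> > As a concrete (made-up) source example of where this shows up, take an
> > early-break loop like:
> >
> >   /* The vectorized exit test for the partial final iteration becomes
> >      if ((a > b) & loop_mask) on the comparison masks.  */
> >   int
> >   first_greater (int *a, int *b, int n)
> >   {
> >     for (int i = 0; i < n; i++)
> >       if (a[i] > b[i])
> >         return i;
> >     return -1;
> >   }
> >
> > With SVE the loop predicate covers the elements still in bounds, so the
> > exit comparison has to be ANDed with loop_mask exactly as above.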
> 
> OK, so we could already do
> 
>  c = .COND_`cmp` (vec_a, vec_b, loop_mask, { 0, 0... });
>  if (c {any,all} 0)
> 
> but I can see how cond_cbranch is useful (besides choosing a proper name).
> 
> > Because of this we end up requiring patterns such as
> >
> > ;; Predicated integer comparisons, formed by combining a PTRUE-predicated
> > ;; comparison with an AND in which only the flags result is interesting.
> > (define_insn_and_rewrite "*cmp<cmp_op><mode>_and_ptest"
> >   [(set (reg:CC_Z CC_REGNUM)
> >        (unspec:CC_Z
> >          [(match_operand:VNx16BI 1 "register_operand")
> >           (match_operand 4)
> >           (const_int SVE_KNOWN_PTRUE)
> >           (and:<VPRED>
> >             (unspec:<VPRED>
> >               [(match_operand 5)
> >                (const_int SVE_KNOWN_PTRUE)
> >                (SVE_INT_CMP:<VPRED>
> >                  (match_operand:SVE_I 2 "register_operand")
> >                  (match_operand:SVE_I 3 "aarch64_sve_cmp_<sve_imm_con>_operand"))]
> >               UNSPEC_PRED_Z)
> >             (match_operand:<VPRED> 6 "register_operand"))]
> >          UNSPEC_PTEST))
> >    (clobber (match_scratch:<VPRED> 0))]
> >   "TARGET_SVE"
> >   {@ [ cons: =0, 1    , 2 , 3            ; attrs: pred_clobber ]
> >      [ &Upa    ,  Upl, w , <sve_imm_con>; yes                 ] cmp<cmp_op>\t%0.<Vetype>, %6/z, %2.<Vetype>, #%3
> >      [ ?Upl    ,  0  , w , <sve_imm_con>; yes                 ] ^
> >      [ Upa     ,  Upl, w , <sve_imm_con>; no                  ] ^
> >      [ &Upa    ,  Upl, w , w            ; yes                 ] cmp<cmp_op>\t%0.<Vetype>, %6/z, %2.<Vetype>, %3.<Vetype>
> >      [ ?Upl    ,  0  , w , w            ; yes                 ] ^
> >      [ Upa     ,  Upl, w , w            ; no                  ] ^
> >   }
> >   "&& !rtx_equal_p (operands[4], operands[5])"
> >   {
> >     operands[5] = copy_rtx (operands[4]);
> >   }
> > )
> >
> > And rely on combine to eventually connect enough pieces back together.
> > With the mask cbranch patterns we can just emit a simplified RTL:
> >
> >  (insn 23 22 24 5 (parallel [
> >              (set (reg:VNx4BI 128 [ mask_patt_14.15_57 ])
> >                     (unspec:VNx4BI [
> >                          (reg:VNx4BI 129)
> >                          (const_int 0 [0x0])
> >                          (gt:VNx4BI (reg:VNx4SI 114 [ vect__2.11 ])
> >                              (const_vector:VNx4SI repeat [
> >                                      (const_int 0 [0])
> >                                  ]))
> >                      ] UNSPEC_PRED_Z))
> >              (clobber (reg:CC_NZC 66 cc))
> >          ]) "cbranch.c":25:10 -1
> >
> > Immediately out of expand.
> >
> > >  Though 'cond_' elsewhere suggests there is an else value, instead
> > > cbranch_mask_{all,any}?  Or would that not capture things exactly?
> >
> > Yeah, mask is fine too; I only suggested cond since that's what we
> > normally use.
> 
> cond_cbranch didn't seem to imply the cond_ is applied to the compare,
> but from a second look it might be OK after all.
> 
> > > cbranch is compare-and-branch, so masked-compare-and-branch aka
> > > mcbranch[_{all,any}]?
> > >
> > > And writing this, and not trying to remember everything said, it
> > > appears that 'cbranch' itself (for vectors) becomes redundant?
> > >
> >
> > For the most part, but not when unrolling is involved, where the compare
> > is done on the reduction of masks, i.e. for
> >
> > a = b > c
> > d = e > f
> > g = d | a
> > if (g != 0)
> >   ...
> >
> > We can't really use the new optabs as `g` itself is not a comparison.
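> >
> > For concreteness, a made-up source shape that ends up like this is a
> > simple early-break loop vectorized with an unroll factor of two, where
> > the exit masks of the two vector halves get ORed into g before the
> > single branch:
> >
> >   int
> >   find (int *b, int *c, int n)
> >   {
> >     for (int i = 0; i < n; i++)
> >       if (b[i] > c[i])  /* unrolled x2: two compare masks, one test */
> >         return i;
> >     return -1;
> >   }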
> 
> So it isn't as efficient to use cbranch_any (g != 0) here?  I think
> it should be practically equivalent...
> 
> Richard.
> 
> > For VLA, though, this happens less often due to the support for
> > widening loads and narrowing stores.
> >
> > Thanks,
> > Tamar
> >
> > > Richard.
> > >
> > > > And the reason for both is that for Adv.SIMD there's no mask at GIMPLE
> > > > level, and we have to make it during expand.
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > >
> > > --
> > > Richard Biener <rguent...@suse.de>
> > > SUSE Software Solutions Germany GmbH,
> > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> >
> 
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
