On Thu, Oct 13, 2022 at 11:35 PM Jakub Jelinek <ja...@redhat.com> wrote:
>
> On Thu, Oct 13, 2022 at 11:11:53PM +0200, Uros Bizjak wrote:
> > > > +  do_compare_rtx_and_jump (op1, op2, GET_CODE (operands[0]), 0,
> > > > +                        SFmode, NULL_RTX, NULL,
> > > > +                        as_a <rtx_code_label *> (operands[3]),
> > > > +                        /* Unfortunately this isn't propagated.  */
> > > > +                        profile_probability::even ());
> >
> > You could use ix86_expand_branch instead of do_compare_rtx_and_jump
> > here. This would expand in SFmode, so insn condition from cbranchsf4
> > should be copied here:
> >
> >   "TARGET_80387 || (SSE_FLOAT_MODE_P (SFmode) && TARGET_SSE_MATH)"
> >
> > Additionally, ix86_fp_comparison_operator predicate should be used for
> > operator0. Basically, just copy predicates from cbranchsf4 as we are
> > effectively expanding the SFmode compare & branch.
>
> The reason why I've used the generic routine there was exactly to handle
> not just ix86_fp_comparison_operator, but also comparisons that are more
> complex than that (need 2 comparisons).
>
> For the ix86_fp_comparison_operator cases the optabs wouldn't actually be
> strictly needed: the generic code would see that e.g. cbranchbf4 isn't
> supported, try cbranchsf4 and succeed on that.  The only disadvantage would
> be that the BFmode -> SFmode extensions would be performed using library
> functions unless -ffast-math, while they can be handled by left shifting
> the 16 BFmode bits to the most significant 16 bits of SFmode even when
> honoring NaNs.  For the non-ix86_fp_comparison_operator cases the generic
> behavior is actually that neither cbranchbf4, nor cbranchsf4, nor
> cbranchdf4, nor cbranchxf4, nor cbranchtf4 works out and the generic code
> emits a libcall (__{eq,ne}bf2).  I bet that is the reason why libgcc
> contains __{eq,ne}hf2 entrypoints.
> I wanted to avoid adding __{eq,ne}bf2 and the addition of
> cbranchbf4/cstorebf4 was how I managed to do that; by telling the
> generic code that it can handle those by the faster BFmode to SFmode
> conversions of the operands and then perform one or two bit checks.
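
For readers following along, the shift-based widening described above can be
sketched in plain C.  This is only an illustration of the idea, not code from
the patch: the helper names bf16_bits_to_float and bf16_eq are hypothetical,
a bfloat16 value is assumed to be held as its raw 16-bit pattern in a
uint16_t, and float is assumed to be IEEE binary32.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen a raw bfloat16 bit pattern to float by
   placing its 16 bits in the most significant half of a 32-bit word.
   The low 16 mantissa bits become zero, so the value is preserved
   exactly, including infinities and NaNs.  */
static float
bf16_bits_to_float (uint16_t bits)
{
  uint32_t wide = (uint32_t) bits << 16;
  float f;
  /* memcpy is the portable way to reinterpret the bits.  */
  memcpy (&f, &wide, sizeof f);
  return f;
}

/* Hypothetical helper: compare two bfloat16 values for equality by
   widening both operands and doing an ordinary float comparison.  */
static bool
bf16_eq (uint16_t a, uint16_t b)
{
  return bf16_bits_to_float (a) == bf16_bits_to_float (b);
}
```

Because the comparison itself is done on float, the usual IEEE rules apply:
+0.0 and -0.0 compare equal, and a NaN compares unequal to everything,
including itself.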

Thanks for the explanation, I see the intention now.

The patch is OK as is.

Thanks,
Uros.

> I guess another possibility would be to call ix86_expand_branch there
> once or twice and repeat what the generic code does, or add the
> libgcc entrypoints which would perhaps bypass soft-fp and just do the
> shifts + SFmode comparison.
>
> > > > +  else
> > > > +    {
> > > > +      rtx t2 = gen_reg_rtx (SImode);
> > > > +      emit_insn (gen_zero_extendhisi2 (t2, op2));
> > > > +      emit_insn (gen_ashlsi3 (t2, t2, GEN_INT (16)));
> > > > +      op2 = gen_lowpart (SFmode, t2);
> > > > +    }
> >
> > Similar to cbranch above, use ix86_expand_setcc and copy predicates
> > from cstoresf4.
>
> Ditto here, cstore is actually required by the generic code once
> cbranch is implemented.
>
>         Jakub
>
