On Mon, Nov 18, 2024 at 09:32:29AM +0100, Andreas Krebbel wrote:
> Hi Stefan,
>
>
> On 11/12/24 10:35, Stefan Schulze Frielinghaus wrote:
> > > > + rtx cond = gen_rtx_LTU (<MODE>mode, gen_rtx_REG (CCL1mode,
> > > > CC_REGNUM), const0_rtx);
> > > > + if (operands[4] == const0_rtx)
> > > > + emit_insn (gen_add<mode>3_carry1_cc (operands[0], operands[2],
> > > > operands[3]));
> > > > + else
> > > If we would just generate the alc pattern with a carry in of 0, wouldn't
> > > the
> > > other optimizers be able to figure out that the a normal add would do
> > > here?
> > > This path does not seem to get exercised by your testcases.
> > The zero carry in can occur due to
> > https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=a7aec76a74dd38524be325343158d3049b6ab3ac
> > which is why we have a special case just in order to emit an add with
> > carry out only.
>
> Ok. I was hoping that this downgrade from add with carry in/out to add with
> carry out would happen somewhat "automatically", if the the carry in happens
> to be a constant zero. But probably not.
>
> Your testcases invokes the pattern also with a constant 0 as carry in, but
> since you prevent inlining the pattern is never expanded with a const0_rtx.
> The testcase in the commit above is x86 specific, so it might make sense to
> add an invocation which triggers the code path explicitly. Just to be sure.
Ok, just for the curiosity, my take is that the middle end only emits an
u{add,sub}c optab with a constant zero if a pattern is detected as it
would occur e.g. for bitints, i.e., where multiple limbs are about to be
handled and the first one has a zero carry in. Thus, just invoking the
test case with a zero results in an add-overflow only as expected.
Anyhow, I have added tests u{add,sub}c-3.c in order to test this path,
too.
>
> > > create a CCU mode comparison result, which is currently consumed as CCL1
> > > by
> > > the ALC pattern. This seems to be inconsistent. s390_emit_compare returns
> > > the condition, which I think needs to be fed into the alc_carry1_cc
> > > pattern
> > > as input condition.
> > >
> > > Also is LTU really the correct code here? CCU + LTU would expect CC1, but
> > > you want CC2 or CC3 for the carry in, so GTU should be the better choice.
> > > s390_alc_comparison should make sure that only valid combinations are
> > > accepted, CCU + GTU would be one of them.
> > I was coming up with my own condition since conditions created by
> > s390_emit_compare() are of void mode which is why the alc predicates
> > s390_alc_comparison() failed since these require GPR mode. I've fixed
> > that by using PUT_MODE_RAW on those. I think we could also remove the
> > mode from the match_operand which might be more appropriate. I've done
> > it the latter way for sub<mode>3_slb_borrow1_cc. Once we have settled
> > for one or the other version I will align uaddc/usubc.
>
> The PLUS/MINUS arithmetic operations require both operands to have a proper
> integer mode. So I think dropping the mode from match_operand would be
> wrong. On the other hand, an IF_THEN_ELSE always requires the comparison to
> have VOIDmode, that's what s390_emit_compare is supposed to be used for. So
> taking what s390_emit_compare generates and changing the mode is the right
> thing here. Since PUT_MODE is always a bit ugly, we might also want go with
> extending s390_emit_compare with a target mode operand defaulting to
> VOIDmode instead.
Ok, I have added a mode operand to s390_emit_compare() and changed
everything accordingly.
>
> > > > +/* { dg-do run } */
> > > > +/* { dg-options "-O2 -mzarch -save-temps -fdump-tree-optimized" } */
> > > > +/* { dg-final { scan-tree-dump-times "\\.UADDC \\(" 2 "optimized" {
> > > > target lp64 } } } */
> > > > +/* { dg-final { scan-tree-dump-times "\\.UADDC \\(" 1 "optimized" {
> > > > target { ! lp64 } } } } */
> > > > +/* { dg-final { scan-assembler-times "\\talcr\\t" 1 } } */
> > > > +/* { dg-final { scan-assembler-times "\\talcgr\\t" 1 { target lp64 } }
> > > > } */
> > > Your checks seem to rely on the testcase being compiled with at least
> > > -march=z13.
>
> I think for the run test you would have to make sure that the test is not
> executed on machines older than z13 then.
>
> /* { dg-do run { target { s390_useable_hw } } } */
Didn't know about this neat predicate. Will probably use it more often
in the future :)
I will send a v2 shortly.
Cheers,
Stefan