On Sun, Dec 31, 2023 at 4:56 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> Hi Uros,
>
> > From: Uros Bizjak <ubiz...@gmail.com>
> > Sent: 28 December 2023 10:33
> > On Fri, Dec 22, 2023 at 11:14 AM Roger Sayle <ro...@nextmovesoftware.com>
> > wrote:
> > >
> > > This patch resolves the failure of pr43644-2.c in the testsuite, a
> > > code quality test I added back in July, which started failing as the
> > > code GCC generates for 128-bit values (and their parameter passing)
> > > has been in flux.  After a few attempts at tweaking pattern
> > > constraints in the hope of convincing reload to produce a more
> > > aggressive (but potentially unsafe) register allocation, I think the
> > > best solution is to use a peephole2 to catch and clean up this
> > > specific case.
> > >
> > > Specifically, the function:
> > >
> > > unsigned __int128 foo(unsigned __int128 x, unsigned long long y) {
> > >   return x+y;
> > > }
> > >
> > > currently generates:
> > >
> > > foo:    movq    %rdx, %rcx
> > >         movq    %rdi, %rax
> > >         movq    %rsi, %rdx
> > >         addq    %rcx, %rax
> > >         adcq    $0, %rdx
> > >         ret
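> > >
> > > To see where the extra move comes from, here is the same sequence
> > > annotated (assuming the usual x86-64 SysV convention: x arrives in
> > > %rdi/%rsi = low/high, y in %rdx, and the result returns in
> > > %rax/%rdx = low/high):
> > >
> > > # movq %rdx, %rcx   -> rcx = y (the scratch copy we want to avoid)
> > > # movq %rdi, %rax   -> rax = low(x)
> > > # movq %rsi, %rdx   -> rdx = high(x), clobbering y's register
> > > # addq %rcx, %rax   -> rax = low(x) + y, sets the carry flag
> > > # adcq $0, %rdx     -> rdx = high(x) + carry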
> > >
> > > and with this patch/peephole2 now generates:
> > >
> > > foo:    movq    %rdx, %rax
> > >         movq    %rsi, %rdx
> > >         addq    %rdi, %rax
> > >         adcq    $0, %rdx
> > >         ret
> > >
> > > which I believe is optimal.
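> > >
> > > Annotated the same way, the new sequence avoids the scratch copy by
> > > accumulating directly into %rax:
> > >
> > > # movq %rdx, %rax   -> rax = y (frees %rdx for the high half)
> > > # movq %rsi, %rdx   -> rdx = high(x)
> > > # addq %rdi, %rax   -> rax = low(x) + y, sets the carry flag
> > > # adcq $0, %rdx     -> rdx = high(x) + carry
> > >
> > > Two moves appear to be the floor here: y must vacate %rdx before
> > > high(x) can land there, and the two-operand addq that produces the
> > > carry needs one addend already in %rax.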
> >
> > How about simply moving the assignment to the MSB after the LSB
> > calculation in the split pattern:
> >
> >   [(set (match_dup 0) (match_dup 4))
> > -   (set (match_dup 5) (match_dup 2))
> >    (parallel [(set (reg:CCC FLAGS_REG)
> >                   (compare:CCC
> >                     (plus:DWIH (match_dup 0) (match_dup 1))
> >                     (match_dup 0)))
> >              (set (match_dup 0)
> >                   (plus:DWIH (match_dup 0) (match_dup 1)))])
> > +   (set (match_dup 5) (match_dup 2))
> >    (parallel [(set (match_dup 5)
> >                   (plus:DWIH
> >                     (plus:DWIH
> >
> > There is an earlyclobber on the output operand, so we are sure that 
> > assignments
> > to (op 0) and (op 5) won't clobber anything.
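> >
> > In instruction-order terms, the change is (a simplified sketch, with
> > opN standing for the match_dups above):
> >
> > #  before:  op0 = op4;  op5 = op2;  op0 += op1 (sets CC);  op5 += carry
> > #  after:   op0 = op4;  op0 += op1 (sets CC);  op5 = op2;  op5 += carry
> >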
> > The cprop_hardreg pass will then do the cleanup for us, resulting in:
> >
> > foo:    movq    %rdi, %rax
> >         addq    %rdx, %rax
> >         movq    %rsi, %rdx
> >         adcq    $0, %rdx
> >
> > Uros.
>
> I agree.  This is a much better fix.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
> 2023-12-31  Uros Bizjak  <ubiz...@gmail.com>
>             Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR target/43644
>         * config/i386/i386.md (*add<dwi>3_doubleword_concat_zext): Tweak
>         order of instructions after split to minimize the number of moves.
>
> gcc/testsuite/ChangeLog
>         PR target/43644
>         * gcc.target/i386/pr43644-2.c: Expect 2 movq instructions.
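>
> (For illustration only -- assuming the test counts moves with a
> scan-assembler-times directive, the updated expectation would read
> something like:)
>
> /* { dg-final { scan-assembler-times "movq" 2 } } */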

OK.

Thanks, and Happy New Year to you, too!

Uros.
