RE: [x86 PATCH] Fix FAIL of gcc.target/i386/pr91681-1.c

2023-07-17 Thread Roger Sayle


> From: Jiang, Haochen 
> Sent: 17 July 2023 02:50
> 
> > From: Jiang, Haochen
> > Sent: Friday, July 14, 2023 10:50 AM
> >
> > > The recent change in TImode parameter passing on x86_64 results in
> > > the FAIL of pr91681-1.c.  The issue is that with the extra
> > > flexibility, the combine pass is now spoilt for choice between using
> > > either the *add3_doubleword_concat or the
> > > *add3_doubleword_zext patterns, when one operand is a *concat and
> > > the other is a zero_extend.
> > > The solution proposed below is to provide an
> > > *add3_doubleword_concat_zext define_insn_and_split, that can
> > > benefit both from the register allocation of *concat, and still
> > > avoid the xor normally required by zero extension.
> > >
> > > I'm investigating a follow-up refinement to improve register
> > > allocation further by avoiding the early clobber in the =, and
> > > handling (custom) reloads explicitly, but this piece resolves the
> > > testcase failure.
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make
> > > bootstrap and make -k check, both with and without
> > > --target_board=unix{-m32} with no new failures.  Ok for mainline?
> > >
> > >
> > > 2023-07-11  Roger Sayle  
> > >
> > > gcc/ChangeLog
> > > PR target/91681
> > > * config/i386/i386.md (*add3_doubleword_concat_zext): New
> > > define_insn_and_split derived from *add3_doubleword_concat
> > > and *add3_doubleword_zext.
> >
> > Hi Roger,
> >
> > This commit currently changed the codegen of testcase p443644-2.c from:
> 
> Oops, a typo, I mean pr43644-2.c.
> 
> Haochen

I'm working on a fix and hope to have this resolved soon (unfortunately,
fixing things in a post-reload splitter isn't working out due to reload's
choices, so the solution will likely be a peephole2).

The problem is that pr91681-1.c and pr43644-2.c can't both PASS (as
written)!  The operation x = y + 0 can be generated as either
"mov y,x; add $0,x" or as "xor x,x; add y,x".  pr91681-1.c checks there
isn't an xor, pr43644-2.c checks there isn't a mov.  Doh!  As the author of
both these test cases, I've painted myself into a corner.

The solution is that "add $0,x" should be generated (optimal) when y is
already in x, and "xor x,x; add y,x" used otherwise (as this is shorter than
"mov y,x; add $0,x", both sequences being approximately equal
performance-wise).
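
As a rough illustration of why the high word reduces to "x = y + 0" (plus carry), here is a minimal C sketch of a doubleword add with a zero-extended operand. It is not part of the patch: the type dword_t and the functions add_zext and check_add_zext are hypothetical names made up for this example, and the code only models the arithmetic, not the register allocation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type (not from the patch): a 128-bit value
   held as two 64-bit halves, like a register pair. */
typedef struct { uint64_t lo, hi; } dword_t;

/* x + zext(y) done one word at a time.  The high word of zext(y)
   is zero, so the second step only adds the carry from the first. */
static dword_t add_zext(dword_t x, uint64_t y)
{
    dword_t r;
    r.lo = x.lo + y;               /* low-word add, produces the carry */
    uint64_t carry = r.lo < y;     /* carry out of the low-word add */
    r.hi = x.hi + 0 + carry;       /* high-word add-with-carry of 0 */
    return r;
}

/* Cross-check the model against the compiler's own 128-bit type. */
static void check_add_zext(uint64_t hi, uint64_t lo, uint64_t y)
{
    dword_t x = { lo, hi };
    dword_t r = add_zext(x, y);
    unsigned __int128 want = (((unsigned __int128)hi << 64) | lo) + y;
    assert(r.lo == (uint64_t)want);
    assert(r.hi == (uint64_t)(want >> 64));
}
```

Whether the high half is best computed in place (adc $0) or from a freshly zeroed register (xor, then adc) depends on which register already holds the high word of x, which is the choice discussed above.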

> > movq    %rdx, %rax
> > xorl    %edx, %edx
> > addq    %rdi, %rax
> > adcq    %rsi, %rdx
> > to:
> > movq    %rdx, %rcx
> > movq    %rdi, %rax
> > movq    %rsi, %rdx
> > addq    %rcx, %rax
> > adcq    $0, %rdx
> >
> > which causes the testcase to fail under -m64.
> > Is this within your expectation?

You're right that the original (using xor) is better for pr43644-2.c's test
case:
unsigned __int128 foo(unsigned __int128 x, unsigned long long y) { return x+y; }
but the closely related (swapping the argument order):
unsigned __int128 bar(unsigned long long y, unsigned __int128 x) { return x+y; }
is better using "adcq $0" than having a superfluous xor.
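
For anyone wanting to reproduce the comparison, the two functions can be put in one file as in the sketch below. The register comments assume the usual x86_64 SysV calling convention (they match the assembly quoted in this thread for foo), and check_same is a hypothetical helper added only to confirm that both argument orders compute the same value.

```c
#include <assert.h>

/* x arrives in %rdi:%rsi and y in %rdx (assuming the SysV ABI), but
   the result's high half must end up in %rdx, so y has to move out
   of the way first. */
unsigned __int128 foo(unsigned __int128 x, unsigned long long y)
{
    return x + y;
}

/* y arrives in %rdi and x in %rsi:%rdx (same assumption), so the
   high half of x is already in the return register %rdx and only
   the carry needs to be added to it. */
unsigned __int128 bar(unsigned long long y, unsigned __int128 x)
{
    return x + y;
}

/* Hypothetical helper: both argument orders must give the same sum. */
static void check_same(unsigned __int128 x, unsigned long long y)
{
    assert(foo(x, y) == bar(y, x));
}
```

Compiling this with something like "gcc -O2 -S" on x86_64 and reading the output for foo and bar is one way to see which sequence a given compiler revision picks for each.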

Executive summary: This FAIL isn't serious.  I'll silence it soon.

> > BRs,
> > Haochen
> >
> > >
> > >
> > > Thanks,
> > > Roger
> > > --




RE: [x86 PATCH] Fix FAIL of gcc.target/i386/pr91681-1.c

2023-07-16 Thread Jiang, Haochen via Gcc-patches
> -----Original Message-----
> From: Jiang, Haochen
> Sent: Friday, July 14, 2023 10:50 AM
> To: Roger Sayle ; gcc-patches@gcc.gnu.org
> Cc: 'Uros Bizjak' 
> Subject: RE: [x86 PATCH] Fix FAIL of gcc.target/i386/pr91681-1.c
> 
> > The recent change in TImode parameter passing on x86_64 results in the
> > FAIL of pr91681-1.c.  The issue is that with the extra flexibility,
> > the combine pass is now spoilt for choice between using either the
> > *add3_doubleword_concat or the *add3_doubleword_zext
> > patterns, when one operand is a *concat and the other is a zero_extend.
> > The solution proposed below is to provide an
> > *add3_doubleword_concat_zext define_insn_and_split, that can
> > benefit both from the register allocation of *concat, and still avoid
> > the xor normally required by zero extension.
> >
> > I'm investigating a follow-up refinement to improve register
> > allocation further by avoiding the early clobber in the =, and
> > handling (custom) reloads explicitly, but this piece resolves the
> > testcase failure.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> >
> >
> > 2023-07-11  Roger Sayle  
> >
> > gcc/ChangeLog
> > PR target/91681
> > * config/i386/i386.md (*add3_doubleword_concat_zext): New
> > define_insn_and_split derived from *add3_doubleword_concat
> > and *add3_doubleword_zext.
> 
> Hi Roger,
> 
> This commit currently changed the codegen of testcase p443644-2.c from:

Oops, a typo, I mean pr43644-2.c.

Haochen

> 
> movq    %rdx, %rax
> xorl    %edx, %edx
> addq    %rdi, %rax
> adcq    %rsi, %rdx
> to:
> 
> movq    %rdx, %rcx
> movq    %rdi, %rax
> movq    %rsi, %rdx
> addq    %rcx, %rax
> adcq    $0, %rdx
> 
> which causes the testcase to fail under -m64.
> 
> Is this within your expectation?
> 
> BRs,
> Haochen
> 
> >
> >
> > Thanks,
> > Roger
> > --



RE: [x86 PATCH] Fix FAIL of gcc.target/i386/pr91681-1.c

2023-07-13 Thread Jiang, Haochen via Gcc-patches
> The recent change in TImode parameter passing on x86_64 results in the FAIL
> of pr91681-1.c.  The issue is that with the extra flexibility, the combine
> pass is now spoilt for choice between using either the
> *add3_doubleword_concat or the *add3_doubleword_zext
> patterns, when one operand is a *concat and the other is a zero_extend.
> The solution proposed below is to provide an
> *add3_doubleword_concat_zext define_insn_and_split, that can
> benefit both from the register allocation of *concat, and still avoid the xor
> normally required by zero extension.
> 
> I'm investigating a follow-up refinement to improve register allocation
> further by avoiding the early clobber in the =, and handling (custom)
> reloads explicitly, but this piece resolves the testcase failure.
> 
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and
> make -k check, both with and without --target_board=unix{-m32} with no
> new failures.  Ok for mainline?
> 
> 
> 2023-07-11  Roger Sayle  
> 
> gcc/ChangeLog
> PR target/91681
> * config/i386/i386.md (*add3_doubleword_concat_zext): New
> define_insn_and_split derived from *add3_doubleword_concat
> and *add3_doubleword_zext.

Hi Roger,

This commit currently changed the codegen of testcase p443644-2.c from:

movq    %rdx, %rax
xorl    %edx, %edx
addq    %rdi, %rax
adcq    %rsi, %rdx
to:

movq    %rdx, %rcx
movq    %rdi, %rax
movq    %rsi, %rdx
addq    %rcx, %rax
adcq    $0, %rdx

which causes the testcase to fail under -m64.

Is this within your expectation?

BRs,
Haochen

> 
> 
> Thanks,
> Roger
> --



Re: [x86 PATCH] Fix FAIL of gcc.target/i386/pr91681-1.c

2023-07-12 Thread Uros Bizjak via Gcc-patches
On Tue, Jul 11, 2023 at 10:07 PM Roger Sayle  wrote:
>
>
> The recent change in TImode parameter passing on x86_64 results in the
> FAIL of pr91681-1.c.  The issue is that with the extra flexibility,
> the combine pass is now spoilt for choice between using either the
> *add3_doubleword_concat or the *add3_doubleword_zext
> patterns, when one operand is a *concat and the other is a zero_extend.
> The solution proposed below is to provide an *add3_doubleword_concat_zext
> define_insn_and_split, that can benefit both from the register allocation
> of *concat, and still avoid the xor normally required by zero extension.
>
> I'm investigating a follow-up refinement to improve register allocation
> further by avoiding the early clobber in the =, and handling (custom)
> reloads explicitly, but this piece resolves the testcase failure.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-07-11  Roger Sayle  
>
> gcc/ChangeLog
> PR target/91681
> * config/i386/i386.md (*add3_doubleword_concat_zext): New
> define_insn_and_split derived from *add3_doubleword_concat
> and *add3_doubleword_zext.

OK.

Thanks,
Uros.

>
>
> Thanks,
> Roger
> --
>