Hi Uros, > From: Uros Bizjak <ubiz...@gmail.com> > Sent: 28 December 2023 10:33 > On Fri, Dec 22, 2023 at 11:14 AM Roger Sayle <ro...@nextmovesoftware.com> > wrote: > > > > This patch resolves the failure of pr43644-2.c in the testsuite, a > > code quality test I added back in July, that started failing as the > > code GCC generates for 128-bit values (and their parameter passing) > > has been in flux. After a few attempts at tweaking pattern > > constraints in the hope of convincing reload to produce a more > > aggressive (but potentially > > unsafe) register allocation, I think the best solution is to use a > > peephole2 to catch/clean-up this specific case. > > > > Specifically, the function: > > > > unsigned __int128 foo(unsigned __int128 x, unsigned long long y) { > > return x+y; > > } > > > > currently generates: > > > > foo: movq %rdx, %rcx > > movq %rdi, %rax > > movq %rsi, %rdx > > addq %rcx, %rax > > adcq $0, %rdx > > ret > > > > and with this patch/peephole2 now generates: > > > > foo: movq %rdx, %rax > > movq %rsi, %rdx > > addq %rdi, %rax > > adcq $0, %rdx > > ret > > > > which I believe is optimal. > > How about simply moving the assignment to the MSB in the split pattern after > the > LSB calculation: > > [(set (match_dup 0) (match_dup 4)) > - (set (match_dup 5) (match_dup 2)) > (parallel [(set (reg:CCC FLAGS_REG) > (compare:CCC > (plus:DWIH (match_dup 0) (match_dup 1)) > (match_dup 0))) > (set (match_dup 0) > (plus:DWIH (match_dup 0) (match_dup 1)))]) > + (set (match_dup 5) (match_dup 2)) > (parallel [(set (match_dup 5) > (plus:DWIH > (plus:DWIH > > There is an earlyclobber on the output operand, so we are sure that > assignments > to (op 0) and (op 5) won't clobber anything. > cprop_hardreg pass will then do the cleanup for us, resulting in: > > foo: movq %rdi, %rax > addq %rdx, %rax > movq %rsi, %rdx > adcq $0, %rdx > > Uros.
I agree. This is a much better fix. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2023-12-31 Uros Bizjak <ubiz...@gmail.com> Roger Sayle <ro...@nextmovesoftware.com> gcc/ChangeLog PR target/43644 * config/i386/i386.md (*add<dwi>3_doubleword_concat_zext): Tweak order of instructions after split, to minimize number of moves. gcc/testsuite/ChangeLog PR target/43644 * gcc.target/i386/pr43644-2.c: Expect 2 movq instructions. Thanks again (and Happy New Year). Roger --
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index e862368..6671274 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -6412,13 +6412,13 @@ "#" "&& reload_completed" [(set (match_dup 0) (match_dup 4)) - (set (match_dup 5) (match_dup 2)) (parallel [(set (reg:CCC FLAGS_REG) (compare:CCC (plus:DWIH (match_dup 0) (match_dup 1)) (match_dup 0))) (set (match_dup 0) (plus:DWIH (match_dup 0) (match_dup 1)))]) + (set (match_dup 5) (match_dup 2)) (parallel [(set (match_dup 5) (plus:DWIH (plus:DWIH diff --git a/gcc/testsuite/gcc.target/i386/pr43644-2.c b/gcc/testsuite/gcc.target/i386/pr43644-2.c index d470b0a..3316ac6 100644 --- a/gcc/testsuite/gcc.target/i386/pr43644-2.c +++ b/gcc/testsuite/gcc.target/i386/pr43644-2.c @@ -6,4 +6,4 @@ unsigned __int128 foo(unsigned __int128 x, unsigned long long y) return x+y; } -/* { dg-final { scan-assembler-times "movq" 1 } } */ +/* { dg-final { scan-assembler-times "movq" 2 } } */