On Sun, Dec 31, 2023 at 4:56 PM Roger Sayle <ro...@nextmovesoftware.com> wrote: > > > Hi Uros, > > > From: Uros Bizjak <ubiz...@gmail.com> > > Sent: 28 December 2023 10:33 > > On Fri, Dec 22, 2023 at 11:14 AM Roger Sayle <ro...@nextmovesoftware.com> > > wrote: > > > > > > This patch resolves the failure of pr43644-2.c in the testsuite, a > > > code quality test I added back in July, that started failing as the > > > code GCC generates for 128-bit values (and their parameter passing) > > > has been in flux. After a few attempts at tweaking pattern > > > constraints in the hope of convincing reload to produce a more > > > aggressive (but potentially > > > unsafe) register allocation, I think the best solution is to use a > > > peephole2 to catch/clean-up this specific case. > > > > > > Specifically, the function: > > > > > > unsigned __int128 foo(unsigned __int128 x, unsigned long long y) { > > > return x+y; > > > } > > > > > > currently generates: > > > > > > foo: movq %rdx, %rcx > > > movq %rdi, %rax > > > movq %rsi, %rdx > > > addq %rcx, %rax > > > adcq $0, %rdx > > > ret > > > > > > and with this patch/peephole2 now generates: > > > > > > foo: movq %rdx, %rax > > > movq %rsi, %rdx > > > addq %rdi, %rax > > > adcq $0, %rdx > > > ret > > > > > > which I believe is optimal. > > > > How about simply moving the assignment to the MSB in the split pattern > > after the > > LSB calculation: > > > > [(set (match_dup 0) (match_dup 4)) > > - (set (match_dup 5) (match_dup 2)) > > (parallel [(set (reg:CCC FLAGS_REG) > > (compare:CCC > > (plus:DWIH (match_dup 0) (match_dup 1)) > > (match_dup 0))) > > (set (match_dup 0) > > (plus:DWIH (match_dup 0) (match_dup 1)))]) > > + (set (match_dup 5) (match_dup 2)) > > (parallel [(set (match_dup 5) > > (plus:DWIH > > (plus:DWIH > > > > There is an earlyclobber on the output operand, so we are sure that > > assignments > > to (op 0) and (op 5) won't clobber anything. > > cprop_hardreg pass will then do the cleanup for us, resulting in: > > > > foo: movq %rdi, %rax > > addq %rdx, %rax > > movq %rsi, %rdx > > adcq $0, %rdx > > > > Uros. > > I agree. This is a much better fix. > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32} > with no new failures. Ok for mainline? > > 2023-12-31 Uros Bizjak <ubiz...@gmail.com> > Roger Sayle <ro...@nextmovesoftware.com> > > gcc/ChangeLog > PR target/43644 > * config/i386/i386.md (*add<dwi>3_doubleword_concat_zext): Tweak > order of instructions after split, to minimize number of moves. > > gcc/testsuite/ChangeLog > PR target/43644 > * gcc.target/i386/pr43644-2.c: Expect 2 movq instructions.
OK. Thanks, and Happy New Year to you, too! Uros.