On Wed, Nov 1, 2023 at 1:58 PM Roger Sayle <ro...@nextmovesoftware.com> wrote: > > > Hi Uros, > > > From: Uros Bizjak <ubiz...@gmail.com> > > Sent: 01 November 2023 10:05 > > Subject: Re: [x86_64 PATCH] PR target/110551: Tweak mulx register allocation > > using peephole2. > > > > On Mon, Oct 30, 2023 at 6:27 PM Roger Sayle <ro...@nextmovesoftware.com> > > wrote: > > > > > > > > > This patch is a follow-up to my previous PR target/110551 patch, this > > > time to address the additional move after mulx, seen on TARGET_BMI2 > > > architectures (such as -march=haswell). The complication here is that > > > the flexible multiple-set mulx instruction is introduced into RTL > > > after reload, by split2, and therefore can't benefit from register > > > preferencing. This results in RTL like the following: > > > > > > (insn 32 31 17 2 (parallel [ > > > (set (reg:DI 4 si [orig:101 r ] [101]) > > > (mult:DI (reg:DI 1 dx [109]) > > > (reg:DI 5 di [109]))) > > > (set (reg:DI 5 di [ r+8 ]) > > > (umul_highpart:DI (reg:DI 1 dx [109]) > > > (reg:DI 5 di [109]))) > > > ]) "pr110551-2.c":8:17 -1 > > > (nil)) > > > > > > (insn 17 32 9 2 (set (reg:DI 0 ax [107]) > > > (reg:DI 5 di [ r+8 ])) "pr110551-2.c":9:40 90 {*movdi_internal} > > > (expr_list:REG_DEAD (reg:DI 5 di [ r+8 ]) > > > (nil))) > > > > > > Here insn 32, the mulx instruction, places its results in si and di, > > > and then immediately after decides to move di to ax, with di now dead. > > > This can be trivially cleaned up by a peephole2. I've added an > > > additional constraint that the two SET_DESTs can't be the same > > > register to avoid confusing the middle-end, but this has well-defined > > > behaviour on x86_64/BMI2, encoding a umul_highpart. > > > > > > For the new test case, compiled on x86_64 with -O2 -march=haswell: > > > > > > Before: > > > mulx64: movabsq $-7046029254386353131, %rdx > > > mulx %rdi, %rsi, %rdi > > > movq %rdi, %rax > > > xorq %rsi, %rax > > > ret > > > > > > After: > > > mulx64: movabsq $-7046029254386353131, %rdx > > > mulx %rdi, %rsi, %rax > > > xorq %rsi, %rax > > > ret > > > > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > > > and make -k check, both with and without --target_board=unix{-m32} > > > with no new failures. Ok for mainline? > > > > It looks that your previous PR110551 patch regressed -march=cascadelake [1]. > > Let's fix these regressions first. > > > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634660.html > > > > Uros. > > This patch fixes that "regression". Originally, the test case in PR110551 > contained > one unnecessary mov on "default" x86_targets, but two extra movs on BMI2 > targets, including -march=haswell and -march=cascadelake. The first patch > eliminated one of these MOVs, this patch eliminates the second. I'm not sure > that you can call it a regression, the added test failed when run with a > non-standard > -march setting. The good news is that test case doesn't have to be changed > with > this patch applied, i.e. the correct intended behaviour is no MOVs on all > architectures.
I was not worried about the extra mov, but more about [2], also referred from [1], but it looks that that regression was wrongly attributed to your patch. [2] https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078391.html > I'll admit the timing is unusual; I had already written and was regression > testing a > patch for the BMI2 issue, when the -march=cascadelake regression tester let me > know it was required for folks that helpfully run the regression suite with > non > standard settings. i.e. a long standing bug that wasn't previously tested > for by > the testsuite. > > > > 2023-10-30 Roger Sayle <ro...@nextmovesoftware.com> > > > > > > gcc/ChangeLog > > > PR target/110551 > > > * config/i386/i386.md (*bmi2_umul<mode><dwi>3_1): Tidy condition > > > as operands[2] with predicate register_operand must be !MEM_P. > > > (peephole2): Optimize a mulx followed by a register-to-register > > > move, to place result in the correct destination if possible. > > > > > > gcc/testsuite/ChangeLog > > > PR target/110551 > > > * gcc.target/i386/pr110551-2.c: New test case. The patch is OK. Thanks, Uros.