On Wed, Nov 1, 2023 at 1:58 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> Hi Uros,
>
> > From: Uros Bizjak <ubiz...@gmail.com>
> > Sent: 01 November 2023 10:05
> > Subject: Re: [x86_64 PATCH] PR target/110551: Tweak mulx register allocation
> > using peephole2.
> >
> > On Mon, Oct 30, 2023 at 6:27 PM Roger Sayle <ro...@nextmovesoftware.com>
> > wrote:
> > >
> > >
> > > This patch is a follow-up to my previous PR target/110551 patch, this
> > > time to address the additional move after mulx, seen on TARGET_BMI2
> > > architectures (such as -march=haswell).  The complication here is that
> > > the flexible multiple-set mulx instruction is introduced into RTL
> > > after reload, by split2, and therefore can't benefit from register
> > > preferencing.  This results in RTL like the following:
> > >
> > > (insn 32 31 17 2 (parallel [
> > >             (set (reg:DI 4 si [orig:101 r ] [101])
> > >                 (mult:DI (reg:DI 1 dx [109])
> > >                     (reg:DI 5 di [109])))
> > >             (set (reg:DI 5 di [ r+8 ])
> > >                 (umul_highpart:DI (reg:DI 1 dx [109])
> > >                     (reg:DI 5 di [109])))
> > >         ]) "pr110551-2.c":8:17 -1
> > >      (nil))
> > >
> > > (insn 17 32 9 2 (set (reg:DI 0 ax [107])
> > >         (reg:DI 5 di [ r+8 ])) "pr110551-2.c":9:40 90 {*movdi_internal}
> > >      (expr_list:REG_DEAD (reg:DI 5 di [ r+8 ])
> > >         (nil)))
> > >
> > > Here insn 32, the mulx instruction, places its results in si and di,
> > > and then immediately after decides to move di to ax, with di now dead.
> > > This can be trivially cleaned up by a peephole2.  I've added an
> > > additional constraint that the two SET_DESTs can't be the same
> > > register to avoid confusing the middle-end, but this has well-defined
> > > behaviour on x86_64/BMI2, encoding a umul_highpart.
> > >
> > > For the new test case, compiled on x86_64 with -O2 -march=haswell:
> > >
> > > Before:
> > > mulx64: movabsq $-7046029254386353131, %rdx
> > >         mulx    %rdi, %rsi, %rdi
> > >         movq    %rdi, %rax
> > >         xorq    %rsi, %rax
> > >         ret
> > >
> > > After:
> > > mulx64: movabsq $-7046029254386353131, %rdx
> > >         mulx    %rdi, %rsi, %rax
> > >         xorq    %rsi, %rax
> > >         ret
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > and make -k check, both with and without --target_board=unix{-m32}
> > > with no new failures.  Ok for mainline?
> >
> > It looks that your previous PR110551 patch regressed -march=cascadelake [1].
> > Let's fix these regressions first.
> >
> > [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634660.html
> >
> > Uros.
>
> This patch fixes that "regression".  Originally, the test case in PR110551 
> contained
> one unnecessary mov on "default" x86_targets, but two extra movs on BMI2
> targets, including -march=haswell and -march=cascadelake.  The first patch
> eliminated one of these MOVs, this patch eliminates the second.  I'm not sure
> that you can call it a regression, the added test failed when run with a 
> non-standard
> -march setting.  The good news is that test case doesn't have to be changed 
> with
> this patch applied, i.e. the correct intended behaviour is no MOVs on all 
> architectures.

I was not worried about the extra mov, but more about [2], also
referred from [1], but it looks that that regression was wrongly
attributed to your patch.

[2] https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078391.html

> I'll admit the timing is unusual; I had already written and was regression 
> testing a
> patch for the BMI2 issue, when the -march=cascadelake regression tester let me
> know it was required for folks that helpfully run the regression suite with 
> non
> standard settings.  i.e. a long standing bug that wasn't previously tested 
> for by
> the testsuite.
>
> > > 2023-10-30  Roger Sayle  <ro...@nextmovesoftware.com>
> > >
> > > gcc/ChangeLog
> > >         PR target/110551
> > >         * config/i386/i386.md (*bmi2_umul<mode><dwi>3_1): Tidy condition
> > >         as operands[2] with predicate register_operand must be !MEM_P.
> > >         (peephole2): Optimize a mulx followed by a register-to-register
> > >         move, to place result in the correct destination if possible.
> > >
> > > gcc/testsuite/ChangeLog
> > >         PR target/110551
> > >         * gcc.target/i386/pr110551-2.c: New test case.

The patch is OK.

Thanks,
Uros.

Reply via email to