On Wed, Jun 23, 2021 at 11:32 AM Hongtao Liu <crazy...@gmail.com> wrote:

> > > > > > Also when allocano cost of GENERAL_REGS is same as MASK_REGS, 
> > > > > > allocate
> > > > > > MASK_REGS first since it has already been disparaged.
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > >         PR target/101142
> > > > > >         * config/i386/i386.md: (*anddi_1): Disparage slightly the 
> > > > > > mask
> > > > > >         register alternative.
> > > > > >         (*and<mode>_1): Ditto.
> > > > > >         (*andqi_1): Ditto.
> > > > > >         (*andn<mode>_1): Ditto.
> > > > > >         (*<code><mode>_1): Ditto.
> > > > > >         (*<code>qi_1): Ditto.
> > > > > >         (*one_cmpl<mode>2_1): Ditto.
> > > > > >         (*one_cmplsi2_1_zext): Ditto.
> > > > > >         (*one_cmplqi2_1): Ditto.
> > > > > >         * config/i386/i386.c (x86_order_regs_for_local_alloc): 
> > > > > > Change
> > > > > >         the order of mask registers to be before general registers.
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > >         PR target/101142
> > > > > >         * gcc.target/i386/spill_to_mask-1.c: Adjust testcase.
> > > > > >         * gcc.target/i386/spill_to_mask-2.c: Adjust testcase.
> > > > > >         * gcc.target/i386/spill_to_mask-3.c: Adjust testcase.
> > > > > >         * gcc.target/i386/spill_to_mask-4.c: Adjust testcase.
> > > > >
> > > > > OK with a comment addition, see inline.
> > > > >
> > > > > Thanks,
> > > > > Uros.
> > > > >
> > > > > > ---
> > > > > >  gcc/config/i386/i386.c                        |  8 +-
> > > > > >  gcc/config/i386/i386.md                       | 20 ++---
> > > > > >  .../gcc.target/i386/spill_to_mask-1.c         | 89 
> > > > > > +++++++++++++------
> > > > > >  .../gcc.target/i386/spill_to_mask-2.c         | 11 ++-
> > > > > >  .../gcc.target/i386/spill_to_mask-3.c         | 11 ++-
> > > > > >  .../gcc.target/i386/spill_to_mask-4.c         | 11 ++-
> > > > > >  6 files changed, 91 insertions(+), 59 deletions(-)
> > > > > >
> > > > > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > > > > index a61255857ff..a651853ca3b 100644
> > > > > > --- a/gcc/config/i386/i386.c
> > > > > > +++ b/gcc/config/i386/i386.c
> > > > > > @@ -20463,6 +20463,10 @@ x86_order_regs_for_local_alloc (void)
> > > > > >     int pos = 0;
> > > > > >     int i;
> > > > > >
> > > > > > +   /* Mask register.  */
> > > > > > +   for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++)
> > > > > > +     reg_alloc_order [pos++] = i;
> > > > >
> > > > > Please add a comment why mask registers should come first.
> > > > Thanks for the review, this is the patch i'm check in.
> > >
> > > This patch again caused unwanted mask instructions with -m32 in cpuid
> > > check code, e.g.:
> > >
> > > Running target unix/-m32
> > > FAIL: gcc.target/i386/avx512bw-pr70329-1.c execution test
> > > FAIL: gcc.target/i386/pr96814.c execution test
> > >
> > > Debugging pr96814 failure:
> > >
> > >   0x0804921d <+66>:    mov    %edx,%ecx
> > >   0x0804921f <+68>:    cpuid
> > > => 0x08049221 <+70>:    kmovd  %edx,%k0
> > >   0x08049225 <+74>:    mov    %eax,-0x8(%ebp)
> > >   0x08049228 <+77>:    mov    %ebx,-0xc(%ebp)
> > >
> > > It looks to me that putting mask registers in front of GPR is looking
> > So it's not functionality but performance issue here, under 32-bit
> > mode there are only 8 gprs which result in higher register pressure,
> > and for this we do have mask->integer and integer->mask cost, with
> > -mtune=bdver1 where cost of kmovd is quite high(16, 20 /*
> > mask->integer and integer->mask moves */), there's no mask
> > instructions in cpuid.
> > I guess we can adjust mask->integer and integer->mask for 32-bit mode
> > to avoid such a situation?
> I notice the default option is O0, with -O there's no mask instructions.
> IMHO, We don't need to change cost unless there's -O2 cases where mask
> instructions regress performance here.

No, this reasoning is not acceptable. The compiled code will SIGILL on
targets where unsupported mask registers are involved, so GPR should
always have priority compared to mask registers. Based on these
findings, x86_order_regs_for_local_alloc change should be reverted,
and register move costs compensated elsewhere.

Uros.

Reply via email to