On Wed, Jun 23, 2021 at 3:59 PM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Mon, Jun 21, 2021 at 10:08 AM Hongtao Liu <crazy...@gmail.com> wrote:
> >
> > On Mon, Jun 21, 2021 at 3:28 PM Uros Bizjak via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > On Mon, Jun 21, 2021 at 6:56 AM liuhongt <hongtao....@intel.com> wrote:
> > > >
> > > > The avx512 supports bitwise operations with mask registers, but the
> > > > throughput of those instructions is much lower than that of the
> > > > corresponding gpr version, so we would additionally disparages
> > > > slightly the mask register alternative for bitwise operations in the
> > > > LRA.
> > >
> > > This was the reason for UNSPEC tagged instructions with mask
> > > registers, used mainly for builtins.
> > >
> > > Also, care should be taken if we want mask registers to be used under
> > > GPR pressure, or it is better to spill GPR registers. In the past,
> > > DImode values caused a lot of problems with MMX registers on x86-64,
> > > but we were able to hand-tune the allocator in the way you propose.
> > >
> > > Let's try the proposed approach to see what happens.
> > >
> > > > Also when allocano cost of GENERAL_REGS is same as MASK_REGS, allocate
> > > > MASK_REGS first since it has already been disparaged.
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >         PR target/101142
> > > >         * config/i386/i386.md: (*anddi_1): Disparage slightly the mask
> > > >         register alternative.
> > > >         (*and<mode>_1): Ditto.
> > > >         (*andqi_1): Ditto.
> > > >         (*andn<mode>_1): Ditto.
> > > >         (*<code><mode>_1): Ditto.
> > > >         (*<code>qi_1): Ditto.
> > > >         (*one_cmpl<mode>2_1): Ditto.
> > > >         (*one_cmplsi2_1_zext): Ditto.
> > > >         (*one_cmplqi2_1): Ditto.
> > > >         * config/i386/i386.c (x86_order_regs_for_local_alloc): Change
> > > >         the order of mask registers to be before general registers.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >         PR target/101142
> > > >         * gcc.target/i386/spill_to_mask-1.c: Adjust testcase.
> > > >         * gcc.target/i386/spill_to_mask-2.c: Adjust testcase.
> > > >         * gcc.target/i386/spill_to_mask-3.c: Adjust testcase.
> > > >         * gcc.target/i386/spill_to_mask-4.c: Adjust testcase.
> > >
> > > OK with a comment addition, see inline.
> > >
> > > Thanks,
> > > Uros.
> > >
> > > > ---
> > > >  gcc/config/i386/i386.c                        |  8 +-
> > > >  gcc/config/i386/i386.md                       | 20 ++---
> > > >  .../gcc.target/i386/spill_to_mask-1.c         | 89 +++++++++++++------
> > > >  .../gcc.target/i386/spill_to_mask-2.c         | 11 ++-
> > > >  .../gcc.target/i386/spill_to_mask-3.c         | 11 ++-
> > > >  .../gcc.target/i386/spill_to_mask-4.c         | 11 ++-
> > > >  6 files changed, 91 insertions(+), 59 deletions(-)
> > > >
> > > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > > index a61255857ff..a651853ca3b 100644
> > > > --- a/gcc/config/i386/i386.c
> > > > +++ b/gcc/config/i386/i386.c
> > > > @@ -20463,6 +20463,10 @@ x86_order_regs_for_local_alloc (void)
> > > >     int pos = 0;
> > > >     int i;
> > > >
> > > > +   /* Mask register.  */
> > > > +   for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++)
> > > > +     reg_alloc_order [pos++] = i;
> > >
> > > Please add a comment why mask registers should come first.
> > Thanks for the review, this is the patch i'm check in.
>
> This patch again caused unwanted mask instructions with -m32 in cpuid
> check code, e.g.:
>
> Running target unix/-m32
> FAIL: gcc.target/i386/avx512bw-pr70329-1.c execution test
> FAIL: gcc.target/i386/pr96814.c execution test
>
> Debugging pr96814 failure:
>
>   0x0804921d <+66>:    mov    %edx,%ecx
>   0x0804921f <+68>:    cpuid
> => 0x08049221 <+70>:    kmovd  %edx,%k0
>   0x08049225 <+74>:    mov    %eax,-0x8(%ebp)
>   0x08049228 <+77>:    mov    %ebx,-0xc(%ebp)
>
> It looks to me that putting mask registers in front of GPR is looking
So it's not functionality but performance issue here, under 32-bit
mode there are only 8 gprs which result in higher register pressure,
and for this we do have mask->integer and integer->mask cost, with
-mtune=bdver1 where cost of kmovd is quite high(16, 20 /*
mask->integer and integer->mask moves */), there's no mask
instructions in cpuid.
I guess we can adjust mask->integer and integer->mask for 32-bit mode
to avoid such a situation?
> for trouble, since apparently disparaging mask registers does not
> offset register preference of the new order.
Hmm, w/o order preference,it will fail bitwise_mask_op-3.c.(costs of
gpr and mask register are equal here) and according to gcc documents,
“?k" is minimal increase in cost of the mask register(one unit more
costly).
The cost model is always difficult in balancing these, as you said
fine tune cost model is always looking for trouble.
>
> Uros.



-- 
BR,
Hongtao

Reply via email to