Re: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS

James Greenhalgh Tue, 02 Feb 2016 02:03:18 -0800

On Tue, Jan 26, 2016 at 05:39:24PM +0000, Wilco Dijkstra wrote:
> ping (note the regressions discussed below are addressed by 
> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01761.html)


OK, but please be extra vigilant for any fallout on AArch64 after this
and the follow-up linked above is applied.

Thanks,
James

> James Greenhalgh wrote:
> > On Wed, Dec 16, 2015 at 01:05:21PM +0000, Wilco Dijkstra wrote:
> > > James Greenhalgh wrote:
> > > > On Tue, Dec 15, 2015 at 10:54:49AM +0000, Wilco Dijkstra wrote:
> > > > > ping
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com]
> > > > > > Sent: 06 November 2015 20:06
> > > > > > To: 'gcc-patches@gcc.gnu.org'
> > > > > > Subject: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
> > > > > >
> > > > > > This patch adds support for the 
> > > > > > TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
> > > > > > hook. When the cost of GENERAL_REGS and FP_REGS is identical, the 
> > > > > > register
> > > > > > allocator always uses ALL_REGS even when it has a much higher cost. 
> > > > > > The
> > > > > > hook changes the class to either FP_REGS or GENERAL_REGS depending 
> > > > > > on the
> > > > > > mode of the register. This results in better register allocation 
> > > > > > overall,
> > > > > > fewer spills and reduced codesize - particularly in SPEC2006 gamess.
> > > > > >
> > > > > > GCC regression passes with several minor fixes.
> > > > > >
> > > > > > OK for commit?
> > > > > >
> > > > > > ChangeLog:
> > > > > > 2015-11-06  Wilco Dijkstra  <wdijk...@arm.com>
> > > > > >
> > > > > >       * gcc/config/aarch64/aarch64.c
> > > > > >       (TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS): New define.
> > > > > >       (aarch64_ira_change_pseudo_allocno_class): New function.
> > > > > >       * gcc/testsuite/gcc.target/aarch64/cvtf_1.c: Build with -O2.
> > > > > >       * gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > > > >       (test_corners_sisd_di): Improve force to SIMD register.
> > > > > >       (test_corners_sisd_si): Likewise.
> > > > > >       * gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c: Build with 
> > > > > > -O2.
> > > > > >       * gcc/testsuite/gcc.target/aarch64/vect-ld1r-compile-fp.c:
> > > > > >       Remove scan-assembler check for ldr.
> > > >
> > > > Drop the gcc/ from the ChangeLog.
> > > >
> > > > > > --
> > > > > >  gcc/config/aarch64/aarch64.c                       | 22 
> > > > > > ++++++++++++++++++++++
> > > > > >  gcc/testsuite/gcc.target/aarch64/cvtf_1.c          |  2 +-
> > > > > >  gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c  |  4 ++--
> > > > > >  gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c     |  2 +-
> > > > > >  .../gcc.target/aarch64/vect-ld1r-compile-fp.c      |  1 -
> > > >
> > > > These testsuite changes concern me a bit, and you don't mention them 
> > > > beyond
> > > > saying they are minor fixes...
> > >
> > > Well any changes to register allocator preferencing would cause fallout in
> > > tests that are assuming which register is allocated, especially if they 
> > > use
> > > nasty inline assembler hacks to do so...
> >
> > Sure, but the testcases here each operate on data that should live in
> > FP_REGS given the initial conditions that the nasty hacks try to mimic -
> > that's what makes the regressions notable.
> >
> > >
> > > > > >  #define FCVTDEF(ftype,itype) \
> > > > > >  void \
> > > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c 
> > > > > > b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > > > > index 363f554..8465c89 100644
> > > > > > --- a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > > > > +++ b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > > > > @@ -186,9 +186,9 @@ test_corners_sisd_di (Int64x1 b)
> > > > > >  {
> > > > > >    force_simd_di (b);
> > > > > >    b = b >> 63;
> > > > > > +  force_simd_di (b);
> > > > > >    b = b >> 0;
> > > > > >    b += b >> 65; /* { dg-warning "right shift count >= width of 
> > > > > > type" } */
> > > > > > -  force_simd_di (b);
> > > >
> > > > This one I don't understand, but seems to say that we've decided to move
> > > > b out of FP_REGS after getting it in there for b = b << 63; ? So this is
> > > > another register allocator regression?
> > >
> > > No, basically the register allocator is now making better decisions as to
> > > where to allocate integer variables. It will only allocate them to FP
> > > registers if they are primarily used by other FP operations. The
> > > force_simd_di inline assembler tries to mimic FP uses, and if there are
> > > enough of them at the right places then everything works as expected.  If
> > > however you do 3 consecutive integer operations then the allocator will 
> > > now
> > > correctly prefer to allocate them to the integer registers (while 
> > > previously
> > > it wouldn't, which is inefficient).
> >
> > I'm not sure I understand this argument in the abstract (though I believe
> > it for some of the supported cores for the AArch64 target). At an abstract
> > level, given a set of operations which can execute in either FP_REGS or
> > GENERAL_REGS and initial and post conditions that allocate all input and
> > output registers from those operations to FP_REGS, I would expect those
> > operations to take place using FP_REGS? Your patch seems to break this
> > expectation?
> 
> No my patch doesn't break that expectation. The goal is that if the cost of
> allocating to either integer or FP registers is the same, we prefer the most
> natural register file based on the type. We'll continue to allocate integer
> operations to FP_REGS if that has the lowest cost.
> 
> Like I mentioned in the explanation, the issue is that the register allocator 
> simply
> ignores the the much higher cost of ALL_REGS and uses it eventhough it 
> results in
> very suboptimal allocations and a large number of redundant int<->fp moves.
> This patch fixes this by forcing the preference to FP_REGS or GENERAL_REGS if 
> it
> Is ALL_REGS.
> 
> > > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c 
> > > > > > b/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c
> > > > > > index a49db3e..c5a9c52 100644
> > > > > > --- a/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c
> > > > > > +++ b/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c
> > > > > > @@ -1,6 +1,6 @@
> > > > > >  /* Test vdup_lane intrinsics work correctly.  */
> > > > > >  /* { dg-do run } */
> > > > > > -/* { dg-options "-O1 --save-temps" } */
> > > > > > +/* { dg-options "-O2 --save-temps" } */
> > > >
> > > > Another -O1 regression ?
> > >
> > > No, it's triggering a bug in the -O1 register preferencing that causes 
> > > incorrect preferences to be
> > > selected despite the costs being right. The cost calculation with -O1 for 
> > > eg.
> > > wrap_vdupb_lane_s8_0() in vdup_lane_2.c:
> > >
> > > Pass 0 for finding pseudo/allocno costs
> > >
> > >     r79: preferred FP_REGS, alternative GENERAL_REGS, allocno GENERAL_REGS
> > >     a1 (r79,l0) best GENERAL_REGS, allocno GENERAL_REGS
> > >     r78: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
> > >     a0 (r78,l0) best GENERAL_REGS, allocno GENERAL_REGS
> > >
> > >   a0(r78,l0) costs: CALLER_SAVE_REGS:5000,5000 GENERAL_REGS:5000,5000 
> > > FP_LO_REGS:5000,5000 FP_REGS:5000,5000
> > ALL_REGS:10000,10000 MEM:9000,9000
> > >   a1(r79,l0) costs: CALLER_SAVE_REGS:5000,5000 GENERAL_REGS:5000,5000 
> > > FP_LO_REGS:0,0 FP_REGS:0,0 ALL_REGS:10000,10000
> > MEM:9000,9000
> > >
> > > So it correctly prefers FP_REGS for r79 as it has the lowest cost, but 
> > > then
> > > forces the allocno and best register to GENERAL_REGS... We could work 
> > > around
> > > it by not having the "r" variant first in the aarch64_get_lane patterns 
> > > and
> > > further discouraging its use via "?r", but that's a different patch.
> >
> > Well, that patch (moving "r" alternative away from first) does seem to
> > better fit with what we've done elsewhere in aarch64-simd.md (e.g.
> > aarch64_combinez below). Does making this change obviate the need to
> > change these testcases to -O1? If so, I'd rather break them with your patch
> > and fix it in a follow-up than paper over the cracks.
> 
> Yes, using "?r" works. I can easily add this to my combinez patch - the issue 
> is that there are a
> lot more patterns that have the same problem, so we also need a fix in the 
> register allocator
> (we need to do both as reload also has bugs where it completely ignores all 
> the costs and
> preferences, so the order really matters a lot...).
> 
> So I looked a bit further, and the bug is that the preferencing also forces 
> ALL_REGS if the
> GENERAL_REGS and FP_REGS costs are not equal but both are lower than the 
> memory cost
> (again even if ALL_REGS cost is higher than the memory cost!).
> 
> In that case TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS will force the preference
> irrespectively of the best preference. To fix this we need to extend it with 
> the best register
> class (and possibly alternate class) so we can avoid forcing the wrong 
> preference if there already
> is a good preference (ie. not ALL_REGS). I'll write a patch for that - it's 
> trivial but presumably too
> late for GCC6 as it affects a target callback...
> 
> Wilco

Re: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS

Reply via email to