Re: [PING 3][PATCH] [v4][aarch64] Avoid tag collisions for loads falkor

2018-08-07 Thread Siddhesh Poyarekar

On 08/07/2018 10:30 PM, James Greenhalgh wrote:

To help set expectations here. I'm currently only able to dedicate a couple
of hours of time to review each week. Tamar's Stack Clash has been taking
a big chunk of that time recently as we push it to a final state for trunk.

This pass is... large and complex. I'm aware that it needs some attention,
and will try to get to it within the coming weeks.


Thank you, I totally understand.  I'll keep pinging on a weekly basis 
though if that's OK; it helps my workflow because otherwise I might 
forget about the patch for weeks at a time.



All the review from Kyrill in the interim is extremely helpful.


It definitely is.

Thanks,
Siddhesh


Re: [PING 3][PATCH] [v4][aarch64] Avoid tag collisions for loads falkor

2018-08-07 Thread James Greenhalgh
On Tue, Aug 07, 2018 at 03:01:28AM -0500, Siddhesh Poyarekar wrote:
> Hello,
> 
> Ping!

To help set expectations here. I'm currently only able to dedicate a couple
of hours of time to review each week. Tamar's Stack Clash has been taking
a big chunk of that time recently as we push it to a final state for trunk.

This pass is... large and complex. I'm aware that it needs some attention,
and will try to get to it within the coming weeks.

All the review from Kyrill in the interim is extremely helpful.

Thanks,
James

> 
> Siddhesh
> 
> On 07/24/2018 12:37 PM, Siddhesh Poyarekar wrote:
> > Hi,
> > 
> > This is a rewrite of the tag collision avoidance patch that Kugan had
> > written as a machine reorg pass back in February.
> > 
> > The falkor hardware prefetching system uses a combination of the
> > source, destination and offset to decide which prefetcher unit to
> > train with the load.  This is great when loads in a loop are
> > sequential but sub-optimal if there are unrelated loads in a loop that
> > tag to the same prefetcher unit.
> > 
> > This pass attempts to rename the desination register of such colliding
> > loads using routines available in regrename.c so that their tags do
> > not collide.  This shows some performance gains with mcf and xalancbmk
> > (~5% each) and will be tweaked further.  The pass is placed near the
> > fag end of the pass list so that subsequent passes don't inadvertantly
> > end up undoing the renames.
> > 
> > A full gcc bootstrap and testsuite ran successfully on aarch64, i.e. it
> > did not introduce any new regressions.  I also did a make-check with
> > -mcpu=falkor to ensure that there were no regressions.  The couple of
> > regressions I found were target-specific and were related to scheduling
> > and cost differences and are not correctness issues.
> > 
> > Changes from v3:
> > - Avoid renaming argument/return registers and registers that have a
> >specific architectural meaning, i.e. stack pointer, frame pointer,
> >etc.  Try renaming their aliases instead.
> > 
> > Changes from v2:
> > - Ignore SVE instead of asserting that falkor does not support sve
> > 
> > Changes from v1:
> > 
> > - Fixed up issues pointed out by Kyrill
> > - Avoid renaming R0/V0 since they could be return values
> > - Fixed minor formatting issues.
> > 
> > 2018-07-02  Siddhesh Poyarekar  
> > Kugan Vivekanandarajah  
> > 
> > * config/aarch64/falkor-tag-collision-avoidance.c: New file.
> > * config.gcc (extra_objs): Build it.
> > * config/aarch64/t-aarch64 (falkor-tag-collision-avoidance.o):
> > Likewise.
> > * config/aarch64/aarch64-passes.def
> > (pass_tag_collision_avoidance): New pass.
> > * config/aarch64/aarch64.c (qdf24xx_tunings): Add
> > AARCH64_EXTRA_TUNE_RENAME_LOAD_REGS to tuning_flags.
> > (aarch64_classify_address): Remove static qualifier.
> > (aarch64_address_info, aarch64_address_type): Move to...
> > * config/aarch64/aarch64-protos.h: ... here.
> > (make_pass_tag_collision_avoidance): New function.
> > * config/aarch64/aarch64-tuning-flags.def (rename_load_regs):
> > New tuning flag.
> > 
> > CC: james.greenha...@arm.com
> > CC: kyrylo.tkac...@foss.arm.com
> > ---
> >   gcc/config.gcc|   2 +-
> >   gcc/config/aarch64/aarch64-passes.def |   1 +
> >   gcc/config/aarch64/aarch64-protos.h   |  49 +
> >   gcc/config/aarch64/aarch64-tuning-flags.def   |   2 +
> >   gcc/config/aarch64/aarch64.c  |  48 +-
> >   .../aarch64/falkor-tag-collision-avoidance.c  | 881 ++
> >   gcc/config/aarch64/t-aarch64  |   9 +
> >   7 files changed, 946 insertions(+), 46 deletions(-)
> >   create mode 100644 gcc/config/aarch64/falkor-tag-collision-avoidance.c
> > 
> > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > index 78e84c2b864..8f5e458e8a6 100644
> > --- a/gcc/config.gcc
> > +++ b/gcc/config.gcc
> > @@ -304,7 +304,7 @@ aarch64*-*-*)
> > extra_headers="arm_fp16.h arm_neon.h arm_acle.h"
> > c_target_objs="aarch64-c.o"
> > cxx_target_objs="aarch64-c.o"
> > -   extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o"
> > +   extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o 
> > falkor-tag-collision-avoidance.o"
> > target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.c"
> > target_has_targetm_common=yes
> > ;;
> > diff --git a/gcc/config/aarch64/aarch64-passes.def 
> > b/gcc/config/aarch64/aarch64-passes.def
> > index 87747b420b0..f61a8870aa1 100644
> > --- a/gcc/config/aarch64/aarch64-passes.def
> > +++ b/gcc/config/aarch64/aarch64-passes.def
> > @@ -19,3 +19,4 @@
> >  .  */
> >   
> >   INSERT_PASS_AFTER (pass_regrename, 1, pass_fma_steering);
> > +INSERT_PASS_AFTER (pass_machine_reorg, 1, pass_tag_collision_avoidance);
> > diff --git a/gcc/config/aarch64/aarch64-protos.h 
> > b/gcc/config/aarch64/aarch64-protos.h
> > index