Re: Ping: [PATCH v4, rs6000] Replace X-form addressing with D-form addressing in new pass for Power9

2019-11-13 Thread Kelvin Nilsen



On 10/25/19 8:30 PM, Kelvin Nilsen wrote:

[PATCH v4, rs6000] Replace X-form addressing with D-form addressing in new pass for Power9

2019-10-25 Thread Kelvin Nilsen


This patch adds a new optimization pass for rs6000 targets.

This new pass scans existing RTL expressions and replaces X-form loads and 
stores with RTL expressions that favor selection of D-form instructions in 
contexts where D-form instructions are preferred.  The new pass runs after 
the RTL loop optimizations because loop unrolling often introduces 
opportunities for beneficial replacements of X-form addressing instructions.
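
To make the distinction concrete, here is a minimal sketch in plain C of the two addressing shapes in a loop unrolled by four.  It is not taken from the patch or its tests: the function names are hypothetical, and which instructions a given compiler actually selects for each shape is an assumption.  The first function forms every address as base plus a separately computed index, which is the register-plus-register (X-form) shape; the second advances one pointer per iteration and reaches the other elements through constant displacements, which is the register-plus-immediate (D-form) shape.

```c
#include <assert.h>
#include <stddef.h>

/* Indexed shape: each access names its address as base + index, and
   each index is a separately computed value.  */
long
sum_indexed (const long *a, size_t n)
{
  assert (n % 4 == 0);
  long sum = 0;
  for (size_t i = 0; i < n; i += 4)
    {
      size_t i1 = i + 1, i2 = i + 2, i3 = i + 3;
      sum += a[i] + a[i1] + a[i2] + a[i3];
    }
  return sum;
}

/* Displacement shape: one derived pointer per iteration; the other
   accesses are constant offsets from it.  */
long
sum_displaced (const long *a, size_t n)
{
  assert (n % 4 == 0);
  long sum = 0;
  for (const long *p = a; p != a + n; p += 4)
    sum += p[0] + p[1] + p[2] + p[3];
  return sum;
}
```

Both functions compute the same sum; only the way the addresses are expressed differs, and that difference is what this pass works on at the RTL level.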

For each of the new tests, multiple X-form instructions are replaced with 
D-form instructions, some addi instructions are replaced with add instructions, 
and some addi instructions are eliminated.  The typical improvement for the 
included tests is a decrease of 4.28% to 12.12% in the number of instructions 
executed on each iteration of the loop.  The optimization has not shown 
measurable improvement on SPEC benchmarks, presumably because the typical 
loops that benefit from this optimization are memory bound and this 
optimization does not eliminate memory loads or stores.  However, it is 
anticipated that multi-threaded workloads and measurements of total power and 
cooling costs for heavy server workloads would benefit.

This version 4 patch responds to feedback and numerous suggestions by Segher:

  1. Further improvements to comments and discussion of computational 
complexity.

  2. Changed the name of insn_sequence_no to luid.

  3. Fixed some typos in comments.

  4. Added macro-defined constants to enforce upper bounds on the sizes (and 
number of required iterations) for certain data structures.  The intent is to 
bound compile time for programs that present large numbers of opportunities 
for D-form replacements.  This optimization pass ignores parts of a source 
program that exceed these macro-defined size limits.
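
The bounding idea in item 4 can be sketched as follows.  This is an illustration only, not code from the patch: the macro name MAX_CANDIDATES, its value, and the function collect_candidates are all hypothetical, since the real macro names and limits are not shown in this mail.  The point is that the collection step gives up on a region as soon as the limit would be exceeded, so the work done per region stays bounded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limit; the names and values used by the patch are not
   shown in this mail.  */
#define MAX_CANDIDATES 4

/* Collect the indices of the nonzero entries of flags[0..n) into out.
   Return the number collected, or -1 if the region holds more than
   MAX_CANDIDATES candidates, in which case the caller is expected to
   leave the region untransformed.  */
int
collect_candidates (const int *flags, size_t n, size_t out[MAX_CANDIDATES])
{
  assert (flags != NULL && out != NULL);
  int count = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (!flags[i])
        continue;
      if (count == MAX_CANDIDATES)
        return -1;	/* Over the limit: skip this region.  */
      out[count++] = i;
    }
  return count;
}
```

A caller would treat the -1 result as "do nothing here", trading a missed optimization for a predictable compile time.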

In a separate mail, I have sent discussion regarding the behavior of preceding 
passes and how this behavior relates to this new pass.

I have built and regression tested this patch on a powerpc64le-unknown-linux 
target with no regressions.

Is this ok for trunk?

gcc/ChangeLog:

2019-10-25  Kelvin Nilsen  

* config/rs6000/rs6000-p9dform.c: New file.
* config/rs6000/rs6000-passes.def: Add pass_insert_dform.
* config/rs6000/rs6000-protos.h
(rs6000_target_supports_dform_offset_p): New function prototype.
(make_pass_insert_dform): Likewise.
* config/rs6000/rs6000.c (rs6000_target_supports_dform_offset_p):
New function.
* config/rs6000/t-rs6000 (rs6000-p9dform.o): New build target.
* config.gcc: Add rs6000-p9dform.o object file.

gcc/testsuite/ChangeLog:

2019-10-25  Kelvin Nilsen  

* gcc.target/powerpc/p9-dform-0.c: New test.
* gcc.target/powerpc/p9-dform-1.c: New test.
* gcc.target/powerpc/p9-dform-10.c: New test.
* gcc.target/powerpc/p9-dform-11.c: New test.
* gcc.target/powerpc/p9-dform-12.c: New test.
* gcc.target/powerpc/p9-dform-13.c: New test.
* gcc.target/powerpc/p9-dform-14.c: New test.
* gcc.target/powerpc/p9-dform-15.c: New test.
* gcc.target/powerpc/p9-dform-2.c: New test.
* gcc.target/powerpc/p9-dform-3.c: New test.
* gcc.target/powerpc/p9-dform-4.c: New test.
* gcc.target/powerpc/p9-dform-5.c: New test.
* gcc.target/powerpc/p9-dform-6.c: New test.
* gcc.target/powerpc/p9-dform-7.c: New test.
* gcc.target/powerpc/p9-dform-8.c: New test.
* gcc.target/powerpc/p9-dform-9.c: New test.
* gcc.target/powerpc/p9-dform-generic.h: New file.

Index: gcc/config/rs6000/rs6000-p9dform.c
===================================================================
--- gcc/config/rs6000/rs6000-p9dform.c  (nonexistent)
+++ gcc/config/rs6000/rs6000-p9dform.c  (working copy)
@@ -0,0 +1,1763 @@
+/* Subroutines used to transform array subscripting expressions into
+   forms that are more amenable to d-form instruction selection for p9
+   little-endian VSX code.
+   Copyright (C) 1991-2019 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "rtl.h"
+#include "tree.h"
+#include "memmodel.h"
+#include "df.h"
+#include "tm_p.h"
+#include "ira.h"
+#include "print-tree.h"
+#include "varasm.h"
+#include "explow.h"