Re: [PATCH 2/2, AArch64, v2] Pipeline model for APM XGene-1.

2014-11-20 Thread Kyrill Tkachov

Hi Philipp,

I don't mind it being in config/arm if you plan to wire it up later, 
good to know.

Another comment inline

Thanks,
Kyrill

On 19/11/14 21:42, Philipp Tomsich wrote:

Here's an updated patch with Kyrill's and Andrew's comments integrated.

I left the file in the config/arm-directory, as XGene-family is capable of
executing ARMv7 and we will wire this into the 32bit backend in the near
future (moving it now would just cause another move in the near future).

We also moved the 'include' up to where the pipeline models for the
A53/A57/ThunderX are included, as the previous dependency on picking up the
SIMD types from aarch64-simd.md no longer holds true since gcc-4.9.

Cheers,
-Philipp.


---
  gcc/ChangeLog |   6 +
  gcc/config/aarch64/aarch64.md |   3 +-
  gcc/config/arm/xgene1.md  | 520 ++
  3 files changed, 528 insertions(+), 1 deletion(-)
  create mode 100644 gcc/config/arm/xgene1.md

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index c9ac0d9..dad2278 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,11 @@
  2014-11-19  Philipp Tomsich  philipp.toms...@theobroma-systems.com

+   * config/aarch64/aarch64.md: Include xgene1.md.
+   (generic_sched): Set to no for xgene1.
+   * config/arm/xgene1.md: New file.
+
+2014-11-19  Philipp Tomsich  philipp.toms...@theobroma-systems.com
+
 * config/aarch64/aarch64-cores.def (xgene1): Update/add the
 xgene1 (APM XGene-1) core definition.
 * gcc/config/aarch64/aarch64.c: Add cost tables for APM XGene-1
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 597ff8c..1b36384 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -191,7 +191,7 @@

  (define_attr generic_sched yes,no
(const (if_then_else
-  (eq_attr tune cortexa53,cortexa15,thunderx)
+  (eq_attr tune cortexa53,cortexa15,thunderx,xgene1)
(const_string no)
(const_string yes

@@ -199,6 +199,7 @@
  (include ../arm/cortex-a53.md)
  (include ../arm/cortex-a15.md)
  (include thunderx.md)
+(include ../arm/xgene1.md)

  ;; ---
  ;; Jumps and other miscellaneous insns
diff --git a/gcc/config/arm/xgene1.md b/gcc/config/arm/xgene1.md
new file mode 100644
index 000..227f2c7
--- /dev/null
+++ b/gcc/config/arm/xgene1.md
@@ -0,0 +1,520 @@
+;; Machine description for AppliedMicro xgene1 core.
+;; Copyright (C) 2012-2014 Free Software Foundation, Inc.
+;; Contributed by Theobroma Systems Design und Consulting GmbH.
+;;See http://www.theobroma-systems.com for more info.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; http://www.gnu.org/licenses/.
+
+;; Pipeline description for the xgene1 micro-architecture
+
+(define_automaton xgene1)
+
+(define_cpu_unit xgene1_decode_out0 xgene1)
+(define_cpu_unit xgene1_decode_out1 xgene1)
+(define_cpu_unit xgene1_decode_out2 xgene1)
+(define_cpu_unit xgene1_decode_out3 xgene1)
+
+(define_cpu_unit xgene_divide xgene1)
+(define_cpu_unit xgene_fp_divide xgene1)


Why is this xgene_* while the other units xgene1_*?


+
+(define_reservation xgene1_decode1op
+( xgene1_decode_out0 )
+|( xgene1_decode_out1 )
+|( xgene1_decode_out2 )
+|( xgene1_decode_out3 )
+)
+(define_reservation xgene1_decode2op
+( xgene1_decode_out0 + xgene1_decode_out1 )
+|( xgene1_decode_out0 + xgene1_decode_out2 )
+|( xgene1_decode_out0 + xgene1_decode_out3 )
+|( xgene1_decode_out1 + xgene1_decode_out2 )
+|( xgene1_decode_out1 + xgene1_decode_out3 )
+|( xgene1_decode_out2 + xgene1_decode_out3 )
+)
+(define_reservation xgene1_decodeIsolated
+( xgene1_decode_out0 + xgene1_decode_out1 + xgene1_decode_out2 + 
xgene1_decode_out3 )
+)
+
+(define_insn_reservation branch 1
+  (and (eq_attr tune xgene1)
+   (eq_attr type branch))
+  xgene1_decode1op)


insn_reservation names should also have the xgene1_* namespace


+
+(define_insn_reservation nop 1
+  (and (eq_attr tune xgene1)
+   (eq_attr type no_insn))
+  xgene1_decode1op)
+
+(define_insn_reservation call 1
+  (and (eq_attr tune xgene1)
+   (eq_attr type call))
+  xgene1_decode2op)
+
+(define_insn_reservation f_load 10
+  (and (eq_attr tune xgene1)
+   (eq_attr type f_loadd,f_loads))
+  

Re: [PATCH 2/2, AArch64, v2] Pipeline model for APM XGene-1.

2014-11-20 Thread Dr . Philipp Tomsich
Kyrill,

 I don't mind it being in config/arm if you plan to wire it up later, good to 
 know.
 Another comment inline….

I’ll clean up the missing xgene1_ and the mistyped xgene_ prefix and resubmit.

 +(define_insn_reservation div 2
 +  (and (eq_attr tune xgene1)
 +   (eq_attr type sdiv,udiv))
 +  xgene1_decode1op,xgene_divide)
 
 The dangerous part was the reservation duration (the xgene_divide*large 
 number).
 The latency number (2 in this version, 66 in the previous) is not harmful to 
 the automaton size
 and can be as high as needed (if this operation is high latency)

It doesn’t really matter for any workload we’ve encountered, as the hardware is 
better at dealing with ‘div’-latencies than the scheduler (especially, as ‘div’ 
is variable latency and any guess we have will be wrong… we’ll likely add 
scheduling hook function in the future).
The more important thing is to keep the cost of divides high enough in the 
cost-model.

In other words: 66 would be the worst case and will normally not be correct 
anyway. Furthermore, it’s rather unplausible, that we find 264 instructions 
(for this worst-case scenario) to fill the scheduling bubble between the 
div-insn and its result usage.

Best,
Philipp.

Re: [PATCH 2/2, AArch64, v2] Pipeline model for APM XGene-1.

2014-11-20 Thread Kyrill Tkachov

Hi Philipp,

On 20/11/14 10:47, Dr. Philipp Tomsich wrote:

Kyrill,


I don't mind it being in config/arm if you plan to wire it up later, good to 
know.
Another comment inline….

I’ll clean up the missing xgene1_ and the mistyped xgene_ prefix and resubmit.


+(define_insn_reservation div 2
+  (and (eq_attr tune xgene1)
+   (eq_attr type sdiv,udiv))
+  xgene1_decode1op,xgene_divide)

The dangerous part was the reservation duration (the xgene_divide*large 
number).
The latency number (2 in this version, 66 in the previous) is not harmful to 
the automaton size
and can be as high as needed (if this operation is high latency)

It doesn’t really matter for any workload we’ve encountered, as the hardware is 
better at dealing with ‘div’-latencies than the scheduler (especially, as ‘div’ 
is variable latency and any guess we have will be wrong… we’ll likely add 
scheduling hook function in the future).
The more important thing is to keep the cost of divides high enough in the 
cost-model.

In other words: 66 would be the worst case and will normally not be correct 
anyway. Furthermore, it’s rather unplausible, that we find 264 instructions 
(for this worst-case scenario) to fill the scheduling bubble between the 
div-insn and its result usage.


Ok, makes sense. I just thought that 2 is a bit too low but if your 
benchmarking showed it to be reasonable I won't complain ;)


Kyrill



Best,
Philipp.






Re: [PATCH 2/2, AArch64, v2] Pipeline model for APM XGene-1.

2014-11-20 Thread Ramana Radhakrishnan
On Wed, Nov 19, 2014 at 9:42 PM, Philipp Tomsich
philipp.toms...@theobroma-systems.com wrote:
 Here's an updated patch with Kyrill's and Andrew's comments integrated.

 I left the file in the config/arm-directory, as XGene-family is capable of
 executing ARMv7 and we will wire this into the 32bit backend in the near
 future (moving it now would just cause another move in the near future).


Right, if this were making it into the arm backend and if the core
indeed does have AArch32 support, I'd like to see support for the
command line for xgene1 in the AArch32 backend as well for 5.0. Do
have a look in arm-cores.def in gcc/config/arm - there are ways of
using existing tuning options with the command line or putting this as
part of generic. We've been here before and users typically complain
about CPU option X being available in AArch32 state but not in AArch64
state. Since this is a separate tuning option, I'm less worried about
this going in later in stage3 but realistically it would be good to
have the command line options wired up for AArch32 by the end of the
year.

Ramana


 We also moved the 'include' up to where the pipeline models for the
 A53/A57/ThunderX are included, as the previous dependency on picking up the
 SIMD types from aarch64-simd.md no longer holds true since gcc-4.9.

 Cheers,
 -Philipp.


 ---
  gcc/ChangeLog |   6 +
  gcc/config/aarch64/aarch64.md |   3 +-
  gcc/config/arm/xgene1.md  | 520 
 ++
  3 files changed, 528 insertions(+), 1 deletion(-)
  create mode 100644 gcc/config/arm/xgene1.md

 diff --git a/gcc/ChangeLog b/gcc/ChangeLog
 index c9ac0d9..dad2278 100644
 --- a/gcc/ChangeLog
 +++ b/gcc/ChangeLog
 @@ -1,5 +1,11 @@
  2014-11-19  Philipp Tomsich  philipp.toms...@theobroma-systems.com

 +   * config/aarch64/aarch64.md: Include xgene1.md.
 +   (generic_sched): Set to no for xgene1.
 +   * config/arm/xgene1.md: New file.
 +
 +2014-11-19  Philipp Tomsich  philipp.toms...@theobroma-systems.com
 +
 * config/aarch64/aarch64-cores.def (xgene1): Update/add the
 xgene1 (APM XGene-1) core definition.
 * gcc/config/aarch64/aarch64.c: Add cost tables for APM XGene-1
 diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
 index 597ff8c..1b36384 100644
 --- a/gcc/config/aarch64/aarch64.md
 +++ b/gcc/config/aarch64/aarch64.md
 @@ -191,7 +191,7 @@

  (define_attr generic_sched yes,no
(const (if_then_else
 -  (eq_attr tune cortexa53,cortexa15,thunderx)
 +  (eq_attr tune cortexa53,cortexa15,thunderx,xgene1)
(const_string no)
(const_string yes

 @@ -199,6 +199,7 @@
  (include ../arm/cortex-a53.md)
  (include ../arm/cortex-a15.md)
  (include thunderx.md)
 +(include ../arm/xgene1.md)

  ;; ---
  ;; Jumps and other miscellaneous insns
 diff --git a/gcc/config/arm/xgene1.md b/gcc/config/arm/xgene1.md
 new file mode 100644
 index 000..227f2c7
 --- /dev/null
 +++ b/gcc/config/arm/xgene1.md
 @@ -0,0 +1,520 @@
 +;; Machine description for AppliedMicro xgene1 core.
 +;; Copyright (C) 2012-2014 Free Software Foundation, Inc.
 +;; Contributed by Theobroma Systems Design und Consulting GmbH.
 +;;See http://www.theobroma-systems.com for more info.
 +;;
 +;; This file is part of GCC.
 +;;
 +;; GCC is free software; you can redistribute it and/or modify it
 +;; under the terms of the GNU General Public License as published by
 +;; the Free Software Foundation; either version 3, or (at your option)
 +;; any later version.
 +;;
 +;; GCC is distributed in the hope that it will be useful, but
 +;; WITHOUT ANY WARRANTY; without even the implied warranty of
 +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 +;; General Public License for more details.
 +;;
 +;; You should have received a copy of the GNU General Public License
 +;; along with GCC; see the file COPYING3.  If not see
 +;; http://www.gnu.org/licenses/.
 +
 +;; Pipeline description for the xgene1 micro-architecture
 +
 +(define_automaton xgene1)
 +
 +(define_cpu_unit xgene1_decode_out0 xgene1)
 +(define_cpu_unit xgene1_decode_out1 xgene1)
 +(define_cpu_unit xgene1_decode_out2 xgene1)
 +(define_cpu_unit xgene1_decode_out3 xgene1)
 +
 +(define_cpu_unit xgene_divide xgene1)
 +(define_cpu_unit xgene_fp_divide xgene1)
 +
 +(define_reservation xgene1_decode1op
 +( xgene1_decode_out0 )
 +|( xgene1_decode_out1 )
 +|( xgene1_decode_out2 )
 +|( xgene1_decode_out3 )
 +)
 +(define_reservation xgene1_decode2op
 +( xgene1_decode_out0 + xgene1_decode_out1 )
 +|( xgene1_decode_out0 + xgene1_decode_out2 )
 +|( xgene1_decode_out0 + xgene1_decode_out3 )
 +|( xgene1_decode_out1 + xgene1_decode_out2 )
 +|( xgene1_decode_out1 + xgene1_decode_out3 )
 +|( xgene1_decode_out2 + xgene1_decode_out3 )
 +)
 +(define_reservation