Re: [RFA/ARM] Add an integer pipeline description for Cortex-A15

2011-11-30 Thread Matthew Gretton-Dann

On 29/11/11 11:04, Ramana Radhakrishnan wrote:

gcc/ChangeLog:
2011-11-28  Matthew Gretton-Dann  <matthew.gretton-d...@arm.com>

* config/arm/arm.c (arm_issue_rate): Cortex-A15 can triple
issue.
* config/arm/arm.md (mul64): Add new attribute.
(generic_sched): Cortex-A15 is not scheduled generically
(cortex-a15.md): Include.
* config/arm/cortex-a15.md: New machine description.
* config/arm/t-arm (MD_INCLUDES): Add cortex-a15.md


OK .


This had a conflict with the MD_INCLUDES patch in config/arm/t-arm, so
the attached is what actually got committed.


Thanks,

Matt
--
Matthew Gretton-Dann
Principal Engineer, PD Software - Tools, ARM Ltd

Index: gcc/config/arm/cortex-a15.md
===
--- gcc/config/arm/cortex-a15.md (revision 0)
+++ gcc/config/arm/cortex-a15.md (revision 0)
@@ -0,0 +1,186 @@
+;; ARM Cortex-A15 pipeline description
+;; Copyright (C) 2011 Free Software Foundation, Inc.
+;;
+;; Written by Matthew Gretton-Dann <matthew.gretton-d...@arm.com>
+
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_automaton "cortex_a15")
+
+;; The Cortex-A15 core is modelled as a triple issue pipeline that has
+;; the following dispatch units.
+;; 1. Two pipelines for simple integer operations: SX1, SX2
+;; 2. Two pipelines for Neon and FP data-processing operations: CX1, CX2
+;; 3. One pipeline for branch operations: BX
+;; 4. One pipeline for integer multiply and divide operations: MX
+;; 5. Two pipelines for load and store operations: LS1, LS2
+;;
+;; We can issue into three pipelines per-cycle.
+;;
+;; We assume that where we have unit pairs xx1 is always filled before xx2.
+
+;; The three issue units
+(define_cpu_unit "ca15_i0, ca15_i1, ca15_i2" "cortex_a15")
+
+(define_reservation "ca15_issue1" "(ca15_i0|ca15_i1|ca15_i2)")
+(define_reservation "ca15_issue2" "((ca15_i0+ca15_i1)|(ca15_i1+ca15_i2))")
+(define_reservation "ca15_issue3" "(ca15_i0+ca15_i1+ca15_i2)")
+(final_presence_set "ca15_i1" "ca15_i0")
+(final_presence_set "ca15_i2" "ca15_i1")
+
+;; The main dispatch units
+(define_cpu_unit "ca15_sx1, ca15_sx2" "cortex_a15")
+(define_cpu_unit "ca15_cx1, ca15_cx2" "cortex_a15")
+(define_cpu_unit "ca15_ls1, ca15_ls2" "cortex_a15")
+(define_cpu_unit "ca15_bx, ca15_mx" "cortex_a15")
+
+(define_reservation "ca15_ls" "(ca15_ls1|ca15_ls2)")
+
+;; The extended load-store pipeline
+(define_cpu_unit "ca15_ldr, ca15_str" "cortex_a15")
+
+;; The extended ALU pipeline
+(define_cpu_unit "ca15_sx1_alu, ca15_sx1_shf, ca15_sx1_sat" "cortex_a15")
+(define_cpu_unit "ca15_sx2_alu, ca15_sx2_shf, ca15_sx2_sat" "cortex_a15")
+
+;; Simple Execution Unit:
+;;
+;; Simple ALU without shift
+(define_insn_reservation "cortex_a15_alu" 2
+  (and (eq_attr "tune" "cortexa15")
+       (and (eq_attr "type" "alu")
+            (eq_attr "neon_type" "none")))
+  "ca15_issue1,(ca15_sx1,ca15_sx1_alu)|(ca15_sx2,ca15_sx2_alu)")
+
+;; ALU ops with immediate shift
+(define_insn_reservation "cortex_a15_alu_shift" 3
+  (and (eq_attr "tune" "cortexa15")
+       (and (eq_attr "type" "alu_shift")
+            (eq_attr "neon_type" "none")))
+  "ca15_issue1,(ca15_sx1,ca15_sx1+ca15_sx1_shf,ca15_sx1_alu)\
+   |(ca15_sx2,ca15_sx2+ca15_sx2_shf,ca15_sx2_alu)")
+
+;; ALU ops with register controlled shift
+(define_insn_reservation "cortex_a15_alu_shift_reg" 3
+  (and (eq_attr "tune" "cortexa15")
+       (and (eq_attr "type" "alu_shift_reg")
+            (eq_attr "neon_type" "none")))
+  "(ca15_issue2,ca15_sx1+ca15_sx2,ca15_sx1_shf,ca15_sx2_alu)\
+   |(ca15_issue1,(ca15_issue1+ca15_sx2,ca15_sx1+ca15_sx2_shf)\
+   |(ca15_issue1+ca15_sx1,ca15_sx1+ca15_sx1_shf),ca15_sx1_alu)")
+
+;; Multiply Execution Unit:
+;;
+;; 32-bit multiplies
+(define_insn_reservation "cortex_a15_mult32" 3
+  (and (eq_attr "tune" "cortexa15")
+       (and (eq_attr "type" "mult")
+            (and (eq_attr "neon_type" "none")
+                 (eq_attr "mul64" "no"))))
+  "ca15_issue1,ca15_mx")
+
+;; 64-bit multiplies
+(define_insn_reservation "cortex_a15_mult64" 4
+  (and (eq_attr "tune" "cortexa15")
+       (and (eq_attr "type" "mult")
+            (and (eq_attr "neon_type" "none")
+                 (eq_attr "mul64" "yes"))))
+  "ca15_issue1,ca15_mx*2")
+
+;; Integer divide
+(define_insn_reservation "cortex_a15_udiv" 9
+  (and (eq_attr "tune" "cortexa15")
+       (eq_attr "insn" "udiv"))
+  "ca15_issue1,ca15_mx")
+
+(define_insn_reservation "cortex_a15_sdiv" 10
+  (and (eq_attr "tune" "cortexa15")
+   

Re: [RFA/ARM] Add an integer pipeline description for Cortex-A15

2011-11-29 Thread Ramana Radhakrishnan
 gcc/ChangeLog:
 2011-11-28  Matthew Gretton-Dann  <matthew.gretton-d...@arm.com>

        * config/arm/arm.c (arm_issue_rate): Cortex-A15 can triple
        issue.
        * config/arm/arm.md (mul64): Add new attribute.
        (generic_sched): Cortex-A15 is not scheduled generically
        (cortex-a15.md): Include.
        * config/arm/cortex-a15.md: New machine description.
        * config/arm/t-arm (MD_INCLUDES): Add cortex-a15.md

OK .

Ramana


[RFA/ARM] Add an integer pipeline description for Cortex-A15

2011-11-28 Thread Matthew Gretton-Dann

All,

The attached patch adds an integer pipeline description for Cortex-A15.
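
As a rough illustration of the issue model used in the description (this is not part of the patch): the three issue units ca15_i0, ca15_i1 and ca15_i2 are chained with final_presence_set, so a later slot can only be taken in a cycle in which the preceding slot is also taken. The Python sketch below models only that rule. The function names try_issue and pack_cycles are hypothetical, instructions are packed strictly in order, and the execution units (SX, MX, LS, ...) are ignored, so it is far simpler than what the scheduler's automaton actually does.

```python
# Alternatives for each issue reservation, written as sets of issue slots.
ISSUE = {
    "ca15_issue1": [{"ca15_i0"}, {"ca15_i1"}, {"ca15_i2"}],
    "ca15_issue2": [{"ca15_i0", "ca15_i1"}, {"ca15_i1", "ca15_i2"}],
    "ca15_issue3": [{"ca15_i0", "ca15_i1", "ca15_i2"}],
}

# Modelled on final_presence_set: a slot may only be reserved if its
# predecessor is reserved in the same cycle.
REQUIRES = {"ca15_i1": "ca15_i0", "ca15_i2": "ca15_i1"}


def try_issue(busy, reservation):
    """Return the new set of busy slots, or None if nothing fits this cycle."""
    for alt in ISSUE[reservation]:
        if alt & busy:
            continue  # one of the slots is already taken
        after = busy | alt
        if all(REQUIRES[s] in after for s in alt if s in REQUIRES):
            return after
    return None


def pack_cycles(reservations):
    """Count the cycles needed to issue the reservations strictly in order."""
    cycles = 0
    busy = None  # no cycle has been opened yet
    for r in reservations:
        nxt = try_issue(busy, r) if busy is not None else None
        if nxt is None:
            cycles += 1  # start a new cycle with all slots free
            nxt = try_issue(frozenset(), r)
        busy = nxt
    return cycles
```

Under this toy model three single-slot instructions fit in one cycle and a fourth spills into the next, which is the behaviour that returning 3 from arm_issue_rate is meant to reflect.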

Although the patch does not depend on them, my testing was done on top of
Sameera Deshpande's A15 Prologue/Epilogue patches (see:
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00856.html and following).
Testing on some popular embedded benchmarks shows that the pipeline
description gives a 1.8% (geomean) average performance improvement.
The testsuite shows no regressions when targeting arm-none-eabi and using QEMU.
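
For readers unfamiliar with the metric, a geometric-mean improvement is computed from per-benchmark speedup ratios (old time divided by new time). The Python sketch below shows only the arithmetic. The ratios in it are made up for illustration and are not the results behind the 1.8% figure, and the function name geomean is just a label chosen here.

```python
import math


def geomean(ratios):
    """Geometric mean of positive ratios, computed via logs for stability."""
    if not ratios:
        raise ValueError("need at least one ratio")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical per-benchmark speedups (1.03 means 3% faster with the patch).
speedups = [1.00, 1.03, 0.99, 1.05]
improvement_pct = (geomean(speedups) - 1.0) * 100.0
print(f"geomean improvement: {improvement_pct:.2f}%")
```

The geometric mean is the usual choice for averaging ratios because a 2x speedup and a 2x slowdown cancel out, which they would not under an arithmetic mean.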


Can someone please review?

Thanks,

Matt

gcc/ChangeLog:
2011-11-28  Matthew Gretton-Dann  <matthew.gretton-d...@arm.com>

* config/arm/arm.c (arm_issue_rate): Cortex-A15 can triple
issue.
* config/arm/arm.md (mul64): Add new attribute.
(generic_sched): Cortex-A15 is not scheduled generically
(cortex-a15.md): Include.
* config/arm/cortex-a15.md: New machine description.
* config/arm/t-arm (MD_INCLUDES): Add cortex-a15.md
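
To illustrate what the new mul64 attribute is for: it marks integer multiplies with a 64-bit result by insn name, and the cortex-a15.md reservations shown in this thread then give those a latency of 4 and other multiplies a latency of 3. The Python sketch below mirrors that selection. The insn list is copied from the arm.md hunk, while the function names are hypothetical and the sketch assumes the instruction already has type mult.

```python
# Insn names that the patch's mul64 attribute classifies as "yes".
MUL64_INSNS = frozenset(
    "smlalxy,umull,umulls,umlal,umlals,smull,smulls,smlal,smlals".split(",")
)


def mul64(insn):
    """Mirror of the mul64 attribute: "yes" for 64-bit-result multiplies."""
    return "yes" if insn in MUL64_INSNS else "no"


def mult_latency(insn):
    """Latency chosen for an insn of type mult (3 for mult32, 4 for mult64)."""
    return 4 if mul64(insn) == "yes" else 3
```

For example, mult_latency("umull") selects the 64-bit latency, while mult_latency("mul") selects the 32-bit one.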

--
Matthew Gretton-Dann
Principal Engineer, PD Software - Tools, ARM Ltd

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e3b0b88..d17f2b5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -24123,6 +24123,9 @@ arm_issue_rate (void)
 {
   switch (arm_tune)
     {
+    case cortexa15:
+      return 3;
+
     case cortexr4:
     case cortexr4f:
     case cortexr5:
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index a78ba88..facbf92 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -355,6 +355,13 @@
        (const_string "mult")
        (const_string "alu")))
 
+; Is this an (integer side) multiply with a 64-bit result?
+(define_attr "mul64" "no,yes"
+             (if_then_else
+               (eq_attr "insn" "smlalxy,umull,umulls,umlal,umlals,smull,smulls,smlal,smlals")
+               (const_string "yes")
+               (const_string "no")))
+
 ; Load scheduling, set from the arm_ld_sched variable
 ; initialized by arm_option_override()
 (define_attr "ldsched" "no,yes" (const (symbol_ref "arm_ld_sched")))
@@ -518,7 +525,7 @@
 
 (define_attr "generic_sched" "yes,no"
   (const (if_then_else
-          (ior (eq_attr "tune" "fa526,fa626,fa606te,fa626te,fmp626,fa726te,arm926ejs,arm1020e,arm1026ejs,arm1136js,arm1136jfs,cortexa5,cortexa8,cortexa9,cortexm4")
+          (ior (eq_attr "tune" "fa526,fa626,fa606te,fa626te,fmp626,fa726te,arm926ejs,arm1020e,arm1026ejs,arm1136js,arm1136jfs,cortexa5,cortexa8,cortexa9,cortexa15,cortexm4")
               (eq_attr "tune_cortexr4" "yes"))
           (const_string "no")
           (const_string "yes"))))
@@ -544,6 +551,7 @@
 (include "cortex-a5.md")
 (include "cortex-a8.md")
 (include "cortex-a9.md")
+(include "cortex-a15.md")
 (include "cortex-r4.md")
 (include "cortex-r4f.md")
 (include "cortex-m4.md")
diff --git a/gcc/config/arm/cortex-a15.md b/gcc/config/arm/cortex-a15.md
new file mode 100644
index 000..ccab7cb
--- /dev/null
+++ b/gcc/config/arm/cortex-a15.md
@@ -0,0 +1,186 @@
+;; ARM Cortex-A15 pipeline description
+;; Copyright (C) 2011 Free Software Foundation, Inc.
+;;
+;; Written by Matthew Gretton-Dann <matthew.gretton-d...@arm.com>
+
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_automaton "cortex_a15")
+
+;; The Cortex-A15 core is modelled as a triple issue pipeline that has
+;; the following dispatch units.
+;; 1. Two pipelines for simple integer operations: SX1, SX2
+;; 2. Two pipelines for Neon and FP data-processing operations: CX1, CX2
+;; 3. One pipeline for branch operations: BX
+;; 4. One pipeline for integer multiply and divide operations: MX
+;; 5. Two pipelines for load and store operations: LS1, LS2
+;;
+;; We can issue into three pipelines per-cycle.
+;;
+;; We assume that where we have unit pairs xx1 is always filled before xx2.
+
+;; The three issue units
+(define_cpu_unit "ca15_i0, ca15_i1, ca15_i2" "cortex_a15")
+
+(define_reservation "ca15_issue1" "(ca15_i0|ca15_i1|ca15_i2)")
+(define_reservation "ca15_issue2" "((ca15_i0+ca15_i1)|(ca15_i1+ca15_i2))")
+(define_reservation "ca15_issue3" "(ca15_i0+ca15_i1+ca15_i2)")
+(final_presence_set "ca15_i1" "ca15_i0")
+(final_presence_set "ca15_i2" "ca15_i1")
+
+;; The main dispatch units
+(define_cpu_unit "ca15_sx1, ca15_sx2" "cortex_a15")
+(define_cpu_unit "ca15_cx1, ca15_cx2" "cortex_a15")
+(define_cpu_unit "ca15_ls1, ca15_ls2" "cortex_a15")
+(define_cpu_unit "ca15_bx, ca15_mx" "cortex_a15")
+
+(define_reservation "ca15_ls" "(ca15_ls1|ca15_ls2)")
+
+;; The extended load-store