Re: [patch 1/2][ARM]: New CPU support for Marvell Whitney
Hi Xingxing, Some comments inline on the tuning struct and pipeline description. On 25/02/15 13:32, Xingxing Pan wrote: Hi, This patch merges pipeline description for marvell-whitney to latest code base. Is it OK for trunk? -- Regards, Xingxing whitney_p1_20150225.patch A couple of questions: +const struct tune_params arm_marvell_whitney_tune = +{ + arm_9e_rtx_costs, + cortexa9_extra_costs, + cortex_a9_sched_adjust_cost, Are you sure the cortexa9_extra_costs are best for this? Did you try other cost tables? Do you think it would be good to write your own? The pipeline description you posted seems to be very intricate. If such complexity is needed to get the most out of the core, then perhaps it could benefit from a custom cost table? Also, are you sure cortex_a9_sched_adjust_cost is appropriate? It's a very Cortex-A9 specific hook and we don't use it for any other cores. Did it give meaningful improvements in performance? I don't know much about this core (any public documentation you can point us at, perhaps?) so I'll trust you that these are optimal... + 1,/* Constant limit. */ + 5,/* Max cond insns. */ + ARM_PREFETCH_BENEFICIAL(4,32,32), + false,/* Prefer constant pool. */ + arm_default_branch_cost, + false,/* Prefer LDRD/STRD. */ + {true, true},/* Prefer non short circuit. */ + arm_default_vec_cost,/* Vectorizer costs. */ + false,/* Prefer Neon for 64-bits bitops. */ + false, false, /* Prefer 32-bit encodings. */ + false,/* Prefer Neon for stringops. */ + 8/* Maximum insns to inline memset. */ +}; + As Maxim said, you need to take into account the two new fields that were recently added, the fusion and prefetcher modelling ones. diff --git a/gcc/config/arm/marvell-whitney.md b/gcc/config/arm/marvell-whitney.md new file mode 100644 index 000..fffbd29 --- /dev/null +++ b/gcc/config/arm/marvell-whitney.md @@ -0,0 +1,678 @@ +;; Marvell ARM Processor Pipeline Description +;; Copyright (C) 2011-2014 Free Software Foundation, Inc. Update the year to 2015 . +;; +;; Contributed by Marvell. + +;; This file is part of GCC. +;; +;; GCC is free software; you can redistribute it and/or modify it +;; under the terms of the GNU General Public License as published +;; by the Free Software Foundation; either version 3, or (at your +;; option) any later version. +;; +;; GCC is distributed in the hope that it will be useful, but WITHOUT +;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY +;; or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public +;; License for more details. +;; +;; You should have received a copy of the GNU General Public License +;; along with GCC; see the file COPYING3. If not see +;; http://www.gnu.org/licenses/. + +(define_attr whitney_type + wTP1,wTP2,wTP3,wTP4,wTP5,wTP6,wTP7,wTP8,wTP9,wTP10,wTP11,wTP12,wTP13,\ + wTP14,wTP15,wTP16,wTP17,wTP18,wTP19,wTP20,wTP21,wTP22,wTP23,wTP24,wTP25,\ + wTP26,wTP27,wTP28,wTP29,wTP30,wTP31,wTP32,wTP33,wTP34,wTP35,wTP36,wTP37,\ + wTP38,wTP39,wTP40,wTP41,wTP42,wTP43,wTP44,wTP45,wTP46,wTP47,wTP48,wTP49,\ + wTP50,wTP51,wTP52,wTP53,wTP54,wTP55,wTP56,wTP57,wTP58,wTP59,\ These types are very hard to remember and understand, and hence maintain. I'm assuming the wTP* notation stands for Whitney TyPe? Look at the other pipeline descriptions in config/arm/ and try to give them more descriptive names, like: whitney_alu_simple, whitney_alu_shift_reg etc. It will make it much easier to understand. I've tried this patch out and added: (automata_option v) (automata_option time) (automata_option stats) (automata_option progress) to the .md file to get some compiler build-time stats on the automata. I see that the whitney_pipe automaton is the largest automaton in terms of NDFA states, and second largest in minimal DFA states: 44170 NDFA states, 188695 NDFA arcs 44170 DFA states, 188695 DFA arcs 7070 minimal DFA states, 66279 minimal DFA arcs 923 all insns 30 insn equivalence classes Could the complexity of it be reduced without sacrificing code quality? In some cores we break up the automaton into two automata, roughly separated into an integer/memory unit and another one for FP/NEON units to reduce the state space. Look at cortex-a15.md and cortex-a15-neon.md or some other description of that kind. + (cond [ + (eq_attr type alu_imm,alu_sreg,alus_imm,alus_sreg,adc_reg,\ + adc_imm,bfm,branch,call,clz,extend,logic_imm,\ + logic_reg,logics_imm,logics_reg,logics_shift_reg,\ + mov_imm,mov_reg,mvn_imm,mvn_reg,shift_imm,\ + shift_reg,rev) + (const_string wTP1) + (eq_attr type
Re: [patch 1/2][ARM]: New CPU support for Marvell Whitney
On Feb 26, 2015, at 10:36 AM, Xingxing Pan xxing...@marvell.com wrote: On 02/25/2015 09:32 PM, Xingxing Pan wrote: Hi, This patch merges pipeline description for marvell-whitney to latest code base. Is it OK for trunk? Refactor the commit message. ... Add pipeline description for marvell-whitney. 2015-02-26 Xingxing Pan xxing...@marvell.com * config/arm/arm-cores.def: Add new core marvell-whitney. * config/arm/arm-protos.h: (marvell_whitney_vector_mode_qi): Declare. (marvell_whitney_inner_shift): Ditto. * config/arm/arm-tables.opt: Regenerated. * config/arm/arm-tune.md: Regenerated. * config/arm/arm.c (arm_marvell_whitney_tune): New structure. (arm_issue_rate): Add marvell_whitney. (marvell_whitney_vector_mode_qi): New function. (marvell_whitney_inner_shift): Ditto. * config/arm/arm.md: Include marvell-whitney.md. (generic_sched): Add marvell_whitney. (generic_vfp): Ditto. * config/arm/bpabi.h (BE8_LINK_SPEC): Add marvell-whitney. * config/arm/t-arm (MD_INCLUDES): Add marvell-whitney.md. * config/arm/marvell-whitney.md: New file. * doc/invoke.texi: Document marvell-whitney. ... diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 7bf5b4d..e68287f 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -2000,6 +2000,25 @@ const struct tune_params arm_cortex_a9_tune = ARM_SCHED_AUTOPREF_OFF /* Sched L2 autopref. */ }; +const struct tune_params arm_marvell_whitney_tune = +{ + arm_9e_rtx_costs, + cortexa9_extra_costs, + cortex_a9_sched_adjust_cost, + 1, /* Constant limit. */ + 5, /* Max cond insns. */ + ARM_PREFETCH_BENEFICIAL(4,32,32), + false, /* Prefer constant pool. */ + arm_default_branch_cost, + false, /* Prefer LDRD/STRD. */ + {true, true}, /* Prefer non short circuit. */ + arm_default_vec_cost,/* Vectorizer costs. */ + false,/* Prefer Neon for 64-bits bitops. */ + false, false, /* Prefer 32-bit encodings. */ + false, /* Prefer Neon for stringops. */ + 8 /* Maximum insns to inline memset. */ +}; + You need to bootstrap GCC with your patch applied. As it is now, the above tune table is missing at least two entries (one for insn fusing, and one for sched L2 autoprefetcher). It should cause a bootstrap fail. const struct tune_params arm_cortex_a12_tune = { arm_9e_rtx_costs, @@ -11717,6 +11736,51 @@ fa726te_sched_adjust_cost (rtx_insn *insn, rtx link, rtx_insn *dep, int * cost) return true; } +/* Return true if vector element size is byte. */ +bool +marvell_whitney_vector_mode_qi (rtx_insn *insn) +{ + machine_mode mode; + + if (GET_CODE (PATTERN (insn)) == SET) +{ + mode = GET_MODE (SET_DEST (PATTERN (insn))); + if (VECTOR_MODE_P (mode) GET_MODE_INNER (mode) == QImode) + return true; +} + + return false; +} It is preferable to avoid using predicates in DFA definitions. Predicates prevent optimizations of the DFA, which causes them to be bigger and slower. This predicate can be easily replaced by an attribute (set_attr dest_vect_mode mode) on affected instructions (the proper name for the attribute needs some thinking though). That said, I don't a have strong opinion on this case. It may be a worthwhile tradeoff to use the predicate to avoid adding new attribute to a dozen of instructions. If you / other reviewers prefer to keep the predicate -- please add a comment noting that it is used by the DFA. + +/* Return true if a non-shift insn has a shift operand. */ +bool +marvell_whitney_inner_shift (rtx_insn *insn) +{ + rtx pat = PATTERN (insn); + + if (GET_CODE (pat) != SET) +return false; + + /* Is not a shift insn. */ + rtx rvalue = SET_SRC (pat); + RTX_CODE code = GET_CODE (rvalue); + if (code == ASHIFT || code == ASHIFTRT + || code == LSHIFTRT || code == ROTATERT) +return false; + + subrtx_iterator::array_type array; + FOR_EACH_SUBRTX (iter, array, rvalue, ALL) +{ + /* Has shift operation. */ + RTX_CODE code = GET_CODE (*iter); + if (code == ASHIFT || code == ASHIFTRT + || code == LSHIFTRT || code == ROTATERT) + return true; +} + + return false; +} Same comment as above. Otherwise looks good. Thanks! -- Maxim Kuvyrkov www.linaro.org
Re: [patch 1/2][ARM]: New CPU support for Marvell Whitney
On Thu, Dec 18, 2014 at 10:13 AM, Xingxing Pan xxing...@marvell.com wrote: Hi, This patch contains Marvell Whitney core's pipeline description. Test on arm-linux-gnueabi and no new regression are found. Is it OK for trunk? I haven't had a chance to review this in detail but I think it's too late to add such an invasive patch to the backend given we are in stage4 today. Ramana Regards, Xingxing 2014-12-18 Xingxing Pan xxing...@marvell.com * config/arm/arm-cores.def: Add new core marvell-whitney. * config/arm/arm-protos.h: (marvell_whitney_vector_element_size_is_byte): Declare. (marvell_whitney_non_shift_with_shift_operand): Ditto. * config/arm/arm-tables.opt: Regenerated. * config/arm/arm-tune.md: Regenerated. * config/arm/arm.c (arm_marvell_whitney_tune): New structure. (arm_issue_rate): Add marvell_whitney. (marvell_whitney_vector_element_size_is_byte): New function. (marvell_whitney_non_shift_with_shift_operand): Ditto. * config/arm/arm.md: Include marvell-whitney.md. (generic_sched): Add marvell_whitney. (generic_vfp): Ditto. * config/arm/bpabi.h (BE8_LINK_SPEC): Add marvell-whitney. * config/arm/t-arm (MD_INCLUDES): Add marvell-whitney.md. * config/arm/marvell-whitney.md: New file. * doc/invoke.texi: Document marvell-whitney. diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def index 423ee9e..b0ffbe1 100644 --- a/gcc/config/arm/arm-cores.def +++ b/gcc/config/arm/arm-cores.def @@ -159,6 +159,7 @@ ARM_CORE(cortex-m7, cortexm7, cortexm7, 7EM, FL_LDSCHED, cortex_m7) ARM_CORE(cortex-m4, cortexm4, cortexm4, 7EM, FL_LDSCHED, v7m) ARM_CORE(cortex-m3, cortexm3, cortexm3, 7M, FL_LDSCHED, v7m) ARM_CORE(marvell-pj4, marvell_pj4, marvell_pj4, 7A, FL_LDSCHED, 9e) +ARM_CORE(marvell-whitney,marvell_whitney, marvell_whitney, 7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, marvell_whitney) /* V7 big.LITTLE implementations */ ARM_CORE(cortex-a15.cortex-a7, cortexa15cortexa7, cortexa7, 7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a15) diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 20cfa9f..e86db1e 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -231,6 +231,9 @@ extern void arm_order_regs_for_local_alloc (void); extern int arm_max_conditional_execute (); +extern bool marvell_whitney_vector_element_size_is_byte (rtx insn); +extern bool marvell_whitney_non_shift_with_shift_operand (rtx insn); + /* Vectorizer cost model implementation. */ struct cpu_vec_costs { const int scalar_stmt_cost; /* Cost of any scalar operation, excluding diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt index 9b1886e..3371ce3 100644 --- a/gcc/config/arm/arm-tables.opt +++ b/gcc/config/arm/arm-tables.opt @@ -298,6 +298,9 @@ EnumValue Enum(processor_type) String(marvell-pj4) Value(marvell_pj4) EnumValue +Enum(processor_type) String(marvell-whitney) Value(marvell_whitney) + +EnumValue Enum(processor_type) String(cortex-a15.cortex-a7) Value(cortexa15cortexa7) EnumValue diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md index d300c51..c73c33c 100644 --- a/gcc/config/arm/arm-tune.md +++ b/gcc/config/arm/arm-tune.md @@ -28,9 +28,10 @@ cortexm1smallmultiply,cortexm0smallmultiply,cortexm0plussmallmultiply, genericv7a,cortexa5,cortexa7, cortexa8,cortexa9,cortexa12, - cortexa15,cortexa17,cortexr4,cortexr4f, - cortexr5,cortexr7,cortexm7, - cortexm4,cortexm3,marvell_pj4, - cortexa15cortexa7,cortexa17cortexa7,cortexa53, - cortexa57,cortexa57cortexa53 + cortexa15,cortexa17,cortexr4, + cortexr4f,cortexr5,cortexr7, + cortexm7,cortexm4,cortexm3, + marvell_pj4,marvell_whitney,cortexa15cortexa7, + cortexa17cortexa7,cortexa53,cortexa57, + cortexa57cortexa53 (const (symbol_ref ((enum attr_tune) arm_tune diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 0ec526b..183da4c 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -1914,6 +1914,25 @@ const struct tune_params arm_cortex_a9_tune = 8 /* Maximum insns to inline memset. */ }; +const struct tune_params arm_marvell_whitney_tune = +{ + arm_9e_rtx_costs, + cortexa9_extra_costs, + cortex_a9_sched_adjust_cost, + 1, /* Constant limit. */ + 5, /* Max cond insns. */ + ARM_PREFETCH_BENEFICIAL(4,32,32), + false, /* Prefer constant pool. */ + arm_default_branch_cost, + false, /* Prefer
Re: [patch 1/2][ARM]: New CPU support for Marvell Whitney
On Feb 26, 2015, at 1:12 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: ... I've tried this patch out and added: (automata_option v) (automata_option time) (automata_option stats) (automata_option progress) to the .md file to get some compiler build-time stats on the automata. I see that the whitney_pipe automaton is the largest automaton in terms of NDFA states, and second largest in minimal DFA states: 44170 NDFA states, 188695 NDFA arcs 44170 DFA states, 188695 DFA arcs 7070 minimal DFA states, 66279 minimal DFA arcs 923 all insns 30 insn equivalence classes Could the complexity of it be reduced without sacrificing code quality? In some cores we break up the automaton into two automata, roughly separated into an integer/memory unit and another one for FP/NEON units to reduce the state space. Look at cortex-a15.md and cortex-a15-neon.md or some other description of that kind. I think converting the C predicates to attributes would reduce the number of states since DFA generator will be able to fold them. -- Maxim Kuvyrkov www.linaro.org
Re: [patch 1/2][ARM]: New CPU support for Marvell Whitney
Hi, This patch merges pipeline description for marvell-whitney to latest code base. Is it OK for trunk? -- Regards, Xingxing commit 83974dde8d9f773df1004aa1d5e3b05d8a33f5e0 Author: Xingxing Pan xxing...@marvell.com Date: Wed Feb 25 10:24:40 2015 +0800 2015-02-25 Xingxing Pan xxing...@marvell.com * config/arm/arm-cores.def: Add new core marvell-whitney. * config/arm/arm-protos.h: (marvell_whitney_vector_mode_qi): Declare. (marvell_whitney_inner_shift): Ditto. * config/arm/arm-tables.opt: Regenerated. * config/arm/arm-tune.md: Regenerated. * config/arm/arm.c (arm_marvell_whitney_tune): New structure. (arm_issue_rate): Add marvell_whitney. (marvell_whitney_vector_mode_qi): New function. (marvell_whitney_inner_shift): Ditto. * config/arm/arm.md: Include marvell-whitney.md. (generic_sched): Add marvell_whitney. (generic_vfp): Ditto. * config/arm/bpabi.h (BE8_LINK_SPEC): Add marvell-whitney. * config/arm/t-arm (MD_INCLUDES): Add marvell-whitney.md. * config/arm/marvell-whitney.md: New file. * doc/invoke.texi: Document marvell-whitney. diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def index d7e730d..fc76eb5 100644 --- a/gcc/config/arm/arm-cores.def +++ b/gcc/config/arm/arm-cores.def @@ -159,6 +159,7 @@ ARM_CORE(cortex-m7, cortexm7, cortexm7, 7EM, FL_LDSCHED, cortex_m7) ARM_CORE(cortex-m4, cortexm4, cortexm4, 7EM, FL_LDSCHED, v7m) ARM_CORE(cortex-m3, cortexm3, cortexm3, 7M, FL_LDSCHED, v7m) ARM_CORE(marvell-pj4, marvell_pj4, marvell_pj4, 7A, FL_LDSCHED, 9e) +ARM_CORE(marvell-whitney, marvell_whitney, marvell_whitney, 7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, marvell_whitney) /* V7 big.LITTLE implementations */ ARM_CORE(cortex-a15.cortex-a7, cortexa15cortexa7, cortexa7, 7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a15) diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 307babb..d047dbc 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -231,6 +231,9 @@ extern void arm_order_regs_for_local_alloc (void); extern int arm_max_conditional_execute (); +extern bool marvell_whitney_vector_mode_qi (rtx_insn *insn); +extern bool marvell_whitney_inner_shift (rtx_insn *insn); + /* Vectorizer cost model implementation. */ struct cpu_vec_costs { const int scalar_stmt_cost; /* Cost of any scalar operation, excluding diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt index 3450e5b..f0f9f3f 100644 --- a/gcc/config/arm/arm-tables.opt +++ b/gcc/config/arm/arm-tables.opt @@ -298,6 +298,9 @@ EnumValue Enum(processor_type) String(marvell-pj4) Value(marvell_pj4) EnumValue +Enum(processor_type) String(marvell-whitney) Value(marvell_whitney) + +EnumValue Enum(processor_type) String(cortex-a15.cortex-a7) Value(cortexa15cortexa7) EnumValue diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md index d459f27..fbfab2e 100644 --- a/gcc/config/arm/arm-tune.md +++ b/gcc/config/arm/arm-tune.md @@ -31,7 +31,8 @@ cortexa15,cortexa17,cortexr4, cortexr4f,cortexr5,cortexr7, cortexm7,cortexm4,cortexm3, - marvell_pj4,cortexa15cortexa7,cortexa17cortexa7, - cortexa53,cortexa57,cortexa72, - xgene1,cortexa57cortexa53,cortexa72cortexa53 + marvell_pj4,marvell_whitney,cortexa15cortexa7, + cortexa17cortexa7,cortexa53,cortexa57, + cortexa72,xgene1,cortexa57cortexa53, + cortexa72cortexa53 (const (symbol_ref ((enum attr_tune) arm_tune diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 7bf5b4d..e68287f 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -2000,6 +2000,25 @@ const struct tune_params arm_cortex_a9_tune = ARM_SCHED_AUTOPREF_OFF /* Sched L2 autopref. */ }; +const struct tune_params arm_marvell_whitney_tune = +{ + arm_9e_rtx_costs, + cortexa9_extra_costs, + cortex_a9_sched_adjust_cost, + 1, /* Constant limit. */ + 5, /* Max cond insns. */ + ARM_PREFETCH_BENEFICIAL(4,32,32), + false, /* Prefer constant pool. */ + arm_default_branch_cost, + false, /* Prefer LDRD/STRD. */ + {true, true}, /* Prefer non short circuit. */ + arm_default_vec_cost,/* Vectorizer costs. */ + false,/* Prefer Neon for 64-bits bitops. */ + false, false, /* Prefer 32-bit encodings. */ + false, /* Prefer Neon for stringops. */ + 8 /* Maximum insns to inline memset. */ +}; + const struct tune_params arm_cortex_a12_tune = { arm_9e_rtx_costs, @@ -11717,6 +11736,51 @@ fa726te_sched_adjust_cost (rtx_insn *insn, rtx link, rtx_insn *dep, int * cost) return true; } +/* Return true if vector element size is byte. */ +bool +marvell_whitney_vector_mode_qi (rtx_insn *insn) +{ + machine_mode mode; + + if (GET_CODE (PATTERN (insn)) == SET) +{ + mode = GET_MODE (SET_DEST
Re: [patch 1/2][ARM]: New CPU support for Marvell Whitney
On 02/25/2015 09:32 PM, Xingxing Pan wrote: Hi, This patch merges pipeline description for marvell-whitney to latest code base. Is it OK for trunk? Refactor the commit message. -- Regards, Xingxing Add pipeline description for marvell-whitney. 2015-02-26 Xingxing Pan xxing...@marvell.com * config/arm/arm-cores.def: Add new core marvell-whitney. * config/arm/arm-protos.h: (marvell_whitney_vector_mode_qi): Declare. (marvell_whitney_inner_shift): Ditto. * config/arm/arm-tables.opt: Regenerated. * config/arm/arm-tune.md: Regenerated. * config/arm/arm.c (arm_marvell_whitney_tune): New structure. (arm_issue_rate): Add marvell_whitney. (marvell_whitney_vector_mode_qi): New function. (marvell_whitney_inner_shift): Ditto. * config/arm/arm.md: Include marvell-whitney.md. (generic_sched): Add marvell_whitney. (generic_vfp): Ditto. * config/arm/bpabi.h (BE8_LINK_SPEC): Add marvell-whitney. * config/arm/t-arm (MD_INCLUDES): Add marvell-whitney.md. * config/arm/marvell-whitney.md: New file. * doc/invoke.texi: Document marvell-whitney. diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def index d7e730d..fc76eb5 100644 --- a/gcc/config/arm/arm-cores.def +++ b/gcc/config/arm/arm-cores.def @@ -159,6 +159,7 @@ ARM_CORE(cortex-m7, cortexm7, cortexm7, 7EM, FL_LDSCHED, cortex_m7) ARM_CORE(cortex-m4, cortexm4, cortexm4, 7EM, FL_LDSCHED, v7m) ARM_CORE(cortex-m3, cortexm3, cortexm3, 7M, FL_LDSCHED, v7m) ARM_CORE(marvell-pj4, marvell_pj4, marvell_pj4, 7A, FL_LDSCHED, 9e) +ARM_CORE(marvell-whitney, marvell_whitney, marvell_whitney, 7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, marvell_whitney) /* V7 big.LITTLE implementations */ ARM_CORE(cortex-a15.cortex-a7, cortexa15cortexa7, cortexa7, 7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a15) diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 307babb..d047dbc 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -231,6 +231,9 @@ extern void arm_order_regs_for_local_alloc (void); extern int arm_max_conditional_execute (); +extern bool marvell_whitney_vector_mode_qi (rtx_insn *insn); +extern bool marvell_whitney_inner_shift (rtx_insn *insn); + /* Vectorizer cost model implementation. */ struct cpu_vec_costs { const int scalar_stmt_cost; /* Cost of any scalar operation, excluding diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt index 3450e5b..f0f9f3f 100644 --- a/gcc/config/arm/arm-tables.opt +++ b/gcc/config/arm/arm-tables.opt @@ -298,6 +298,9 @@ EnumValue Enum(processor_type) String(marvell-pj4) Value(marvell_pj4) EnumValue +Enum(processor_type) String(marvell-whitney) Value(marvell_whitney) + +EnumValue Enum(processor_type) String(cortex-a15.cortex-a7) Value(cortexa15cortexa7) EnumValue diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md index d459f27..fbfab2e 100644 --- a/gcc/config/arm/arm-tune.md +++ b/gcc/config/arm/arm-tune.md @@ -31,7 +31,8 @@ cortexa15,cortexa17,cortexr4, cortexr4f,cortexr5,cortexr7, cortexm7,cortexm4,cortexm3, - marvell_pj4,cortexa15cortexa7,cortexa17cortexa7, - cortexa53,cortexa57,cortexa72, - xgene1,cortexa57cortexa53,cortexa72cortexa53 + marvell_pj4,marvell_whitney,cortexa15cortexa7, + cortexa17cortexa7,cortexa53,cortexa57, + cortexa72,xgene1,cortexa57cortexa53, + cortexa72cortexa53 (const (symbol_ref ((enum attr_tune) arm_tune diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 7bf5b4d..e68287f 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -2000,6 +2000,25 @@ const struct tune_params arm_cortex_a9_tune = ARM_SCHED_AUTOPREF_OFF /* Sched L2 autopref. */ }; +const struct tune_params arm_marvell_whitney_tune = +{ + arm_9e_rtx_costs, + cortexa9_extra_costs, + cortex_a9_sched_adjust_cost, + 1, /* Constant limit. */ + 5, /* Max cond insns. */ + ARM_PREFETCH_BENEFICIAL(4,32,32), + false, /* Prefer constant pool. */ + arm_default_branch_cost, + false, /* Prefer LDRD/STRD. */ + {true, true}, /* Prefer non short circuit. */ + arm_default_vec_cost,/* Vectorizer costs. */ + false,/* Prefer Neon for 64-bits bitops. */ + false, false, /* Prefer 32-bit encodings. */ + false, /* Prefer Neon for stringops. */ + 8 /* Maximum insns to inline memset. */ +}; + const struct tune_params arm_cortex_a12_tune = { arm_9e_rtx_costs, @@ -11717,6 +11736,51 @@ fa726te_sched_adjust_cost (rtx_insn *insn, rtx link, rtx_insn *dep, int * cost) return true; } +/* Return true if vector element size is byte. */ +bool +marvell_whitney_vector_mode_qi (rtx_insn *insn) +{ + machine_mode mode; + + if (GET_CODE (PATTERN (insn)) == SET) +{ + mode = GET_MODE (SET_DEST (PATTERN (insn))); + if (VECTOR_MODE_P (mode) GET_MODE_INNER (mode) == QImode) +
Re: [patch 1/2][ARM]: New CPU support for Marvell Whitney
On 09/01/2015 19:22, Kyrill Tkachov wrote: Hi Xingxing, On 19/12/14 11:01, Xingxing Pan wrote: +/* Return true if vector element size is byte. */ Minor nit: two spaces after full stop and before */ Same in other places in the patch. +bool +marvell_whitney_vector_element_size_is_byte (rtx insn) +{ + if (GET_CODE (PATTERN (insn)) == SET) +{ + if ((GET_MODE (SET_DEST (PATTERN (insn))) == V8QImode) || + (GET_MODE (SET_DEST (PATTERN (insn))) == V16QImode)) + return true; +} + + return false; +} I see this is called from inside marvell-whitney.md. It seems to me that this function takes RTX insns. Can the type of this be strengthened to rtx_insn * ? Also, this should be refactored and written a bit more generally by checking for VECTOR_MODE_P and then GET_MODE_INNER for QImode, saving you the trouble of enumerating the different vector QI modes. + +/* Return true if INSN has shift operation but is not a shift insn. */ +bool +marvell_whitney_non_shift_with_shift_operand (rtx insn) Similar comment. Can this be strengthened to rtx_insn * ? Thanks, Kyrill +{ + rtx pat = PATTERN (insn); + + if (GET_CODE (pat) != SET) +return false; + + /* Is not a shift insn. */ + rtx rvalue = SET_SRC (pat); + RTX_CODE code = GET_CODE (rvalue); + if (code == ASHIFT || code == ASHIFTRT + || code == LSHIFTRT || code == ROTATERT) +return false; + + subrtx_iterator::array_type array; + FOR_EACH_SUBRTX (iter, array, rvalue, ALL) +{ + /* Has shift operation. */ + RTX_CODE code = GET_CODE (*iter); + if (code == ASHIFT || code == ASHIFTRT + || code == LSHIFTRT || code == ROTATERT) +return true; +} + + return false; +} Hi Kyrill, Thanks for advice. Refactored patch is attached. -- Regards, Xingxing commit 3627056607b1e8604ac8d85ed44fdc7d3209cd3e Author: Xingxing Pan xxing...@marvell.com Date: Thu Dec 18 16:58:05 2014 +0800 2015-01-13 Xingxing Pan xxing...@marvell.com * config/arm/arm-cores.def: Add new core marvell-whitney. * config/arm/arm-protos.h: (marvell_whitney_vector_mode_qi): Declare. (marvell_whitney_inner_shift): Ditto. * config/arm/arm-tables.opt: Regenerated. * config/arm/arm-tune.md: Regenerated. * config/arm/arm.c (arm_marvell_whitney_tune): New structure. (arm_issue_rate): Add marvell_whitney. (marvell_whitney_vector_mode_qi): New function. (marvell_whitney_inner_shift): Ditto. * config/arm/arm.md: Include marvell-whitney.md. (generic_sched): Add marvell_whitney. (generic_vfp): Ditto. * config/arm/bpabi.h (BE8_LINK_SPEC): Add marvell-whitney. * config/arm/t-arm (MD_INCLUDES): Add marvell-whitney.md. * config/arm/marvell-whitney.md: New file. * doc/invoke.texi: Document marvell-whitney. diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def index 6fa5d99..26eb7ab 100644 --- a/gcc/config/arm/arm-cores.def +++ b/gcc/config/arm/arm-cores.def @@ -159,6 +159,7 @@ ARM_CORE(cortex-m7, cortexm7, cortexm7, 7EM, FL_LDSCHED, cortex_m7) ARM_CORE(cortex-m4, cortexm4, cortexm4, 7EM, FL_LDSCHED, v7m) ARM_CORE(cortex-m3, cortexm3, cortexm3, 7M, FL_LDSCHED, v7m) ARM_CORE(marvell-pj4, marvell_pj4, marvell_pj4, 7A, FL_LDSCHED, 9e) +ARM_CORE(marvell-whitney, marvell_whitney, marvell_whitney, 7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, marvell_whitney) /* V7 big.LITTLE implementations */ ARM_CORE(cortex-a15.cortex-a7, cortexa15cortexa7, cortexa7, 7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a15) diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index fc45348..45001ae 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -231,6 +231,9 @@ extern void arm_order_regs_for_local_alloc (void); extern int arm_max_conditional_execute (); +extern bool marvell_whitney_vector_mode_qi (rtx_insn *insn); +extern bool marvell_whitney_inner_shift (rtx_insn *insn); + /* Vectorizer cost model implementation. */ struct cpu_vec_costs { const int scalar_stmt_cost; /* Cost of any scalar operation, excluding diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt index ece9d5e..dc5f364 100644 --- a/gcc/config/arm/arm-tables.opt +++ b/gcc/config/arm/arm-tables.opt @@ -298,6 +298,9 @@ EnumValue Enum(processor_type) String(marvell-pj4) Value(marvell_pj4) EnumValue +Enum(processor_type) String(marvell-whitney) Value(marvell_whitney) + +EnumValue Enum(processor_type) String(cortex-a15.cortex-a7) Value(cortexa15cortexa7) EnumValue diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md index 452820ab..c73c33c 100644 --- a/gcc/config/arm/arm-tune.md +++ b/gcc/config/arm/arm-tune.md @@ -31,6 +31,7 @@ cortexa15,cortexa17,cortexr4, cortexr4f,cortexr5,cortexr7, cortexm7,cortexm4,cortexm3, - marvell_pj4,cortexa15cortexa7,cortexa17cortexa7, - cortexa53,cortexa57,cortexa57cortexa53 +
Re: [patch 1/2][ARM]: New CPU support for Marvell Whitney
Hi Xingxing, On 19/12/14 11:01, Xingxing Pan wrote: +/* Return true if vector element size is byte. */ Minor nit: two spaces after full stop and before */ Same in other places in the patch. +bool +marvell_whitney_vector_element_size_is_byte (rtx insn) +{ + if (GET_CODE (PATTERN (insn)) == SET) +{ + if ((GET_MODE (SET_DEST (PATTERN (insn))) == V8QImode) || + (GET_MODE (SET_DEST (PATTERN (insn))) == V16QImode)) + return true; +} + + return false; +} I see this is called from inside marvell-whitney.md. It seems to me that this function takes RTX insns. Can the type of this be strengthened to rtx_insn * ? Also, this should be refactored and written a bit more generally by checking for VECTOR_MODE_P and then GET_MODE_INNER for QImode, saving you the trouble of enumerating the different vector QI modes. + +/* Return true if INSN has shift operation but is not a shift insn. */ +bool +marvell_whitney_non_shift_with_shift_operand (rtx insn) Similar comment. Can this be strengthened to rtx_insn * ? Thanks, Kyrill +{ + rtx pat = PATTERN (insn); + + if (GET_CODE (pat) != SET) +return false; + + /* Is not a shift insn. */ + rtx rvalue = SET_SRC (pat); + RTX_CODE code = GET_CODE (rvalue); + if (code == ASHIFT || code == ASHIFTRT + || code == LSHIFTRT || code == ROTATERT) +return false; + + subrtx_iterator::array_type array; + FOR_EACH_SUBRTX (iter, array, rvalue, ALL) +{ + /* Has shift operation. */ + RTX_CODE code = GET_CODE (*iter); + if (code == ASHIFT || code == ASHIFTRT + || code == LSHIFTRT || code == ROTATERT) +return true; +} + + return false; +}
Re: [patch 1/2][ARM]: New CPU support for Marvell Whitney
Hi Xingxing, On 18/12/14 10:13, Xingxing Pan wrote: + +(define_attr tune_marvell_whitney yes,no + (const (if_then_else (eq_attr tune marvell_whitney) + (const_string yes) + (const_string no Why do you need this? The canonical way we do this in the other .md files is check for the tune attribute directly. + +(define_automaton whitney_pipe) + +(define_cpu_unit wCU1 whitney_pipe) +(define_cpu_unit wCU2 whitney_pipe) +(define_cpu_unit wCU3 whitney_pipe) +(define_cpu_unit wCU4 whitney_pipe) +(define_cpu_unit wCU5 whitney_pipe) +(define_cpu_unit wCU6 whitney_pipe) +(define_cpu_unit wCU7 whitney_pipe) +(define_cpu_unit wCU8 whitney_pipe) +(define_cpu_unit wCU9 whitney_pipe) +(define_cpu_unit wCU10 whitney_pipe) +(define_cpu_unit wCU11 whitney_pipe) +(define_cpu_unit wCU12 whitney_pipe) +(define_cpu_unit wCU13 whitney_pipe) + +(define_insn_reservation wIR1 1 + (and (eq_attr tune_marvell_whitney yes) + (eq_attr whitney_type wTP1)) + wCU1|wCU2|(wCU3+wCU4) +) Further to the comment above, this could be written as: +(define_insn_reservation wIR1 1 + (and (eq_attr tune marvell_whitney) + (eq_attr whitney_type wTP1)) + wCU1|wCU2|(wCU3+wCU4) +) Same with the other reservations in this file. Kyrill + +(define_insn_reservation wIR2 2 + (and (eq_attr tune_marvell_whitney yes) + (eq_attr whitney_type wTP2)) + wCU1|wCU2|(wCU3+wCU4) +)
Re: [patch 1/2][ARM]: New CPU support for Marvell Whitney
Hi Xingxin, It seems that your mail client mangled this patch, at least the following hunk doesn't apply, even when I try to get it from the web archives. Could you please resend it as an attachment perhaps? Thanks, Kyrill On 18/12/14 10:13, Xingxing Pan wrote: diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def index 423ee9e..b0ffbe1 100644 --- a/gcc/config/arm/arm-cores.def +++ b/gcc/config/arm/arm-cores.def @@ -159,6 +159,7 @@ ARM_CORE(cortex-m7, cortexm7, cortexm7, 7EM, FL_LDSCHED, cortex_m7) ARM_CORE(cortex-m4, cortexm4, cortexm4, 7EM, FL_LDSCHED, v7m) ARM_CORE(cortex-m3, cortexm3, cortexm3, 7M, FL_LDSCHED, v7m) ARM_CORE(marvell-pj4, marvell_pj4, marvell_pj4, 7A, FL_LDSCHED, 9e) +ARM_CORE(marvell-whitney,marvell_whitney, marvell_whitney, 7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, marvell_whitney)
Re: [patch 1/2][ARM]: New CPU support for Marvell Whitney
On 19/12/2014 18:38, Kyrill Tkachov wrote: Hi Xingxin, It seems that your mail client mangled this patch, at least the following hunk doesn't apply, even when I try to get it from the web archives. Could you please resend it as an attachment perhaps? Thanks, Kyrill On 18/12/14 10:13, Xingxing Pan wrote: diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def index 423ee9e..b0ffbe1 100644 --- a/gcc/config/arm/arm-cores.def +++ b/gcc/config/arm/arm-cores.def @@ -159,6 +159,7 @@ ARM_CORE(cortex-m7, cortexm7, cortexm7, 7EM, FL_LDSCHED, cortex_m7) ARM_CORE(cortex-m4, cortexm4, cortexm4, 7EM, FL_LDSCHED, v7m) ARM_CORE(cortex-m3, cortexm3, cortexm3, 7M, FL_LDSCHED, v7m) ARM_CORE(marvell-pj4, marvell_pj4, marvell_pj4, 7A, FL_LDSCHED, 9e) +ARM_CORE(marvell-whitney,marvell_whitney, marvell_whitney, 7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, marvell_whitney) Hi Kyrill, I've changed the code to use tune attribute directly. The new patch is attached. Thanks, Xingxing commit 56745b611d40b77e1911075159f89959335d0298 Author: Xingxing Pan xxing...@marvell.com Date: Thu Dec 18 16:58:05 2014 +0800 2014-12-18 Xingxing Pan xxing...@marvell.com * config/arm/arm-cores.def: Add new core marvell-whitney. * config/arm/arm-protos.h: (marvell_whitney_vector_element_size_is_byte): Declare. (marvell_whitney_non_shift_with_shift_operand): Ditto. * config/arm/arm-tables.opt: Regenerated. * config/arm/arm-tune.md: Regenerated. * config/arm/arm.c (arm_marvell_whitney_tune): New structure. (arm_issue_rate): Add marvell_whitney. (marvell_whitney_vector_element_size_is_byte): New function. (marvell_whitney_non_shift_with_shift_operand): Ditto. * config/arm/arm.md: Include marvell-whitney.md. (generic_sched): Add marvell_whitney. (generic_vfp): Ditto. * config/arm/bpabi.h (BE8_LINK_SPEC): Add marvell-whitney. * config/arm/t-arm (MD_INCLUDES): Add marvell-whitney.md. * config/arm/marvell-whitney.md: New file. * doc/invoke.texi: Document marvell-whitney. diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def index 423ee9e..b0ffbe1 100644 --- a/gcc/config/arm/arm-cores.def +++ b/gcc/config/arm/arm-cores.def @@ -159,6 +159,7 @@ ARM_CORE(cortex-m7, cortexm7, cortexm7, 7EM, FL_LDSCHED, cortex_m7) ARM_CORE(cortex-m4, cortexm4, cortexm4, 7EM, FL_LDSCHED, v7m) ARM_CORE(cortex-m3, cortexm3, cortexm3, 7M, FL_LDSCHED, v7m) ARM_CORE(marvell-pj4, marvell_pj4, marvell_pj4, 7A, FL_LDSCHED, 9e) +ARM_CORE(marvell-whitney, marvell_whitney, marvell_whitney, 7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, marvell_whitney) /* V7 big.LITTLE implementations */ ARM_CORE(cortex-a15.cortex-a7, cortexa15cortexa7, cortexa7, 7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a15) diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 20cfa9f..e86db1e 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -231,6 +231,9 @@ extern void arm_order_regs_for_local_alloc (void); extern int arm_max_conditional_execute (); +extern bool marvell_whitney_vector_element_size_is_byte (rtx insn); +extern bool marvell_whitney_non_shift_with_shift_operand (rtx insn); + /* Vectorizer cost model implementation. */ struct cpu_vec_costs { const int scalar_stmt_cost; /* Cost of any scalar operation, excluding diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt index 9b1886e..3371ce3 100644 --- a/gcc/config/arm/arm-tables.opt +++ b/gcc/config/arm/arm-tables.opt @@ -298,6 +298,9 @@ EnumValue Enum(processor_type) String(marvell-pj4) Value(marvell_pj4) EnumValue +Enum(processor_type) String(marvell-whitney) Value(marvell_whitney) + +EnumValue Enum(processor_type) String(cortex-a15.cortex-a7) Value(cortexa15cortexa7) EnumValue diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md index d300c51..c73c33c 100644 --- a/gcc/config/arm/arm-tune.md +++ b/gcc/config/arm/arm-tune.md @@ -28,9 +28,10 @@ cortexm1smallmultiply,cortexm0smallmultiply,cortexm0plussmallmultiply, genericv7a,cortexa5,cortexa7, cortexa8,cortexa9,cortexa12, - cortexa15,cortexa17,cortexr4,cortexr4f, - cortexr5,cortexr7,cortexm7, - cortexm4,cortexm3,marvell_pj4, - cortexa15cortexa7,cortexa17cortexa7,cortexa53, - cortexa57,cortexa57cortexa53 + cortexa15,cortexa17,cortexr4, + cortexr4f,cortexr5,cortexr7, + cortexm7,cortexm4,cortexm3, + marvell_pj4,marvell_whitney,cortexa15cortexa7, + cortexa17cortexa7,cortexa53,cortexa57, + cortexa57cortexa53 (const (symbol_ref ((enum attr_tune) arm_tune diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 0ec526b..183da4c 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -1914,6 +1914,25 @@ const struct tune_params arm_cortex_a9_tune =