Hi,

This patch improves the cortex-m4 FPU pipeline description based on the findings below:

1) Integer instructions can be pipelined with fused or chained multiply-accumulate (MAC) instructions.
2) Two-cycle 32-bit floating-point load instructions should be scheduled back to back, which saves one cycle. Three-cycle 64-bit FP load instructions do not have this property.
3) 32-bit floating-point store instructions take 1 cycle, not 2.

I used some f32 functions from the CMSIS DSPLib to benchmark this patch. All of them show a performance improvement, i.e. fewer cycles are needed to run those functions.

Is it OK for trunk?

BR,
Terry

2013-04-16  Terry Guo  <terry....@arm.com>

	* config/arm/cortex-m4-fpu.md (cortex_m4_v): Delete cpu unit.
	Replace with ...
	(cortex_m4_v_a, cortex_m4_v_b): ... new cpu units.
	(cortex_m4_v, cortex_m4_exa_va, cortex_m4_exb_vb): New reservations.
	(cortex_m4_fmacs): Use new reservations.
	(cortex_m4_f_load, cortex_m4_f_store): Likewise.

diff --git a/gcc/config/arm/cortex-m4-fpu.md b/gcc/config/arm/cortex-m4-fpu.md
index a1945be..4ce3f10 100644
--- a/gcc/config/arm/cortex-m4-fpu.md
+++ b/gcc/config/arm/cortex-m4-fpu.md
@@ -18,10 +18,14 @@
 ;; along with GCC; see the file COPYING3.  If not see
 ;; <http://www.gnu.org/licenses/>.
 
-;; Use an artifial unit to model FPU.
-(define_cpu_unit "cortex_m4_v" "cortex_m4")
+;; Use two artificial units to model FPU.
+(define_cpu_unit "cortex_m4_v_a" "cortex_m4")
+(define_cpu_unit "cortex_m4_v_b" "cortex_m4")
 
+(define_reservation "cortex_m4_v" "cortex_m4_v_a+cortex_m4_v_b")
 (define_reservation "cortex_m4_ex_v" "cortex_m4_ex+cortex_m4_v")
+(define_reservation "cortex_m4_exa_va" "cortex_m4_a+cortex_m4_v_a")
+(define_reservation "cortex_m4_exb_vb" "cortex_m4_b+cortex_m4_v_b")
 
 ;; Integer instructions following VDIV or VSQRT complete out-of-order.
 (define_insn_reservation "cortex_m4_fdivs" 15
@@ -44,10 +48,12 @@
        (eq_attr "type" "fmuls"))
   "cortex_m4_ex_v")
 
+;; Integer instructions following multiply-accumulate instructions
+;; complete out-of-order.
 (define_insn_reservation "cortex_m4_fmacs" 4
   (and (eq_attr "tune" "cortexm4")
        (eq_attr "type" "fmacs,ffmas"))
-  "cortex_m4_ex_v*3")
+  "cortex_m4_ex_v,cortex_m4_v*2")
 
 (define_insn_reservation "cortex_m4_ffariths" 1
   (and (eq_attr "tune" "cortexm4")
@@ -77,12 +83,12 @@
 (define_insn_reservation "cortex_m4_f_load" 2
   (and (eq_attr "tune" "cortexm4")
        (eq_attr "type" "f_loads"))
-  "cortex_m4_ex_v*2")
+  "cortex_m4_exa_va,cortex_m4_exb_vb")
 
-(define_insn_reservation "cortex_m4_f_store" 2
+(define_insn_reservation "cortex_m4_f_store" 1
   (and (eq_attr "tune" "cortexm4")
        (eq_attr "type" "f_stores"))
-  "cortex_m4_ex_v*2")
+  "cortex_m4_exa_va")
 
 (define_insn_reservation "cortex_m4_f_loadd" 3
   (and (eq_attr "tune" "cortexm4")
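
For reviewers, the effect of the new reservations on the three findings can be sketched with a small Python model of the unit reservation tables before and after the patch. This is not part of the patch and is only a simplified illustration under stated assumptions: it assumes cortex_m4_ex is the reservation cortex_m4_a+cortex_m4_b (defined in cortex-m4.md, outside this diff), treats every insn as independent, and models only unit occupancy, not the latencies given in define_insn_reservation. The names OLD, NEW and schedule are hypothetical.

```python
# Simplified, hypothetical model of the cortex-m4 FPU reservations in this
# patch. Assumption: cortex_m4_ex = cortex_m4_a + cortex_m4_b (cortex-m4.md).
# Data dependencies and result latencies are ignored; only unit usage is modeled.

EX = {"a", "b"}            # cortex_m4_ex
V = {"v_a", "v_b"}         # cortex_m4_v (new: cortex_m4_v_a + cortex_m4_v_b)
EX_V = EX | V              # cortex_m4_ex_v

# Per-cycle unit reservations before the patch.
OLD = {
    "fmacs": [EX_V, EX_V, EX_V],          # "cortex_m4_ex_v*3"
    "f_load": [EX_V, EX_V],               # "cortex_m4_ex_v*2"
    "f_store": [EX_V, EX_V],              # "cortex_m4_ex_v*2"
    "int": [EX],                          # a single-cycle integer insn
}

# Per-cycle unit reservations after the patch.
NEW = {
    "fmacs": [EX_V, V, V],                # "cortex_m4_ex_v,cortex_m4_v*2"
    "f_load": [{"a", "v_a"}, {"b", "v_b"}],  # "cortex_m4_exa_va,cortex_m4_exb_vb"
    "f_store": [{"a", "v_a"}],            # "cortex_m4_exa_va"
    "int": [EX],
}


def schedule(insns, model):
    """Issue insns in order, at most one per cycle, each at the earliest
    cycle whose reservation does not clash with already reserved units.
    Returns the number of cycles until every reservation is released."""
    busy = {}      # cycle -> set of reserved units
    issue = -1     # cycle of the previous issue
    for name in insns:
        pattern = model[name]
        cycle = issue + 1
        # Delay issue while any cycle of the pattern overlaps a busy unit.
        while any(busy.get(cycle + i, set()) & units
                  for i, units in enumerate(pattern)):
            cycle += 1
        for i, units in enumerate(pattern):
            busy.setdefault(cycle + i, set()).update(units)
        issue = cycle
    return max(busy) + 1 if busy else 0


if __name__ == "__main__":
    for seq in (["f_load", "f_load"], ["fmacs", "int"], ["f_store"]):
        print(seq, "old:", schedule(seq, OLD), "new:", schedule(seq, NEW))
```

Under this simplified model, two back-to-back f_loads occupy 4 cycles with the old description and 3 with the new one, because the second load's cortex_m4_a+cortex_m4_v_a cycle overlaps the first load's cortex_m4_b+cortex_m4_v_b cycle. Likewise, an integer insn can issue one cycle after an fmacs, since only cortex_m4_v is held for the remaining two cycles.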