Hi, Whilst looking at upstreaming some of CodeSourcery/Mentor's old patches, I came across the following again, originally by Mark Shinwell but last sent upstream by Andrew Stubbs:
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg01431.html I've attached a new version, re-tested but not re-benchmarked (I'm not set up to do the latter just at the moment). I spent a while being quite unconvinced that this patch was a reasonable solution for the problem (of scheduling transfers between the NEON or VFP and the core unit) -- either on Cortex-A8, which the patch was originally supposed to target, or any other core version. So, this is what I came up with to try to convince myself. Handling of these transfers is unfortunately dealt with rather haphazardly on various cores. I only looked in any detail at Cortex-A8 and Cortex-A9. FAOD, I think we're principally concerned here with instructions like "movdi_vfp", which may be used to transfer either "VFP" or "NEON" register values to and from core registers, but also with movsf and friends for the "runfast"-mode case. * On Cortex-A8, instructions with "f_2_r" (nor "r_2_f") type do not participate in scheduling at all. This was probably the root of the problem the patch was trying to fix originally: this means that there is nothing to stop instructions which use the vfp/neon-to-core result from being placed directly after the transfer. (A scheduler dump shows "nothing" for the unit utilisation for these types of instruction at present.) * On Cortex-A9, the "f_2_r" and "r_2_f" types both use the (core) insn reservation "cortex_a9_fps", which produces its result after 2 cycles. The A9 TRM on infocentre.arm.com says that VFP-to-core transfers take 3 cycles, and NEON-to-core transfers take 11 cycles. * The cortex-r4f.md and cortex-m4-fpu.md scheduler descriptions seem to handle f_2_r and r_2_f sanely. So, there are generally multiple DFA descriptions in play at any given time, relating to different core units with different sets of registers, and transferring values between those cores isn't really being described properly to GCC's scheduler in a consistent way at present. Fixing this "properly", e.g. fully describing the way the NEON unit is decoupled from the core unit on the Cortex-A8, is probably impractical. But, the only cases we're concerned with at present are transfers of values from one unit to another. I think setting both "type" and "neon_type" for transfer instructions is therefore the right thing to do -- although given the above, and other restrictions (i.e. the mutual exclusion between VFP and NEON units on Cortex-A9), it's probably not a complete solution to the problem. To expand: * "type" should be set to f_2_r for MRC-type instructions because of the dependency on an earlier instruction setting the input register. * "neon_type" should be set appropriately for MRC-type instructions because following NEON instructions will have dependencies on the transferred value. * ...and vice-versa for MCR-type instructions. Incidentally other patches, probably merged from our source base, have introduced "neon_type" alongside "type" for a couple of instructions in vfp.md already: namely, *movdi_vfp and *movhf_vfp_neon: whatever we choose to do, we should be consistent. So: OK to apply? Thanks, Julian 2011-02-22 Andrew Stubbs <a...@codesourcery.com> Mark Shinwell <shinw...@codesourcery.com> Julian Brown <jul...@codesourcery.com> gcc/ * config/arm/vfp.md (*arm_movsi_vfp, *thumb2_movsi_vfp) (*movdi_vfp_cortexa8, *movsf_vfp, *thumb2_movsf_vfp) (*movdf_vfp, *thumb2_movdf_vfp, *movsfcc_vfp) (*thumb2_movsfcc_vfp, *movdfcc_vfp, *thumb2_movdfcc_vfp): Add neon_type. * config/arm/arm.md (neon_type): Update comment.
commit 44d6297b8e73eae42b3d6a0da7ca857f549049c1 Author: Julian Brown <jbr...@build6-lucid-cs.sje.mentorg.com> Date: Fri Jul 13 10:32:46 2012 -0700 Set neon_type for vfp-to-core transfers. diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 8888382..e9da56d 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -397,8 +397,6 @@ (define_attr "ldsched" "no,yes" (const (symbol_ref "arm_ld_sched"))) ;; Classification of NEON instructions for scheduling purposes. -;; Do not set this attribute and the "type" attribute together in -;; any one instruction pattern. (define_attr "neon_type" "neon_int_1,\ neon_int_2,\ diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md index 8369949..4a56d57 100644 --- a/gcc/config/arm/vfp.md +++ b/gcc/config/arm/vfp.md @@ -78,6 +78,7 @@ " [(set_attr "predicable" "yes") (set_attr "type" "*,*,*,*,load1,store1,r_2_f,f_2_r,fcpys,f_loads,f_stores") + (set_attr "neon_type" "*,*,*,*,*,*,neon_mcr,neon_mrc,neon_vmov,*,*") (set_attr "insn" "mov,mov,mvn,mov,*,*,*,*,*,*,*") (set_attr "pool_range" "*,*,*,*,4096,*,*,*,*,1020,*") (set_attr "neg_pool_range" "*,*,*,*,4084,*,*,*,*,1008,*")] @@ -120,6 +121,7 @@ " [(set_attr "predicable" "yes") (set_attr "type" "*,*,*,*,load1,load1,store1,store1,r_2_f,f_2_r,fcpys,f_loads,f_stores") + (set_attr "neon_type" "*,*,*,*,*,*,*,*,neon_mcr,neon_mrc,neon_vmov,*,*") (set_attr "insn" "mov,mov,mvn,mov,*,*,*,*,*,*,*,*,*") (set_attr "pool_range" "*,*,*,*,1020,4096,*,*,*,*,*,1020,*") (set_attr "neg_pool_range" "*,*,*,*, 0, 0,*,*,*,*,*,1008,*")] @@ -212,6 +214,7 @@ } " [(set_attr "type" "*,*,*,*,load2,load2,store2,r_2_f,f_2_r,ffarithd,f_loadd,f_stored") + (set_attr "neon_type" "*,*,*,*,*,*,*,neon_mcr_2_mcrr,neon_mrrc,neon_vmov,*,*") (set (attr "length") (cond [(eq_attr "alternative" "1") (const_int 8) (eq_attr "alternative" "2") (const_int 12) (eq_attr "alternative" "3") (const_int 16) @@ -370,6 +373,7 @@ [(set_attr "predicable" "yes") (set_attr "type" "r_2_f,f_2_r,fconsts,f_loads,f_stores,load1,store1,fcpys,*") + (set_attr "neon_type" "neon_mcr,neon_mrc,*,*,*,*,*,neon_vmov,*") (set_attr "insn" "*,*,*,*,*,*,*,*,mov") (set_attr "pool_range" "*,*,*,1020,*,4096,*,*,*") (set_attr "neg_pool_range" "*,*,*,1008,*,4080,*,*,*")] @@ -407,6 +411,7 @@ [(set_attr "predicable" "yes") (set_attr "type" "r_2_f,f_2_r,fconsts,f_loads,f_stores,load1,store1,fcpys,*") + (set_attr "neon_type" "neon_mcr,neon_mrc,*,*,*,*,*,neon_vmov,*") (set_attr "insn" "*,*,*,*,*,*,*,*,mov") (set_attr "pool_range" "*,*,*,1020,*,4092,*,*,*") (set_attr "neg_pool_range" "*,*,*,1008,*,0,*,*,*")] @@ -450,6 +455,7 @@ " [(set_attr "type" "r_2_f,f_2_r,fconstd,f_loadd,f_stored,load2,store2,ffarithd,*") + (set_attr "neon_type" "neon_mcr_2_mcrr,neon_mrrc,*,*,*,*,*,neon_vmov,*") (set (attr "length") (cond [(eq_attr "alternative" "5,6,8") (const_int 8) (eq_attr "alternative" "7") (if_then_else @@ -493,6 +499,7 @@ " [(set_attr "type" "r_2_f,f_2_r,fconstd,f_loadd,f_stored,load2,store2,ffarithd,*") + (set_attr "neon_type" "neon_mcr_2_mcrr,neon_mrrc,*,*,*,*,*,neon_vmov,*") (set (attr "length") (cond [(eq_attr "alternative" "5,6,8") (const_int 8) (eq_attr "alternative" "7") (if_then_else @@ -527,7 +534,8 @@ fmrs%D3\\t%0, %2\;fmrs%d3\\t%0, %1" [(set_attr "conds" "use") (set_attr "length" "4,4,8,4,4,8,4,4,8") - (set_attr "type" "fcpys,fcpys,fcpys,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")] + (set_attr "type" "fcpys,fcpys,fcpys,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r") + (set_attr "neon_type" "neon_vmov,neon_vmov,neon_vmov,neon_mcr,neon_mcr,neon_mcr,neon_mrc,neon_mrc,neon_mrc")] ) (define_insn "*thumb2_movsfcc_vfp" @@ -550,7 +558,8 @@ ite\\t%D3\;fmrs%D3\\t%0, %2\;fmrs%d3\\t%0, %1" [(set_attr "conds" "use") (set_attr "length" "6,6,10,6,6,10,6,6,10") - (set_attr "type" "fcpys,fcpys,fcpys,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")] + (set_attr "type" "fcpys,fcpys,fcpys,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r") + (set_attr "neon_type" "neon_vmov,neon_vmov,neon_vmov,neon_mcr,neon_mcr,neon_mcr,neon_mrc,neon_mrc,neon_mrc")] ) (define_insn "*movdfcc_vfp" @@ -573,7 +582,8 @@ fmrrd%D3\\t%Q0, %R0, %P2\;fmrrd%d3\\t%Q0, %R0, %P1" [(set_attr "conds" "use") (set_attr "length" "4,4,8,4,4,8,4,4,8") - (set_attr "type" "ffarithd,ffarithd,ffarithd,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")] + (set_attr "type" "ffarithd,ffarithd,ffarithd,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r") + (set_attr "neon_type" "neon_vmov,neon_vmov,neon_vmov,neon_mcr_2_mcrr,neon_mcr_2_mcrr,neon_mcr_2_mcrr,neon_mrrc,neon_mrrc,neon_mrrc")] ) (define_insn "*thumb2_movdfcc_vfp" @@ -596,7 +606,8 @@ ite\\t%D3\;fmrrd%D3\\t%Q0, %R0, %P2\;fmrrd%d3\\t%Q0, %R0, %P1" [(set_attr "conds" "use") (set_attr "length" "6,6,10,6,6,10,6,6,10") - (set_attr "type" "ffarithd,ffarithd,ffarithd,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")] + (set_attr "type" "ffarithd,ffarithd,ffarithd,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r") + (set_attr "neon_type" "neon_vmov,neon_vmov,neon_vmov,neon_mcr_2_mcrr,neon_mcr_2_mcrr,neon_mcr_2_mcrr,neon_mrrc,neon_mrrc,neon_mrrc")] )