On Fri, Dec 30, 2022 at 3:21 AM Mayshao-oc <mayshao...@zhaoxin.com> wrote: > > >Ping. If there are any questions or concerns about the patch, please let me > >know: I'm interested in continuing this cleanup at least for older AMD > >models. > > > Hi Alexander: > According to the speccpu2017 benchmark result, the patch looks good > in lujiazui.
The patch is OK then. Thanks, Uros. > BR > Mayshao > >I noticed I had an extra line in my Changelog: > > > >> (lua_sseicvt_si): Ditto. > > > >It got there accidentally and I will drop it. > > > >Alexander > > > >On Fri, 9 Dec 2022, Alexander Monakov wrote: > > > >> Model the divider in Lujiazui processors as a separate automaton to > >> significantly reduce the overall model size. This should also result > >> in improved accuracy, as pipe 0 should be able to accept new > >> instructions while the divider is occupied. > >> > >> It is unclear why integer divisions are modeled as if pipes 0-3 are > >> all occupied. I've opted to keep a single-cycle reservation of all > >> four pipes together, so GCC should continue trying to pack > >> instructions around a division accordingly. > >> > >> Currently top three symbols in insn-automata.o are: > >> > >> 106102 r lujiazui_core_check > >> 106102 r lujiazui_core_transitions > >> 196123 r lujiazui_core_min_issue_delay > >> > >> This patch shrinks all lujiazui tables to: > >> > >> 3 r lujiazui_decoder_min_issue_delay > >> 20 r lujiazui_decoder_transitions > >> 32 r lujiazui_agu_min_issue_delay > >> 126 r lujiazui_agu_transitions > >> 304 r lujiazui_div_base > >> 352 r lujiazui_div_check > >> 352 r lujiazui_div_transitions > >> 1152 r lujiazui_core_min_issue_delay > >> 1592 r lujiazui_agu_translate > >> 1592 r lujiazui_core_translate > >> 1592 r lujiazui_decoder_translate > >> 1592 r lujiazui_div_translate > >> 3952 r lujiazui_div_min_issue_delay > >> 9216 r lujiazui_core_transitions > >> > >> This continues the work on reducing i386 insn-automata.o size started > >> with similar fixes for division and multiplication instructions in > >> znver.md [1][2]. I plan to submit corresponding fixes for > >> b[td]ver[123].md as well. > >> > >> [1] > >> https://inbox.sourceware.org/gcc-patches/23c795d6-403c-5927-e610-f0f12 > >> 15f5...@ispras.ru/T/#m36e069d43d07d768d4842a779e26b4a0915cc543 > >> [2] > >> https://inbox.sourceware.org/gcc-patches/20221101162637.14238-1-amonak > >> o...@ispras.ru/ > >> > >> gcc/ChangeLog: > >> > >> PR target/87832 > >> * config/i386/lujiazui.md (lujiazui_div): New automaton. > >> (lua_div): New unit. > >> (lua_idiv_qi): Correct unit in the reservation. > >> (lua_idiv_qi_load): Ditto. > >> (lua_idiv_hi): Ditto. > >> (lua_idiv_hi_load): Ditto. > >> (lua_idiv_si): Ditto. > >> (lua_idiv_si_load): Ditto. > >> (lua_idiv_di): Ditto. > >> (lua_idiv_di_load): Ditto. > >> (lua_fdiv_SF): Ditto. > >> (lua_fdiv_SF_load): Ditto. > >> (lua_fdiv_DF): Ditto. > >> (lua_fdiv_DF_load): Ditto. > >> (lua_fdiv_XF): Ditto. > >> (lua_fdiv_XF_load): Ditto. > >> (lua_ssediv_SF): Ditto. > >> (lua_ssediv_load_SF): Ditto. > >> (lua_ssediv_V4SF): Ditto. > >> (lua_ssediv_load_V4SF): Ditto. > >> (lua_ssediv_V8SF): Ditto. > >> (lua_ssediv_load_V8SF): Ditto. > >> (lua_ssediv_SD): Ditto. > >> (lua_ssediv_load_SD): Ditto. > >> (lua_ssediv_V2DF): Ditto. > >> (lua_ssediv_load_V2DF): Ditto. > >> (lua_ssediv_V4DF): Ditto. > >> (lua_ssediv_load_V4DF): Ditto. > >> (lua_sseicvt_si): Ditto. > >> --- > >> gcc/config/i386/lujiazui.md | 58 > >> +++++++++++++++++++------------------ > >> 1 file changed, 30 insertions(+), 28 deletions(-) > >> > >> diff --git a/gcc/config/i386/lujiazui.md b/gcc/config/i386/lujiazui.md > >> index 9046c09f2..58a230c70 100644 > >> --- a/gcc/config/i386/lujiazui.md > >> +++ b/gcc/config/i386/lujiazui.md > >> @@ -19,8 +19,8 @@ > >> > >> ;; Scheduling for ZHAOXIN lujiazui processor. > >> > >> -;; Modeling automatons for decoders, execution pipes and AGU pipes. > >> -(define_automaton "lujiazui_decoder,lujiazui_core,lujiazui_agu") > >> +;; Modeling automatons for decoders, execution pipes, AGU pipes, and > >> divider. > >> +(define_automaton > >> +"lujiazui_decoder,lujiazui_core,lujiazui_agu,lujiazui_div") > >> > >> ;; The rules for the decoder are simple: > >> ;; - an instruction with 1 uop can be decoded by any of the three @@ > >> -55,6 +55,8 @@ (define_reservation "lua_decoder01" > >> "lua_decoder0|lua_decoder1") (define_cpu_unit > >> "lua_p0,lua_p1,lua_p2,lua_p3" "lujiazui_core") (define_cpu_unit > >> "lua_p4,lua_p5" "lujiazui_agu") > >> > >> +(define_cpu_unit "lua_div" "lujiazui_div") > >> + > >> (define_reservation "lua_p03" "lua_p0|lua_p3") (define_reservation > >> "lua_p12" "lua_p1|lua_p2") (define_reservation "lua_p1p2" > >> "lua_p1+lua_p2") @@ -229,56 +231,56 @@ (define_insn_reservation > >> "lua_idiv_qi" 21 > >> (and (eq_attr "memory" "none") > >> (and (eq_attr "mode" "QI") > >> (eq_attr "type" "idiv")))) > >> - "lua_decoder0,lua_p0p1p2p3*21") > >> + "lua_decoder0,lua_p0p1p2p3,lua_div*21") > >> > >> (define_insn_reservation "lua_idiv_qi_load" 25 > >> (and (eq_attr "cpu" "lujiazui") > >> (and (eq_attr "memory" "load") > >> (and (eq_attr "mode" "QI") > >> (eq_attr "type" "idiv")))) > >> - "lua_decoder0,lua_p45,lua_p0p1p2p3*21") > >> + "lua_decoder0,lua_p45,lua_p0p1p2p3,lua_div*21") > >> > >> (define_insn_reservation "lua_idiv_hi" 22 > >> (and (eq_attr "cpu" "lujiazui") > >> (and (eq_attr "memory" "none") > >> (and (eq_attr "mode" "HI") > >> (eq_attr "type" "idiv")))) > >> - "lua_decoder0,lua_p0p1p2p3*22") > >> + "lua_decoder0,lua_p0p1p2p3,lua_div*22") > >> > >> (define_insn_reservation "lua_idiv_hi_load" 26 > >> (and (eq_attr "cpu" "lujiazui") > >> (and (eq_attr "memory" "load") > >> (and (eq_attr "mode" "HI") > >> (eq_attr "type" "idiv")))) > >> - "lua_decoder0,lua_p45,lua_p0p1p2p3*22") > >> + "lua_decoder0,lua_p45,lua_p0p1p2p3,lua_div*22") > >> > >> (define_insn_reservation "lua_idiv_si" 20 > >> (and (eq_attr "cpu" "lujiazui") > >> (and (eq_attr "memory" "none") > >> (and (eq_attr "mode" "SI") > >> (eq_attr "type" "idiv")))) > >> - "lua_decoder0,lua_p0p1p2p3*20") > >> + "lua_decoder0,lua_p0p1p2p3,lua_div*20") > >> > >> (define_insn_reservation "lua_idiv_si_load" 24 > >> (and (eq_attr "cpu" "lujiazui") > >> (and (eq_attr "memory" "load") > >> (and (eq_attr "mode" "SI") > >> (eq_attr "type" "idiv")))) > >> - "lua_decoder0,lua_p45,lua_p0p1p2p3*20") > >> + "lua_decoder0,lua_p45,lua_p0p1p2p3,lua_div*20") > >> > >> (define_insn_reservation "lua_idiv_di" 150 > >> (and (eq_attr "cpu" "lujiazui") > >> (and (eq_attr "memory" "none") > >> (and (eq_attr "mode" "DI") > >> (eq_attr "type" "idiv")))) > >> - "lua_decoder0,lua_p0p1p2p3*150") > >> + "lua_decoder0,lua_p0p1p2p3,lua_div*150") > >> > >> (define_insn_reservation "lua_idiv_di_load" 154 > >> (and (eq_attr "cpu" "lujiazui") > >> (and (eq_attr "memory" "load") > >> (and (eq_attr "mode" "DI") > >> (eq_attr "type" "idiv")))) > >> - "lua_decoder0,lua_p45,lua_p0p1p2p3*150") > >> + "lua_decoder0,lua_p45,lua_p0p1p2p3,lua_div*150") > >> > >> ;; x87 floating point operations. > >> > >> @@ -406,42 +408,42 @@ (define_insn_reservation "lua_fdiv_SF" 15 > >> (and (eq_attr "memory" "none") > >> (and (eq_attr "mode" "SF") > >> (eq_attr "type" "fdiv,fpspc")))) > >> - "lua_decodern,lua_p0*15") > >> + "lua_decodern,lua_p0,lua_div*15") > >> > >> (define_insn_reservation "lua_fdiv_SF_load" 19 > >> (and (eq_attr "cpu" "lujiazui") > >> (and (eq_attr "memory" "load") > >> (and (eq_attr "mode" "SF") > >> (eq_attr "type" "fdiv,fpspc")))) > >> - "lua_decoder01,lua_p45,lua_p0*15") > >> + "lua_decoder01,lua_p45,lua_p0,lua_div*15") > >> > >> (define_insn_reservation "lua_fdiv_DF" 18 > >> (and (eq_attr "cpu" "lujiazui") > >> (and (eq_attr "memory" "none") > >> (and (eq_attr "mode" "DF") > >> (eq_attr "type" "fdiv,fpspc")))) > >> - "lua_decodern,lua_p0*18") > >> + "lua_decodern,lua_p0,lua_div*18") > >> > >> (define_insn_reservation "lua_fdiv_DF_load" 22 > >> (and (eq_attr "cpu" "lujiazui") > >> (and (eq_attr "memory" "load") > >> (and (eq_attr "mode" "DF") > >> (eq_attr "type" "fdiv,fpspc")))) > >> - "lua_decoder01,lua_p45,lua_p0*18") > >> + "lua_decoder01,lua_p45,lua_p0,lua_div*18") > >> > >> (define_insn_reservation "lua_fdiv_XF" 22 > >> (and (eq_attr "cpu" "lujiazui") > >> (and (eq_attr "memory" "none") > >> (and (eq_attr "mode" "XF") > >> (eq_attr "type" "fdiv,fpspc")))) > >> - "lua_decoder0,lua_p0*22") > >> + "lua_decoder0,lua_p0,lua_div*22") > >> > >> (define_insn_reservation "lua_fdiv_XF_load" 26 > >> (and (eq_attr "cpu" "lujiazui") > >> (and (eq_attr "memory" "load") > >> (and (eq_attr "mode" "XF") > >> (eq_attr "type" "fdiv,fpspc")))) > >> - "lua_decoder0,lua_p45,lua_p0*22") > >> + "lua_decoder0,lua_p45,lua_p0,lua_div*22") > >> > >> ;; MMX instructions. > >> > >> @@ -593,84 +595,84 @@ (define_insn_reservation "lua_ssediv_SF" 13 > >> (and (eq_attr "memory" "none") > >> (and (eq_attr "mode" "SF") > >> (eq_attr "type" "ssediv")))) > >> - "lua_decodern,lua_p0*13") > >> + "lua_decodern,lua_p0,lua_div*13") > >> > >> (define_insn_reservation "lua_ssediv_load_SF" 17 > >> (and (eq_attr "cpu" "lujiazui") > >> (and (eq_attr "memory" "load") > >> (and (eq_attr "mode" "SF") > >> (eq_attr "type" "ssediv")))) > >> - "lua_decoder01,lua_p45,lua_p0*13") > >> + "lua_decoder01,lua_p45,lua_p0,lua_div*13") > >> > >> (define_insn_reservation "lua_ssediv_V4SF" 23 > >> (and (eq_attr "cpu" "lujiazui") > >> (and (eq_attr "memory" "none") > >> (and (eq_attr "mode" "V4SF") > >> (eq_attr "type" "ssediv")))) > >> - "lua_decodern,lua_p0*23") > >> + "lua_decodern,lua_p0,lua_div*23") > >> > >> (define_insn_reservation "lua_ssediv_load_V4SF" 27 > >> (and (eq_attr "cpu" "lujiazui") > >> (and (eq_attr "memory" "load") > >> (and (eq_attr "mode" "V4SF") > >> (eq_attr "type" "ssediv")))) > >> - "lua_decoder01,lua_p45,lua_p0*23") > >> + "lua_decoder01,lua_p45,lua_p0,lua_div*23") > >> > >> (define_insn_reservation "lua_ssediv_V8SF" 47 > >> (and (eq_attr "cpu" "lujiazui") > >> (and (eq_attr "memory" "none") > >> (and (eq_attr "mode" "V8SF") > >> (eq_attr "type" "ssediv")))) > >> - "lua_decoder0,lua_p0*47") > >> + "lua_decoder0,lua_p0,lua_div*47") > >> > >> (define_insn_reservation "lua_ssediv_load_V8SF" 51 > >> (and (eq_attr "cpu" "lujiazui") > >> (and (eq_attr "memory" "load") > >> (and (eq_attr "mode" "V8SF") > >> (eq_attr "type" "ssediv")))) > >> - "lua_decoder0,lua_p45,lua_p0*47") > >> + "lua_decoder0,lua_p45,lua_p0,lua_div*47") > >> > >> (define_insn_reservation "lua_ssediv_SD" 17 > >> (and (eq_attr "cpu" "lujiazui") > >> (and (eq_attr "memory" "none") > >> (and (eq_attr "mode" "DF") > >> (eq_attr "type" "ssediv")))) > >> - "lua_decodern,lua_p0*17") > >> + "lua_decodern,lua_p0,lua_div*17") > >> > >> (define_insn_reservation "lua_ssediv_load_SD" 21 > >> (and (eq_attr "cpu" "lujiazui") > >> (and (eq_attr "memory" "load") > >> (and (eq_attr "mode" "DF") > >> (eq_attr "type" "ssediv")))) > >> - "lua_decoder01,lua_p45,lua_p0*17") > >> + "lua_decoder01,lua_p45,lua_p0,lua_div*17") > >> > >> (define_insn_reservation "lua_ssediv_V2DF" 30 > >> (and (eq_attr "cpu" "lujiazui") > >> (and (eq_attr "memory" "none") > >> (and (eq_attr "mode" "V2DF") > >> (eq_attr "type" "ssediv")))) > >> - "lua_decodern,lua_p0*30") > >> + "lua_decodern,lua_p0,lua_div*30") > >> > >> (define_insn_reservation "lua_ssediv_load_V2DF" 34 > >> (and (eq_attr "cpu" "lujiazui") > >> (and (eq_attr "memory" "load") > >> (and (eq_attr "mode" "V2DF") > >> (eq_attr "type" "ssediv")))) > >> - "lua_decoder01,lua_p45,lua_p0*30") > >> + "lua_decoder01,lua_p45,lua_p0,lua_div*30") > >> > >> (define_insn_reservation "lua_ssediv_V4DF" 56 > >> (and (eq_attr "cpu" "lujiazui") > >> (and (eq_attr "memory" "none") > >> (and (eq_attr "mode" "V4DF") > >> (eq_attr "type" "ssediv")))) > >> - "lua_decoder0,lua_p0*56") > >> + "lua_decoder0,lua_p0,lua_div*56") > >> > >> (define_insn_reservation "lua_ssediv_load_V4DF" 60 > >> (and (eq_attr "cpu" "lujiazui") > >> (and (eq_attr "memory" "load") > >> (and (eq_attr "mode" "V4DF") > >> (eq_attr "type" "ssediv")))) > >> - "lua_decoder0,lua_p4p5,lua_p0*56") > >> + "lua_decoder0,lua_p4p5,lua_p0,lua_div*56") > >> > >> > >> > > > >