Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-11-01 Thread Alexander Monakov

On Mon, 31 Oct 2022, Jan Hubička wrote:

> Hello,
> thanks for checking the performance.  The patch is OK.

Thanks, pushed the attached patch, and working on a corresponding change for
floating-point divisions.

AlexanderFrom 1962a8b22d3d3fb5b6bb5598295a4571daf8876f Mon Sep 17 00:00:00 2001
From: Alexander Monakov 
Date: Mon, 31 Oct 2022 17:35:57 +0300
Subject: [PATCH] i386: correct integer division modeling in znver.md

In znver.md, division instructions have descriptions like

(define_insn_reservation "znver1_idiv_DI" 41
(and (eq_attr "cpu" "znver1,znver2")
 (and (eq_attr "type" "idiv")
  (and (eq_attr "mode" "DI")
   (eq_attr "memory" "none"
"znver1-double,znver1-ieu2*41")

which says that DImode idiv has latency 41 (which is correct) and that
it occupies 2nd integer execution unit for 41 consecutive cycles, but
that is not correct:

1) the division instruction is partially pipelined, and has throughput
   1/14, not 1/41;

2) for the most part it occupies a separate division unit, not the
   general arithmetic unit.

Evidently, interaction of such 41-cycle paths with the rest of
reservations causes a combinatorial explosion in the automaton.

Fix this by modeling the integer division unit properly, and correcting
reservations to use the measured reciprocal throughput of those
instructions (available from uops.info). A similar correction for
floating-point divisions is left for a followup patch.

Top 5 znver table sizes, before:

68692 r znver1_ieu_check
68692 r znver1_ieu_transitions
99792 r znver1_ieu_min_issue_delay
428108 r znver1_fp_min_issue_delay
856216 r znver1_fp_transitions

After:

1454 r znver1_ieu_translate
1454 r znver1_translate
2304 r znver1_ieu_transitions
428108 r znver1_fp_min_issue_delay
856216 r znver1_fp_transitions

gcc/ChangeLog:

PR target/87832
* config/i386/znver.md (znver1_idiv): New automaton.
(znver1-idiv): New unit.
(znver1_idiv_DI): Correct unit and cycles in the reservation.
(znver1_idiv_SI): Ditto.
(znver1_idiv_HI): Ditto.
(znver1_idiv_QI): Ditto.
(znver1_idiv_mem_DI): Ditto.
(znver1_idiv_mem_SI): Ditto.
(znver1_idiv_mem_HI): Ditto.
(znver1_idiv_mem_QI): Ditto.
(znver3_idiv_DI): Ditto.
(znver3_idiv_SI): Ditto.
(znver3_idiv_HI): Ditto.
(znver3_idiv_QI): Ditto.
(znver3_idiv_mem_DI): Ditto.
(znver3_idiv_mem_SI): Ditto.
(znver3_idiv_mem_HI): Ditto.
(znver3_idiv_mem_QI): Ditto.
---
 gcc/config/i386/znver.md | 39 +--
 1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/gcc/config/i386/znver.md b/gcc/config/i386/znver.md
index 9c25b4e27..4aa098fd8 100644
--- a/gcc/config/i386/znver.md
+++ b/gcc/config/i386/znver.md
@@ -23,8 +23,8 @@ (define_attr "znver1_decode" "direct,vector,double"
 
 ;; AMD znver1, znver2 and znver3 Scheduling
 ;; Modeling automatons for zen decoders, integer execution pipes,
-;; AGU pipes and floating point execution units.
-(define_automaton "znver1, znver1_ieu, znver1_fp, znver1_agu")
+;; SIMD/FP domain, AGU pipes, and dividers.
+(define_automaton "znver1, znver1_ieu, znver1_fp, znver1_agu, znver1_idiv")
 
 ;; Decoders unit has 4 decoders and all of them can decode fast path
 ;; and vector type instructions.
@@ -93,6 +93,9 @@ (define_reservation "znver2-fvector" "znver1-fp0+znver1-fp1
  +znver1-fp2+znver1-fp3
  +znver1-agu0+znver1-agu1+znver2-agu2")
 
+;; Dividers
+(define_cpu_unit "znver1-idiv" "znver1_idiv")
+
 ;; Call instruction
 (define_insn_reservation "znver1_call" 1
 (and (eq_attr "cpu" "znver1")
@@ -176,28 +179,28 @@ (define_insn_reservation "znver1_idiv_DI" 41
  (and (eq_attr "type" "idiv")
   (and (eq_attr "mode" "DI")
(eq_attr "memory" "none"
-"znver1-double,znver1-ieu2*41")
+"znver1-double,znver1-idiv*14")
 
 (define_insn_reservation "znver1_idiv_SI" 25
 (and (eq_attr "cpu" "znver1,znver2")
  (and (eq_attr "type" "idiv")
   (and (eq_attr "mode" "SI")
(eq_attr "memory" "none"
-"znver1-double,znver1-ieu2*25")
+"znver1-double,znver1-idiv*14")
 
 (define_insn_reservation "znver1_idiv_HI" 17
 (and (eq_attr "cpu" "znver1,znver2")
  (and (eq_attr "type" "idiv")
   (and (eq_attr "mode" "HI")
(eq_attr "memory" "none"
-

Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-10-31 Thread Jan Hubička via Gcc-patches
Hello,
thanks for checking the performance.  The patch is OK.
Honza

On Mon, Oct 31, 2022 at 11:39 AM Joshi, Tejas Sanjay <
tejassanjay.jo...@amd.com> wrote:

> [Public]
>
> Hi,
>
> > It is not latency. It is reciprocal throughput. For example, the
> multiplication instruction has
> > latency 3 and reciprocal throughput 1, and the corresponding execution
> unit can accept a new
> > multiplication instruction each cycle. In the .md file we are modeling
> that by saying that
> > multiplication occupies some unit for one cycle (but has latency 3).
>
> We ran spec cpu2017 INT rate with your patch for znver1 and znver3 with O2
> and Ofast. Found no performance differences from the base one.
> The patch looks good.
>
> Thanks and Regards,
> Tejas
>


RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-10-31 Thread Joshi, Tejas Sanjay via Gcc-patches
[Public]

Hi,

> It is not latency. It is reciprocal throughput. For example, the 
> multiplication instruction has
> latency 3 and reciprocal throughput 1, and the corresponding execution unit 
> can accept a new
> multiplication instruction each cycle. In the .md file we are modeling that 
> by saying that
> multiplication occupies some unit for one cycle (but has latency 3).

We ran spec cpu2017 INT rate with your patch for znver1 and znver3 with O2 and 
Ofast. Found no performance differences from the base one.
The patch looks good.

Thanks and Regards,
Tejas


RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-10-26 Thread Alexander Monakov
On Wed, 26 Oct 2022, Kumar, Venkataramanan wrote:

> > Looking at znver1.md again, I think the problem is caused by incorrect
> > modeling of division instructions: they have descriptions like
> >
> > (define_insn_reservation "znver1_idiv_DI" 41
> > (and (eq_attr "cpu" "znver1,znver2")
> >  (and (eq_attr "type" "idiv")
> >   (and (eq_attr "mode" "DI")
> >(eq_attr "memory" "none"
> > "znver1-double,znver1-ieu2*41")
> >
> > which says that DImode idiv has latency 41 (which is correct) and that it
> > occupies 2nd integer execution unit for 41 consecutive cycles, but that is
> > not correct:
> 
> Yes you are correct. It does not block the 2nd integer execution pipe 
> consecutively for 41 cycles.
> 
> >
> > 1) the division instruction is partially pipelined, and has throughput 1/14
> 
> "Div" unit takes one instruction and in the worst case the latency will be 41 
> cycles in znver1/2.
> But I agree that we can put best case latency of 14 cycles for the scheduler 
> model in znver1/2 .

It is not latency. It is reciprocal throughput. For example, the multiplication
instruction has latency 3 and reciprocal throughput 1, and the corresponding
execution unit can accept a new multiplication instruction each cycle. In the
.md file we are modeling that by saying that multiplication occupies some unit
for one cycle (but has latency 3).

Alexander


RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-10-26 Thread Kumar, Venkataramanan via Gcc-patches
[AMD Official Use Only - General]

Hi Alexander,

Thank you for looking in to this issue.

> -Original Message-
> From: Alexander Monakov 
> Sent: Tuesday, October 25, 2022 12:18 AM
> To: Jan Hubička 
> Cc: Kumar, Venkataramanan ; Jakub
> Jelinek ; Richard Biener
> ; Joshi, Tejas Sanjay
> ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] [X86_64]: Enable support for next generation AMD
> Zen4 CPU
>
> Caution: This message originated from an External Source. Use proper
> caution when opening attachments, clicking links, or responding.
>
>
> On Mon, 24 Oct 2022, Jan Hubička wrote:
>
> > > By the way, it appears pre-existing znver[123] models are also
> > > causing some kind of combinatorial blow-up, but before znver4 it was
> > > not a blocking issue:
> > >
> > >
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgc
> > >
> c.gnu.org%2Fbugzilla%2Fshow_bug.cgi%3Fid%3D87832data=05%7C
> 01%7C
> > >
> Venkataramanan.Kumar%40amd.com%7C5d22bec311ac43b3f56a08dab5f
> 03fc7%7C
> > >
> 3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638022340726474
> 812%7CUnkn
> > >
> own%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik
> 1haW
> > >
> wiLCJXVCI6Mn0%3D%7C3000%7C%7C%7Csdata=kg2zKCBxDEeYYKijH
> 204QpOC4
> > > 0SJBADOvqlk0LhzJhc%3Dreserved=0
> >
> >
> > It is really easy to make DFA size to grow if there are possibly many
> > instructions in the pipeline (as every possible state of a modelled
> > pipeline needs to be a new state of the automaton). This is
> > essentially depth_of_pipeline * number_of_units with additional states
> > to repesent special instructions and this naturally keeps growing.
> >
> > We could try to break the FP automata into multiple ones, but there
> > are instructions that can go down any pipe which makes this hard or we
> > can try toreduce number of different reservation types (possibly by
> > breaking the automaton to znver1-3 and 4 or so).
> > With znver2 model I experimented with broken up version and common
> one
> > and ended up with smaller binary for combined one.
>
> Looking at znver1.md again, I think the problem is caused by incorrect
> modeling of division instructions: they have descriptions like
>
> (define_insn_reservation "znver1_idiv_DI" 41
> (and (eq_attr "cpu" "znver1,znver2")
>  (and (eq_attr "type" "idiv")
>   (and (eq_attr "mode" "DI")
>(eq_attr "memory" "none"
> "znver1-double,znver1-ieu2*41")
>
> which says that DImode idiv has latency 41 (which is correct) and that it
> occupies 2nd integer execution unit for 41 consecutive cycles, but that is
> not correct:

Yes you are correct. It does not block the 2nd integer execution pipe 
consecutively for 41 cycles.

>
> 1) the division instruction is partially pipelined, and has throughput 1/14

"Div" unit takes one instruction and in the worst case the latency will be 41 
cycles in znver1/2.
But I agree that we can put best case latency of 14 cycles for the scheduler 
model in znver1/2 .

>
> 2) for the most part it occupies a separate division unit, not the general
> arithmetic unit.

Agreed.

>
> (incidentally, I think the blowup is caused by interaction of such super-long
> 41-cycle paths with the rest of reservations)
>
> I think we should fix this by modeling the separate division unit properly,
> and fixing reservations to use the measured reciprocal throughput of those
> instructions (available from uops.info). The following patch does that for
> integer divisions and completely eliminates the integer part of the problem;
> the issue with floating-point divisions remains.
>
> Top 5 znver table sizes, before:
>
> 68692 r znver1_ieu_check
> 68692 r znver1_ieu_transitions
> 99792 r znver1_ieu_min_issue_delay
> 428108 r znver1_fp_min_issue_delay
> 856216 r znver1_fp_transitions
>
> After:
>
> 1454 r znver1_ieu_translate
> 1454 r znver1_translate
> 2304 r znver1_ieu_transitions
> 428108 r znver1_fp_min_issue_delay
> 856216 r znver1_fp_transitions
>
> Will you help getting this reviewed for trunk?
>
>
>
> diff --git a/gcc/config/i386/znver1.md b/gcc/config/i386/znver1.md index
> 9c25b4e27..39b59343d 100644
> --- a/gcc/config/i386/znver1.md
> +++ b/gcc/config/i386/znver1.md
> @@ -24,7 +24,7 @@
>  ;; AMD znver1, znver2 and znver3 Scheduling  ;; Modeling automatons for
> zen decoders, integer execution pip

RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-10-25 Thread Joshi, Tejas Sanjay via Gcc-patches
[Public]

Hi,

On Mon, Oct 24, 2022 at 4:26 PM Alexander Monakov  
wrote:
> > > This grew insn-automata.cc from 201502 lines to 639968 lines and the
> > > build of the automata (genautomata) to several minutes in my dev tree.
> >
> > Yeah, in my unoptimized non-bootstrapped development tree genautomata
> > now takes over 12 minutes on a fast box, that is simply not acceptable.
> 
> Thank you for notifying us.
> 
> mailto:tejassanjay.jo...@amd.com has posted a patch for review to fix this 
> (as per Honza's comments).
> Ref: 
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fpipermail%2Fgcc-patches%2F2022-October%2F604144.html=05%7C01%7CTejasSanjay.Joshi%40amd.com%7C10a544bb98214654ee7808dab5cdafe5%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638022192267092598%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C=3ATGZwSwJZWlJ1EU%2BijPEYTuVFb38gTkAvSWVQNF3AQ%3D=0

> This patch is OK 
We have pushed this patch which reverts the scheduler descriptions for znver4.
Now, on my machine, the build time and insn-automata.cc size is matching with 
previous gcc trunk state.

Thanks and Regards,
Tejas


Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-10-24 Thread Alexander Monakov
On Mon, 24 Oct 2022, Jan Hubička wrote:

> > By the way, it appears pre-existing znver[123] models are also causing
> > some kind
> > of combinatorial blow-up, but before znver4 it was not a blocking issue:
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832
> 
> 
> It is really easy to make DFA size to grow if there are possibly many
> instructions in the pipeline (as every possible state of a modelled pipeline
> needs to be a new state of the automaton). This is essentially
> depth_of_pipeline * number_of_units with additional states to repesent
> special instructions and this naturally keeps growing.
> 
> We could try to break the FP automata into multiple ones, but there are
> instructions that can go down any pipe which makes this hard
> or we can try toreduce number of different reservation types (possibly by
> breaking the automaton to znver1-3 and 4 or so).
> With znver2 model I experimented with broken up version and common one and
> ended up with smaller binary for combined one.

Looking at znver1.md again, I think the problem is caused by incorrect modeling
of division instructions: they have descriptions like

(define_insn_reservation "znver1_idiv_DI" 41
(and (eq_attr "cpu" "znver1,znver2")
 (and (eq_attr "type" "idiv")
  (and (eq_attr "mode" "DI")
   (eq_attr "memory" "none"
"znver1-double,znver1-ieu2*41")

which says that DImode idiv has latency 41 (which is correct) and that it
occupies 2nd integer execution unit for 41 consecutive cycles, but that is
not correct:

1) the division instruction is partially pipelined, and has throughput 1/14

2) for the most part it occupies a separate division unit, not the general
arithmetic unit.

(incidentally, I think the blowup is caused by interaction of such super-long
41-cycle paths with the rest of reservations)

I think we should fix this by modeling the separate division unit properly, and
fixing reservations to use the measured reciprocal throughput of those
instructions (available from uops.info). The following patch does that for
integer divisions and completely eliminates the integer part of the problem; the
issue with floating-point divisions remains.

Top 5 znver table sizes, before:

68692 r znver1_ieu_check
68692 r znver1_ieu_transitions
99792 r znver1_ieu_min_issue_delay
428108 r znver1_fp_min_issue_delay
856216 r znver1_fp_transitions

After:

1454 r znver1_ieu_translate
1454 r znver1_translate
2304 r znver1_ieu_transitions
428108 r znver1_fp_min_issue_delay
856216 r znver1_fp_transitions

Will you help getting this reviewed for trunk?



diff --git a/gcc/config/i386/znver1.md b/gcc/config/i386/znver1.md
index 9c25b4e27..39b59343d 100644
--- a/gcc/config/i386/znver1.md
+++ b/gcc/config/i386/znver1.md
@@ -24,7 +24,7 @@
 ;; AMD znver1, znver2 and znver3 Scheduling
 ;; Modeling automatons for zen decoders, integer execution pipes,
 ;; AGU pipes and floating point execution units.
-(define_automaton "znver1, znver1_ieu, znver1_fp, znver1_agu")
+(define_automaton "znver1, znver1_ieu, znver1_fp, znver1_agu, znver1_idiv")
 
 ;; Decoders unit has 4 decoders and all of them can decode fast path
 ;; and vector type instructions.
@@ -50,6 +50,7 @@
 (define_cpu_unit "znver1-ieu1" "znver1_ieu")
 (define_cpu_unit "znver1-ieu2" "znver1_ieu")
 (define_cpu_unit "znver1-ieu3" "znver1_ieu")
+(define_cpu_unit "znver1-idiv" "znver1_idiv")
 (define_reservation "znver1-ieu" 
"znver1-ieu0|znver1-ieu1|znver1-ieu2|znver1-ieu3")
 
 ;; 2 AGU pipes in znver1 and 3 AGU pipes in znver2 and znver3
@@ -176,28 +177,28 @@
  (and (eq_attr "type" "idiv")
   (and (eq_attr "mode" "DI")
(eq_attr "memory" "none"
-"znver1-double,znver1-ieu2*41")
+"znver1-double,znver1-idiv*14")
 
 (define_insn_reservation "znver1_idiv_SI" 25
 (and (eq_attr "cpu" "znver1,znver2")
  (and (eq_attr "type" "idiv")
   (and (eq_attr "mode" "SI")
(eq_attr "memory" "none"
-"znver1-double,znver1-ieu2*25")
+"znver1-double,znver1-idiv*14")
 
 (define_insn_reservation "znver1_idiv_HI" 17
 (and (eq_attr "cpu" "znver1,znver2")
  (and (eq_attr "type" "idiv")
   (and (eq_attr "mode" "HI")
(eq_attr "memory" "none"
-"znver1-double,znver1-ieu2*17")
+"znver1-double,znver1-idiv*14")
 
 (define_insn_reservation "znver1_idiv_QI" 12
 (and (eq_attr "cpu" "znver1,znver2")
  (and (eq_attr "type" "idiv")
   

Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-10-24 Thread Jan Hubička via Gcc-patches
On Mon, Oct 24, 2022 at 4:26 PM Alexander Monakov 
wrote:

> > > > This grew insn-automata.cc from 201502 lines to 639968 lines and the
> > > > build of the automata (genautomata) to several minutes in my dev
> tree.
> > >
> > > Yeah, in my unoptimized non-bootstrapped development tree genautomata
> > > now takes over 12 minutes on a fast box, that is simply not acceptable.
> >
> > Thank you for notifying us.
> >
> > tejassanjay.jo...@amd.com has posted a patch for review to fix this (as
> per Honza's comments).
> > Ref: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604144.html


This patch is OK

>
>
> By the way, it appears pre-existing znver[123] models are also causing
> some kind
> of combinatorial blow-up, but before znver4 it was not a blocking issue:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832


It is really easy to make DFA size to grow if there are possibly many
instructions in the pipeline (as every possible state of a modelled pipeline
needs to be a new state of the automaton). This is essentially
depth_of_pipeline * number_of_units with additional states to repesent
special instructions and this naturally keeps growing.

We could try to break the FP automata into multiple ones, but there are
instructions that can go down any pipe which makes this hard
or we can try toreduce number of different reservation types (possibly by
breaking the automaton to znver1-3 and 4 or so).
With znver2 model I experimented with broken up version and common one and
ended up with smaller binary for combined one.

Honza

>
>
> Alexander
>


RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-10-24 Thread Alexander Monakov
> > > This grew insn-automata.cc from 201502 lines to 639968 lines and the
> > > build of the automata (genautomata) to several minutes in my dev tree.
> >
> > Yeah, in my unoptimized non-bootstrapped development tree genautomata
> > now takes over 12 minutes on a fast box, that is simply not acceptable.
> 
> Thank you for notifying us.
> 
> tejassanjay.jo...@amd.com has posted a patch for review to fix this (as per 
> Honza's comments).
> Ref: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604144.html

By the way, it appears pre-existing znver[123] models are also causing some kind
of combinatorial blow-up, but before znver4 it was not a blocking issue:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832

Alexander


RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-10-23 Thread Kumar, Venkataramanan via Gcc-patches
[AMD Official Use Only - General]

Hi Richi and Jakub

> -Original Message-
> From: Jakub Jelinek 
> Sent: Saturday, October 22, 2022 10:41 PM
> To: Richard Biener 
> Cc: Kumar, Venkataramanan ; Joshi,
> Tejas Sanjay ; gcc-patches@gcc.gnu.org;
> honza.hubi...@gmail.com
> Subject: Re: [PATCH] [X86_64]: Enable support for next generation AMD
> Zen4 CPU
>
> Caution: This message originated from an External Source. Use proper
> caution when opening attachments, clicking links, or responding.
>
>
> On Fri, Oct 21, 2022 at 01:51:55PM +0200, Richard Biener via Gcc-patches
> wrote:
> > > > > BTW: Perhaps znver1.md is not the right filename anymore, since
> > > > > it hosts
> > > > all four Zen schedulers.
> > > >
> > > > I have renamed the file to znver.md in this revision, PFA.
> > > > Thank you for the review, we will push it for trunk if we don't
> > > > get any further comments.
> > >
> > > I have pushed the patch on behalf of Tejas.
> >
> > This grew insn-automata.cc from 201502 lines to 639968 lines and the
> > build of the automata (genautomata) to several minutes in my dev tree.
>
> Yeah, in my unoptimized non-bootstrapped development tree genautomata
> now takes over 12 minutes on a fast box, that is simply not acceptable.

Thank you for notifying us.

tejassanjay.jo...@amd.com has posted a patch for review to fix this (as per 
Honza's comments).
Ref: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604144.html

Sorry for the inconvenience caused.

Regards,
Venkat.

>
> Jakub



Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-10-22 Thread Jakub Jelinek via Gcc-patches
On Fri, Oct 21, 2022 at 01:51:55PM +0200, Richard Biener via Gcc-patches wrote:
> > > > BTW: Perhaps znver1.md is not the right filename anymore, since it hosts
> > > all four Zen schedulers.
> > >
> > > I have renamed the file to znver.md in this revision, PFA.
> > > Thank you for the review, we will push it for trunk if we don't get any
> > > further comments.
> >
> > I have pushed the patch on behalf of Tejas.
> 
> This grew insn-automata.cc from 201502 lines to 639968 lines and the build
> of the automata (genautomata) to several minutes in my dev tree.

Yeah, in my unoptimized non-bootstrapped development tree genautomata
now takes over 12 minutes on a fast box, that is simply not acceptable.

Jakub



RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-10-21 Thread Joshi, Tejas Sanjay via Gcc-patches
[Public]

Hi all,
> Okay, I will prepare another patch which reverts the znver4 instruction 
> reservations and submit it.

PFA the patch which reverts the znver4 instruction reservations. I have also 
made znver4 to use znver3 scheduler for now.
If its good for the trunk, I will submit it.

Thanks and Regards,
Tejas


0001-Remove-znver4-instruction-reservations.patch
Description: 0001-Remove-znver4-instruction-reservations.patch


RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-10-21 Thread Joshi, Tejas Sanjay via Gcc-patches
[AMD Official Use Only - General]

Hi,

> I think it may make sense to make the initial patch without scheduler model 
> update with zen3 scheduling.  I can work on updating the model which needs 
> some benchmarking and setting up > the cost tables first.
> The problem here is that adding extra variants to execution core model likely 
> forces too many states.

Okay, I will prepare another patch which reverts the znver4 instruction 
reservations and submit it.

Thanks and Regards,
Tejas


Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-10-21 Thread Jan Hubicka via Gcc-patches
> On Fri, Oct 21, 2022 at 12:00 PM Kumar, Venkataramanan via Gcc-patches
>  wrote:
> >
> > Hi all,
> >
> > > -Original Message-
> > > From: Joshi, Tejas Sanjay 
> > > Sent: Monday, October 17, 2022 8:09 PM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Kumar, Venkataramanan ;
> > > honza.hubi...@gmail.com; Uros Bizjak 
> > > Subject: RE: [PATCH] [X86_64]: Enable support for next generation AMD
> > > Zen4 CPU
> > >
> > > [Public]
> > >
> > > Hi,
> > >
> > > > BTW: Perhaps znver1.md is not the right filename anymore, since it hosts
> > > all four Zen schedulers.
> > >
> > > I have renamed the file to znver.md in this revision, PFA.
> > > Thank you for the review, we will push it for trunk if we don't get any
> > > further comments.
> >
> > I have pushed the patch on behalf of Tejas.
> 
> This grew insn-automata.cc from 201502 lines to 639968 lines and the build
> of the automata (genautomata) to several minutes in my dev tree.
> 
> You did something wrong.  Please fix!

I think it may make sense to make the initial patch without scheduler
model update with zen3 scheduling.  I can work on updating the model
which needs some benchmarking and setting up the cost tables first.
The problem here is that adding extra variants to execution core model
likely forces too many states.

In general DFA is not best model for such symmetirc and parallel
execution core (since there are way too many combinations individual
pipes may get).  I was thinking of adding an option to generate
alternative model based on bitmasks, but never got around implementing
that.

So with current infrastructure we always need to simplify a bit. Which
is also not big deal since the scheduling is not well documented
anyway and our model is not precise at all (it misses the on-chip
scheduler).

Honza
> 
> Richard.
> 
> > Regards,
> > Venkat.
> >


Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-10-21 Thread Richard Biener via Gcc-patches
On Fri, Oct 21, 2022 at 12:00 PM Kumar, Venkataramanan via Gcc-patches
 wrote:
>
> Hi all,
>
> > -Original Message-
> > From: Joshi, Tejas Sanjay 
> > Sent: Monday, October 17, 2022 8:09 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Kumar, Venkataramanan ;
> > honza.hubi...@gmail.com; Uros Bizjak 
> > Subject: RE: [PATCH] [X86_64]: Enable support for next generation AMD
> > Zen4 CPU
> >
> > [Public]
> >
> > Hi,
> >
> > > BTW: Perhaps znver1.md is not the right filename anymore, since it hosts
> > all four Zen schedulers.
> >
> > I have renamed the file to znver.md in this revision, PFA.
> > Thank you for the review, we will push it for trunk if we don't get any
> > further comments.
>
> I have pushed the patch on behalf of Tejas.

This grew insn-automata.cc from 201502 lines to 639968 lines and the build
of the automata (genautomata) to several minutes in my dev tree.

You did something wrong.  Please fix!

Richard.

> Regards,
> Venkat.
>


RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-10-21 Thread Kumar, Venkataramanan via Gcc-patches
Hi all, 

> -Original Message-
> From: Joshi, Tejas Sanjay 
> Sent: Monday, October 17, 2022 8:09 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kumar, Venkataramanan ;
> honza.hubi...@gmail.com; Uros Bizjak 
> Subject: RE: [PATCH] [X86_64]: Enable support for next generation AMD
> Zen4 CPU
> 
> [Public]
> 
> Hi,
> 
> > BTW: Perhaps znver1.md is not the right filename anymore, since it hosts
> all four Zen schedulers.
> 
> I have renamed the file to znver.md in this revision, PFA.
> Thank you for the review, we will push it for trunk if we don't get any
> further comments.

I have pushed the patch on behalf of Tejas. 

Regards,
Venkat.



RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-10-17 Thread Joshi, Tejas Sanjay via Gcc-patches
[Public]

Hi,

> BTW: Perhaps znver1.md is not the right filename anymore, since it hosts all 
> four Zen schedulers.

I have renamed the file to znver.md in this revision, PFA.
Thank you for the review, we will push it for trunk if we don't get any further 
comments.

Thanks and Regards,
Tejas


0001-Enable-AMD-znver4-support-and-add-instruction-reserv.patch
Description: 0001-Enable-AMD-znver4-support-and-add-instruction-reserv.patch


Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-10-16 Thread Uros Bizjak via Gcc-patches
On Thu, Oct 13, 2022 at 5:33 PM Joshi, Tejas Sanjay
 wrote:
>
> [Public]
>
> Hi all,
>
> PFA, the patch that enables support for the next generation AMD Zen4 CPU via 
> -march=znver4.
> This is a basic enablement patch and as of now the costings, tunings are kept 
> same as znver3.
>
> Good for trunk?

2022-09-28  Tejas Joshi 

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_amd_cpu): Recognize znver4.
* common/config/i386/i386-common.cc (processor_names): Add znver4.
(processor_alias_table): Add znver4 and modularize old znvers.
* common/config/i386/i386-cpuinfo.h (processor_subtypes):
AMDFAM19H_ZNVER4.
* config.gcc (x86_64-*-* |...): Likewise.
* config/i386/driver-i386.cc (host_detect_local_cpu): Let
-march=native recognize znver4 cpus.
* config/i386/i386-c.cc (ix86_target_macros_internal): Add znver4.
* config/i386/i386-options.cc (m_ZNVER4): New definition.
(m_ZNVER): Include m_ZNVER4.
(processor_cost_table): Add znver4.
* config/i386/i386.cc (ix86_reassociation_width): Likewise.
* gcc/config/i386/i386.h (processor_type): Add PROCESSOR_ZNVER4.
(PTA_ZNVER1): New definition.
(PTA_ZNVER2): Likewise.
(PTA_ZNVER3): Likewise.
(PTA_ZNVER4): Likewise.
* config/i386/i386.md (define_attr "cpu"): Add znver4.
* config/i386/x86-tune-costs.h (znver4_cost): New definition.
* config/i386/x86-tune-sched.cc (ix86_issue_rate): Add znver4.
(ix86_adjust_cost): Likewise.
* config/i386/znver1.md: Add new reservations for znver4.
* doc/extend.texi: Add details about znver4.
* doc/invoke.texi: Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/funcspec-56.inc: Handle new march.
* g++.target/i386/mv29.C: Likewise.

Although I didn't check all the details of the new scheduler model,
the patch LGTM for mainline.

BTW: Perhaps znver1.md is not the right filename anymore, since it
hosts all four Zen schedulers.

Thanks,
Uros.