[Bug middle-end/112697] [14 Regression] 30-40% exec time regression of 433.milc on zen2 since r14-4972-g8aa47713701b1f

2024-03-22 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697

Jeffrey A. Law  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #11 from Jeffrey A. Law  ---
Per c#10.

[Bug middle-end/112697] [14 Regression] 30-40% exec time regression of 433.milc on zen2 since r14-4972-g8aa47713701b1f

2024-03-21 Thread pheeck at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697

Filip Kastl  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|needs-bisection             |

--- Comment #10 from Filip Kastl  ---
I see that the benchmark's exec time has returned to its original value. If
there are no objections, I'll mark this bug as fixed.

[Bug middle-end/112697] [14 Regression] 30-40% exec time regression of 433.milc on zen2 since r14-4972-g8aa47713701b1f

2024-03-07 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697

Jeffrey A. Law  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
                 CC|                            |law at gcc dot gnu.org

[Bug middle-end/112697] [14 Regression] 30-40% exec time regression of 433.milc on zen2 since r14-4972-g8aa47713701b1f

2023-12-01 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697

--- Comment #9 from Alexander Monakov  ---
... as does inserting a nop before the compare ¯\_(ツ)_/¯


--- d.out.ltrans0.ltrans.slow.s 2023-12-01 18:32:54.255841611 +0300
+++ d.out.ltrans0.ltrans.s      2023-12-01 18:53:04.909438690 +0300
@@ -743,6 +743,7 @@ add_force_to_mom:
        .p2align 4,,10
        .p2align 3
 .L58:
+       nop
        cmpb    $1, -680(%r11,%r12)
        movapd  %xmm5, %xmm7
        jne     .L54

[Bug middle-end/112697] [14 Regression] 30-40% exec time regression of 433.milc on zen2 since r14-4972-g8aa47713701b1f

2023-12-01 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697

--- Comment #8 from Alexander Monakov  ---
Thanks, I can reproduce it. It is pretty tricky though. For instance, just
swapping the mov and the compare is enough to make it fast:

--- d.out.ltrans0.ltrans.slow.s 2023-12-01 18:32:54.255841611 +0300
+++ d.out.ltrans0.ltrans.fast.s 2023-12-01 18:32:20.318668991 +0300
@@ -743,8 +743,8 @@ add_force_to_mom:
        .p2align 4,,10
        .p2align 3
 .L58:
-       cmpb    $1, -680(%r11,%r12)
        movapd  %xmm5, %xmm7
+       cmpb    $1, -680(%r11,%r12)
        jne     .L54
        xorpd   %xmm6, %xmm7
 .L54:

[Bug middle-end/112697] [14 Regression] 30-40% exec time regression of 433.milc on zen2 since r14-4972-g8aa47713701b1f

2023-11-29 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697

--- Comment #7 from Martin Jambor  ---
Created attachment 56720
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56720&action=edit
Perf annotate of milc built with r14-4972-g8aa47713701b1f

commit r14-4972-g8aa47713701b1f:

$ perf stat taskset -c 0 specinvoke

 Performance counter stats for 'taskset -c 0 specinvoke':

         272931.43 msec task-clock:u              #    1.000 CPUs utilized
                 0      context-switches:u        #    0.000 /sec
                 0      cpu-migrations:u          #    0.000 /sec
            472353      page-faults:u             #    1.731 K/sec
      886165387570      cycles:u                  #    3.247 GHz                      (83.33%)
       31546898034      stalled-cycles-frontend:u #    3.56% frontend cycles idle     (83.33%)
      729878095777      stalled-cycles-backend:u  #   82.36% backend cycles idle      (83.33%)
     1061779557370      instructions:u            #    1.20  insn per cycle
                                                  #    0.69  stalled cycles per insn  (83.33%)
       58797121078      branches:u                #  215.428 M/sec                    (83.33%)
           6960852      branch-misses:u           #    0.01% of all branches          (83.33%)

 272.967381843 seconds time elapsed

 268.718335000 seconds user
   4.212584000 seconds sys

$ perf record taskset -c 0 specinvoke
[ perf record: Woken up 167 times to write data ]
[ perf record: Captured and wrote 41.549 MB perf.data (1088982 samples) ]

$ perf report -n --percent-limit=1 --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1M of event 'cycles:Pu'
# Event count (approx.): 883903400858
#
# Overhead  Samples  Command          Shared Object           Symbol
# ........  .......  ...............  ......................  ................................
#
    24.34%   260907  milc_base.mine-  milc_base.mine-lto-gen  [.] add_force_to_mom
    18.01%   198287  milc_base.mine-  milc_base.mine-lto-gen  [.] mult_su3_na
    17.45%   187529  milc_base.mine-  milc_base.mine-lto-gen  [.] u_shift_fermion
    14.22%   155596  milc_base.mine-  milc_base.mine-lto-gen  [.] mult_su3_nn
     5.61%    60601  milc_base.mine-  milc_base.mine-lto-gen  [.] scalar_mult_add_su3_matrix
     4.35%    51034  milc_base.mine-  milc_base.mine-lto-gen  [.] path_product
     4.24%    46032  milc_base.mine-  milc_base.mine-lto-gen  [.] mult_su3_an
     2.99%    32624  milc_base.mine-  milc_base.mine-lto-gen  [.] imp_gauge_force.constprop.0
     1.50%    16242  milc_base.mine-  milc_base.mine-lto-gen  [.] compute_gen_staple
     1.35%    14580  milc_base.mine-  milc_base.mine-lto-gen  [.] mult_su3_mat_vec_sum_4dir
     1.21%    12922  milc_base.mine-  milc_base.mine-lto-gen  [.] make_anti_hermitian
     1.06%    11469  milc_base.mine-  milc_base.mine-lto-gen  [.] mult_adj_su3_mat_4vec
     1.03%        1  milc_base.mine-  libc.so.6               [.] __memset_avx2_unaligned_erms


$ perf annotate -n --percent-limit=1 > ~/tmp/milc-perf-annotate-8aa47713701 
(gzipped and attached)
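
As a rough cross-check of where the extra time goes, the sketch below estimates the cycles attributed to add_force_to_mom in each build and compares the difference with the overall increase in the sampled event count. It uses only the overhead percentages and event counts quoted in the two perf reports in this thread; the names REPORTS and symbol_cycles are illustrative, and because the percentages are rounded the result is approximate.

```python
# Figures copied from the two `perf report` outputs in this thread.
# REPORTS and symbol_cycles are illustrative names, not part of perf.
REPORTS = {
    "r14-4971": {"event_count": 695776598661, "add_force_to_mom_pct": 4.22},
    "r14-4972": {"event_count": 883903400858, "add_force_to_mom_pct": 24.34},
}


def symbol_cycles(report):
    """Approximate cycles attributed to add_force_to_mom in one report."""
    return report["event_count"] * report["add_force_to_mom_pct"] / 100.0


before = symbol_cycles(REPORTS["r14-4971"])
after = symbol_cycles(REPORTS["r14-4972"])

# Growth of the whole profile between the two builds.
total_delta = (REPORTS["r14-4972"]["event_count"]
               - REPORTS["r14-4971"]["event_count"])

# Fraction of that growth that lands in add_force_to_mom.
share = (after - before) / total_delta

print(f"add_force_to_mom cycles before: {before:.3e}")
print(f"add_force_to_mom cycles after:  {after:.3e}")
print(f"share of total increase:        {share:.1%}")
```

With these figures, nearly all of the additional cycles land in add_force_to_mom, which is consistent with comments #8 and #9 concentrating on a single loop in that function.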

[Bug middle-end/112697] [14 Regression] 30-40% exec time regression of 433.milc on zen2 since r14-4972-g8aa47713701b1f

2023-11-29 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697

--- Comment #6 from Martin Jambor  ---
Created attachment 56719
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56719&action=edit
Perf annotate of milc built with r14-4971-g0beb1611754742

commit r14-4971-g0beb1611754742:

$ perf stat taskset -c 0 specinvoke

 Performance counter stats for 'taskset -c 0 specinvoke':

         216908.59 msec task-clock:u              #    1.000 CPUs utilized
                 0      context-switches:u        #    0.000 /sec
                 0      cpu-migrations:u          #    0.000 /sec
            889694      page-faults:u             #    4.102 K/sec
      697007650237      cycles:u                  #    3.213 GHz                      (83.33%)
       31999772966      stalled-cycles-frontend:u #    4.59% frontend cycles idle     (83.33%)
      540485725923      stalled-cycles-backend:u  #   77.54% backend cycles idle      (83.33%)
     1061256162815      instructions:u            #    1.52  insn per cycle
                                                  #    0.51  stalled cycles per insn  (83.33%)
       58760648879      branches:u                #  270.901 M/sec                    (83.34%)
          11890202      branch-misses:u           #    0.02% of all branches          (83.33%)

 216.935387643 seconds time elapsed

 211.436079000 seconds user
   5.472459000 seconds sys

$ perf record taskset -c 0 specinvoke
[ perf record: Woken up 132 times to write data ]
[ perf record: Captured and wrote 32.901 MB perf.data (862286 samples) ]


$ perf report -n --percent-limit=1 --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 862K of event 'cycles:Pu'
# Event count (approx.): 695776598661
#
# Overhead  Samples  Command          Shared Object           Symbol
# ........  .......  ...............  ......................  ................................
#
    22.68%   197003  milc_base.mine-  milc_base.mine-lto-gen  [.] mult_su3_na
    20.99%   177912  milc_base.mine-  milc_base.mine-lto-gen  [.] u_shift_fermion
    19.04%   163787  milc_base.mine-  milc_base.mine-lto-gen  [.] mult_su3_nn
     6.85%    58509  milc_base.mine-  milc_base.mine-lto-gen  [.] scalar_mult_add_su3_matrix
     5.51%    50953  milc_base.mine-  milc_base.mine-lto-gen  [.] path_product
     5.40%    46083  milc_base.mine-  milc_base.mine-lto-gen  [.] mult_su3_an
     4.22%    35853  milc_base.mine-  milc_base.mine-lto-gen  [.] add_force_to_mom
     3.77%    32446  milc_base.mine-  milc_base.mine-lto-gen  [.] imp_gauge_force.constprop.0
     1.98%    16848  milc_base.mine-  milc_base.mine-lto-gen  [.] compute_gen_staple
     1.94%    16462  milc_base.mine-  milc_base.mine-lto-gen  [.] make_anti_hermitian
     1.73%    14655  milc_base.mine-  milc_base.mine-lto-gen  [.] mult_su3_mat_vec_sum_4dir
     1.35%    11472  milc_base.mine-  milc_base.mine-lto-gen  [.] mult_adj_su3_mat_4vec
     1.27%    10801  milc_base.mine-  libc.so.6               [.] __memset_avx2_unaligned_erms


$ perf annotate -n --percent-limit=1 > ~/tmp/milc-perf-annotate-0beb1611754 
(gzipped and attached)
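
To put the two perf stat runs side by side, the sketch below recomputes the slowdown and instructions-per-cycle from the raw counters quoted in comments #6 and #7. The dictionary layout and helper names are assumptions made for illustration; only the numbers come from the thread, and they describe these two single runs only.

```python
# Raw counters copied from the `perf stat` outputs in comments #6 and #7.
# RUNS, ipc and slowdown_pct are illustrative names.
RUNS = {
    "r14-4971": {  # last good revision
        "elapsed_s": 216.935387643,
        "cycles": 697007650237,
        "instructions": 1061256162815,
    },
    "r14-4972": {  # first bad revision
        "elapsed_s": 272.967381843,
        "cycles": 886165387570,
        "instructions": 1061779557370,
    },
}


def ipc(run):
    """Instructions retired per cycle for one run."""
    return run["instructions"] / run["cycles"]


def slowdown_pct(key):
    """Percentage increase of a counter from r14-4971 to r14-4972."""
    good, bad = RUNS["r14-4971"], RUNS["r14-4972"]
    return (bad[key] / good[key] - 1.0) * 100.0


for name, run in RUNS.items():
    print(f"{name}: IPC = {ipc(run):.2f}")

for key in ("elapsed_s", "cycles", "instructions"):
    print(f"{key}: {slowdown_pct(key):+.2f}%")
```

In these two runs the instruction count is essentially unchanged while cycles and elapsed time both grow by roughly a quarter, so the slowdown shows up as lower IPC rather than as extra executed work.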

[Bug middle-end/112697] [14 Regression] 30-40% exec time regression of 433.milc on zen2 since r14-4972-g8aa47713701b1f

2023-11-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697

Alexander Monakov  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #5 from Alexander Monakov  ---
Martin, if you still have the binaries, would you mind sharing perf profiles?
You can produce plain-text reports with 'perf report --stdio' and 'perf
annotate --stdio'.

[Bug middle-end/112697] [14 Regression] 30-40% exec time regression of 433.milc on zen2 since r14-4972-g8aa47713701b1f

2023-11-24 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697

Sam James  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[14 Regression] 30-40% exec |[14 Regression] 30-40% exec
                   |time regression of 433.milc |time regression of 433.milc
                   |on zen2                     |on zen2 since
                   |                            |r14-4972-g8aa47713701b1f

--- Comment #4 from Sam James  ---
I can probably find a znver2 machine for someone to work on if it's needed, but
that's obviously not going to be the hardest part here...