[Bug middle-end/112697] [14 Regression] 30-40% exec time regression of 433.milc on zen2 since r14-4972-g8aa47713701b1f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697

Jeffrey A. Law changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #11 from Jeffrey A. Law ---
Per c#10.
[Bug middle-end/112697] [14 Regression] 30-40% exec time regression of 433.milc on zen2 since r14-4972-g8aa47713701b1f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697

Filip Kastl changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|needs-bisection             |

--- Comment #10 from Filip Kastl ---
I see that the benchmark's exec time has returned to its original value. If
there are no objections, I'll mark this bug as fixed.
[Bug middle-end/112697] [14 Regression] 30-40% exec time regression of 433.milc on zen2 since r14-4972-g8aa47713701b1f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697

Jeffrey A. Law changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
                 CC|                            |law at gcc dot gnu.org
[Bug middle-end/112697] [14 Regression] 30-40% exec time regression of 433.milc on zen2 since r14-4972-g8aa47713701b1f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697

--- Comment #9 from Alexander Monakov ---
... as does inserting a nop before the compare ¯\_(ツ)_/¯

--- d.out.ltrans0.ltrans.slow.s 2023-12-01 18:32:54.255841611 +0300
+++ d.out.ltrans0.ltrans.s      2023-12-01 18:53:04.909438690 +0300
@@ -743,6 +743,7 @@ add_force_to_mom:
        .p2align 4,,10
        .p2align 3
 .L58:
+       nop
        cmpb    $1, -680(%r11,%r12)
        movapd  %xmm5, %xmm7
        jne     .L54
[Bug middle-end/112697] [14 Regression] 30-40% exec time regression of 433.milc on zen2 since r14-4972-g8aa47713701b1f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697

--- Comment #8 from Alexander Monakov ---
Thanks, I can reproduce it. It is pretty tricky though. For instance, just
swapping the mov and the compare is enough to make it fast:

--- d.out.ltrans0.ltrans.slow.s 2023-12-01 18:32:54.255841611 +0300
+++ d.out.ltrans0.ltrans.fast.s 2023-12-01 18:32:20.318668991 +0300
@@ -743,8 +743,8 @@ add_force_to_mom:
        .p2align 4,,10
        .p2align 3
 .L58:
-       cmpb    $1, -680(%r11,%r12)
        movapd  %xmm5, %xmm7
+       cmpb    $1, -680(%r11,%r12)
        jne     .L54
        xorpd   %xmm6, %xmm7
 .L54:
[Bug middle-end/112697] [14 Regression] 30-40% exec time regression of 433.milc on zen2 since r14-4972-g8aa47713701b1f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697

--- Comment #7 from Martin Jambor ---
Created attachment 56720
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56720=edit
Perf annotate of milc built with r14-4972-g8aa47713701b1f

commit r14-4972-g8aa47713701b1f:

$ perf stat taskset -c 0 specinvoke

 Performance counter stats for 'taskset -c 0 specinvoke':

         272931.43 msec task-clock:u              #    1.000 CPUs utilized
                 0      context-switches:u        #    0.000 /sec
                 0      cpu-migrations:u          #    0.000 /sec
            472353      page-faults:u             #    1.731 K/sec
      886165387570      cycles:u                  #    3.247 GHz                      (83.33%)
       31546898034      stalled-cycles-frontend:u #    3.56% frontend cycles idle     (83.33%)
      729878095777      stalled-cycles-backend:u  #   82.36% backend cycles idle      (83.33%)
     1061779557370      instructions:u            #    1.20  insn per cycle
                                                  #    0.69  stalled cycles per insn  (83.33%)
       58797121078      branches:u                #  215.428 M/sec                    (83.33%)
           6960852      branch-misses:u           #    0.01% of all branches          (83.33%)

     272.967381843 seconds time elapsed

     268.718335000 seconds user
       4.212584000 seconds sys

$ perf record taskset -c 0 specinvoke
[ perf record: Woken up 167 times to write data ]
[ perf record: Captured and wrote 41.549 MB perf.data (1088982 samples) ]

$ perf report -n --percent-limit=1 --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1M of event 'cycles:Pu'
# Event count (approx.): 883903400858
#
# Overhead       Samples  Command          Shared Object           Symbol
# ........  ............  ...............  ......................  ...............................
#
    24.34%        260907  milc_base.mine-  milc_base.mine-lto-gen  [.] add_force_to_mom
    18.01%        198287  milc_base.mine-  milc_base.mine-lto-gen  [.] mult_su3_na
    17.45%        187529  milc_base.mine-  milc_base.mine-lto-gen  [.] u_shift_fermion
    14.22%        155596  milc_base.mine-  milc_base.mine-lto-gen  [.] mult_su3_nn
     5.61%         60601  milc_base.mine-  milc_base.mine-lto-gen  [.] scalar_mult_add_su3_matrix
     4.35%         51034  milc_base.mine-  milc_base.mine-lto-gen  [.] path_product
     4.24%         46032  milc_base.mine-  milc_base.mine-lto-gen  [.] mult_su3_an
     2.99%         32624  milc_base.mine-  milc_base.mine-lto-gen  [.] imp_gauge_force.constprop.0
     1.50%         16242  milc_base.mine-  milc_base.mine-lto-gen  [.] compute_gen_staple
     1.35%         14580  milc_base.mine-  milc_base.mine-lto-gen  [.] mult_su3_mat_vec_sum_4dir
     1.21%         12922  milc_base.mine-  milc_base.mine-lto-gen  [.] make_anti_hermitian
     1.06%         11469  milc_base.mine-  milc_base.mine-lto-gen  [.] mult_adj_su3_mat_4vec
     1.03%             1  milc_base.mine-  libc.so.6               [.] __memset_avx2_unaligned_erms

$ perf annotate -n --percent-limit=1 > ~/tmp/milc-perf-annotate-8aa47713701

(gzipped and attached)
[Bug middle-end/112697] [14 Regression] 30-40% exec time regression of 433.milc on zen2 since r14-4972-g8aa47713701b1f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697

--- Comment #6 from Martin Jambor ---
Created attachment 56719
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56719=edit
Perf annotate of milc built with r14-4971-g0beb1611754742

commit r14-4971-g0beb1611754742:

$ perf stat taskset -c 0 specinvoke

 Performance counter stats for 'taskset -c 0 specinvoke':

         216908.59 msec task-clock:u              #    1.000 CPUs utilized
                 0      context-switches:u        #    0.000 /sec
                 0      cpu-migrations:u          #    0.000 /sec
            889694      page-faults:u             #    4.102 K/sec
      697007650237      cycles:u                  #    3.213 GHz                      (83.33%)
       31999772966      stalled-cycles-frontend:u #    4.59% frontend cycles idle     (83.33%)
      540485725923      stalled-cycles-backend:u  #   77.54% backend cycles idle      (83.33%)
     1061256162815      instructions:u            #    1.52  insn per cycle
                                                  #    0.51  stalled cycles per insn  (83.33%)
       58760648879      branches:u                #  270.901 M/sec                    (83.34%)
          11890202      branch-misses:u           #    0.02% of all branches          (83.33%)

     216.935387643 seconds time elapsed

     211.436079000 seconds user
       5.472459000 seconds sys

$ perf record taskset -c 0 specinvoke
[ perf record: Woken up 132 times to write data ]
[ perf record: Captured and wrote 32.901 MB perf.data (862286 samples) ]

$ perf report -n --percent-limit=1 --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 862K of event 'cycles:Pu'
# Event count (approx.): 695776598661
#
# Overhead       Samples  Command          Shared Object           Symbol
# ........  ............  ...............  ......................  ...............................
#
    22.68%        197003  milc_base.mine-  milc_base.mine-lto-gen  [.] mult_su3_na
    20.99%        177912  milc_base.mine-  milc_base.mine-lto-gen  [.] u_shift_fermion
    19.04%        163787  milc_base.mine-  milc_base.mine-lto-gen  [.] mult_su3_nn
     6.85%         58509  milc_base.mine-  milc_base.mine-lto-gen  [.] scalar_mult_add_su3_matrix
     5.51%         50953  milc_base.mine-  milc_base.mine-lto-gen  [.] path_product
     5.40%         46083  milc_base.mine-  milc_base.mine-lto-gen  [.] mult_su3_an
     4.22%         35853  milc_base.mine-  milc_base.mine-lto-gen  [.] add_force_to_mom
     3.77%         32446  milc_base.mine-  milc_base.mine-lto-gen  [.] imp_gauge_force.constprop.0
     1.98%         16848  milc_base.mine-  milc_base.mine-lto-gen  [.] compute_gen_staple
     1.94%         16462  milc_base.mine-  milc_base.mine-lto-gen  [.] make_anti_hermitian
     1.73%         14655  milc_base.mine-  milc_base.mine-lto-gen  [.] mult_su3_mat_vec_sum_4dir
     1.35%         11472  milc_base.mine-  milc_base.mine-lto-gen  [.] mult_adj_su3_mat_4vec
     1.27%         10801  milc_base.mine-  libc.so.6               [.] __memset_avx2_unaligned_erms

$ perf annotate -n --percent-limit=1 > ~/tmp/milc-perf-annotate-0beb1611754

(gzipped and attached)
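
For readers comparing the two profiles in comments #6 and #7, the following is a minimal sketch that recomputes a few derived figures from the counters quoted there. The numbers are copied from the thread; the dictionary layout and the helper names (`RUNS`, `ipc`, `ratio`) are illustrative assumptions, not part of perf or of the reporters' tooling.

```python
# Counter values copied from the two `perf stat` / `perf report` outputs quoted
# in comments #6 (r14-4971) and #7 (r14-4972). Names below are illustrative only.
RUNS = {
    "r14-4971-g0beb1611754742": {
        "cycles": 697_007_650_237,
        "instructions": 1_061_256_162_815,
        "elapsed_s": 216.935387643,
        "add_force_to_mom_samples": 35_853,
    },
    "r14-4972-g8aa47713701b1f": {
        "cycles": 886_165_387_570,
        "instructions": 1_061_779_557_370,
        "elapsed_s": 272.967381843,
        "add_force_to_mom_samples": 260_907,
    },
}


def ipc(run):
    """Instructions retired per cycle for one run."""
    return run["instructions"] / run["cycles"]


def ratio(after, before, key):
    """How many times larger `key` is in the later run than in the earlier one."""
    return after[key] / before[key]


before = RUNS["r14-4971-g0beb1611754742"]
after = RUNS["r14-4972-g8aa47713701b1f"]

for name, run in RUNS.items():
    print(f"{name}: IPC = {ipc(run):.2f}")

for key in ("elapsed_s", "cycles", "instructions", "add_force_to_mom_samples"):
    print(f"{key}: after/before = {ratio(after, before, key):.3f}")
```

Read this way, the instruction count is essentially unchanged between the two builds while cycles and elapsed time grow by roughly a quarter in these particular runs, and nearly all of the extra samples land in add_force_to_mom. That would be consistent with the later observation that small scheduling changes in that function's loop alter the timing, though the sketch itself only restates the quoted counters.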
[Bug middle-end/112697] [14 Regression] 30-40% exec time regression of 433.milc on zen2 since r14-4972-g8aa47713701b1f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697

Alexander Monakov changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #5 from Alexander Monakov ---
Martin, if you still have the binaries, would you mind sharing perf profiles?
You can produce plain-text reports with 'perf report --stdio' and
'perf annotate --stdio'.
[Bug middle-end/112697] [14 Regression] 30-40% exec time regression of 433.milc on zen2 since r14-4972-g8aa47713701b1f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697

Sam James changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[14 Regression] 30-40% exec |[14 Regression] 30-40% exec
                   |time regression of 433.milc |time regression of 433.milc
                   |on zen2                     |on zen2 since
                   |                            |r14-4972-g8aa47713701b1f

--- Comment #4 from Sam James ---
I can probably find a znver2 machine for someone to work on if it's needed, but
that's obviously not going to be the hardest part here...