[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413

--- Comment #19 from Rama Malladi ---

(In reply to Wilco from comment #17)
> (In reply to Rama Malladi from comment #16)
> > (In reply to Wilco from comment #15)
> > > (In reply to Rama Malladi from comment #14)
> > > > This fix also improved performance of 538.imagick_r by 15%. Did you
> > > > have a similar observation? Thank you.
> > >
> > > No, but I was using -mcpu=neoverse-n1 as my baseline. It's possible
> > > -mcpu=neoverse-v1 shows larger speedups; what gain do you get on the
> > > overall FP score?
> >
> > I was using -mcpu=native and ran on a Neoverse V1 arch (Graviton3). Here
> > are the scores I got (relative gains of the latest mainline vs. an
> > earlier mainline).
> >
> > Latest mainline: 0976b012d89e3d819d83cdaf0dab05925b3eb3a0
> > Earlier mainline: f896c13489d22b30d01257bc8316ab97b3359d1c
>
> Right, that's about 3 weeks of changes. I think
> 1b9a5cc9ec08e9f239dd2096edcc447b7a72f64a has improved imagick_r.
>
> > geomean 1.03
>
> That's a nice gain in 3 weeks!

Hi Wilco, could you backport the change to the active release branches? Thanks.
--- Comment #18 from Rama Malladi ---

(In reply to Wilco from comment #17)
> (In reply to Rama Malladi from comment #16)
> > (In reply to Wilco from comment #15)
> > > (In reply to Rama Malladi from comment #14)
> > > > This fix also improved performance of 538.imagick_r by 15%. Did you
> > > > have a similar observation? Thank you.
> > >
> > > No, but I was using -mcpu=neoverse-n1 as my baseline. It's possible
> > > -mcpu=neoverse-v1 shows larger speedups; what gain do you get on the
> > > overall FP score?
> >
> > I was using -mcpu=native and ran on a Neoverse V1 arch (Graviton3). Here
> > are the scores I got (relative gains of the latest mainline vs. an
> > earlier mainline).
> >
> > Latest mainline: 0976b012d89e3d819d83cdaf0dab05925b3eb3a0
> > Earlier mainline: f896c13489d22b30d01257bc8316ab97b3359d1c
>
> Right, that's about 3 weeks of changes. I think
> 1b9a5cc9ec08e9f239dd2096edcc447b7a72f64a has improved imagick_r.
>
> > geomean 1.03
>
> That's a nice gain in 3 weeks!

Yes, indeed :-) ... Thank you.
Wilco changed:

           What    |Removed     |Added
----------------------------------------------
         Resolution|---         |FIXED
             Status|ASSIGNED    |RESOLVED

--- Comment #17 from Wilco ---

(In reply to Rama Malladi from comment #16)
> (In reply to Wilco from comment #15)
> > (In reply to Rama Malladi from comment #14)
> > > This fix also improved performance of 538.imagick_r by 15%. Did you
> > > have a similar observation? Thank you.
> >
> > No, but I was using -mcpu=neoverse-n1 as my baseline. It's possible
> > -mcpu=neoverse-v1 shows larger speedups; what gain do you get on the
> > overall FP score?
>
> I was using -mcpu=native and ran on a Neoverse V1 arch (Graviton3). Here
> are the scores I got (relative gains of the latest mainline vs. an earlier
> mainline).
>
> Latest mainline: 0976b012d89e3d819d83cdaf0dab05925b3eb3a0
> Earlier mainline: f896c13489d22b30d01257bc8316ab97b3359d1c

Right, that's about 3 weeks of changes. I think
1b9a5cc9ec08e9f239dd2096edcc447b7a72f64a has improved imagick_r.

> geomean 1.03

That's a nice gain in 3 weeks!
--- Comment #16 from Rama Malladi ---

(In reply to Wilco from comment #15)
> (In reply to Rama Malladi from comment #14)
> > This fix also improved performance of 538.imagick_r by 15%. Did you
> > have a similar observation? Thank you.
>
> No, but I was using -mcpu=neoverse-n1 as my baseline. It's possible
> -mcpu=neoverse-v1 shows larger speedups; what gain do you get on the
> overall FP score?

I was using -mcpu=native and ran on a Neoverse V1 arch (Graviton3). Here are
the scores I got (relative gains of the latest mainline vs. an earlier
mainline).

Latest mainline: 0976b012d89e3d819d83cdaf0dab05925b3eb3a0
Earlier mainline: f896c13489d22b30d01257bc8316ab97b3359d1c

fp 1-copy rate     Ratio
503.bwaves_r       0.98
507.cactuBSSN_r    1.00
508.namd_r         0.97
510.parest_r       NA
511.povray_r       NA
519.lbm_r          1.16
521.wrf_r          1.00
526.blender_r      0.99
527.cam4_r         NA
538.imagick_r      1.17
544.nab_r          1.01
549.fotonik3d_r    NA
554.roms_r         1.00
geomean            1.03
--- Comment #15 from Wilco ---

(In reply to Rama Malladi from comment #14)
> This fix also improved performance of 538.imagick_r by 15%. Did you have a
> similar observation? Thank you.

No, but I was using -mcpu=neoverse-n1 as my baseline. It's possible
-mcpu=neoverse-v1 shows larger speedups; what gain do you get on the overall
FP score?
--- Comment #14 from Rama Malladi ---

This fix also improved performance of 538.imagick_r by 15%. Did you have a
similar observation? Thank you.
--- Comment #13 from Rama Malladi ---

(In reply to CVS Commits from comment #12)
> The master branch has been updated by Wilco Dijkstra:
>
> https://gcc.gnu.org/g:0c1b0a23f1fe7db6a2e391b7cb78cff90032
>
> commit r13-4291-g0c1b0a23f1fe7db6a2e391b7cb78cff90032
> Author: Wilco Dijkstra
> Date:   Wed Nov 23 17:27:19 2022 +
>
>     AArch64: Add fma_reassoc_width [PR107413]
>
>     Add a reassociation width for FMA in per-CPU tuning structures. Keep
>     the existing setting of 1 for cores with 2 FMA pipes (this disables
>     reassociation), and use 4 for cores with 4 FMA pipes. This improves
>     SPECFP2017 on Neoverse V1 by ~1.5%.
>
>     gcc/
>         PR tree-optimization/107413
>         * config/aarch64/aarch64.cc (struct tune_params): Add
>         fma_reassoc_width to all CPU tuning structures.
>         (aarch64_reassociation_width): Use fma_reassoc_width.
>         * config/aarch64/aarch64-protos.h (struct tune_params): Add
>         fma_reassoc_width.

Thank you for this code change/fix. I will attempt a run with this change.
--- Comment #12 from CVS Commits ---

The master branch has been updated by Wilco Dijkstra:

https://gcc.gnu.org/g:0c1b0a23f1fe7db6a2e391b7cb78cff90032

commit r13-4291-g0c1b0a23f1fe7db6a2e391b7cb78cff90032
Author: Wilco Dijkstra
Date:   Wed Nov 23 17:27:19 2022 +

    AArch64: Add fma_reassoc_width [PR107413]

    Add a reassociation width for FMA in per-CPU tuning structures. Keep
    the existing setting of 1 for cores with 2 FMA pipes (this disables
    reassociation), and use 4 for cores with 4 FMA pipes. This improves
    SPECFP2017 on Neoverse V1 by ~1.5%.

    gcc/
        PR tree-optimization/107413
        * config/aarch64/aarch64.cc (struct tune_params): Add
        fma_reassoc_width to all CPU tuning structures.
        (aarch64_reassociation_width): Use fma_reassoc_width.
        * config/aarch64/aarch64-protos.h (struct tune_params): Add
        fma_reassoc_width.
--- Comment #11 from Rama Malladi ---

(In reply to Wilco from comment #10)
> I'm seeing about 1.5% gain on Neoverse V1 and 0.5% loss on Neoverse N1.
> I'll post a patch that allows per-CPU settings for FMA reassociation, so
> you'll get good performance with -mcpu=native. However, reassociation
> really needs to be taught about the existence of FMAs.

Thank you very much, Wilco.
Wilco changed:

           What    |Removed                      |Added
--------------------------------------------------------------------------
     Ever confirmed|0                            |1
   Last reconfirmed|                             |2022-11-04
             Status|UNCONFIRMED                  |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org|wilco at gcc dot gnu.org

--- Comment #10 from Wilco ---

(In reply to Rama Malladi from comment #9)
> (In reply to Rama Malladi from comment #8)
> > (In reply to Wilco from comment #7)
> > > The revert results in about 0.5% loss on Neoverse N1, so it looks like
> > > the reassociation pass is still splitting FMAs into separate MUL and
> > > ADD (which is bad for narrow cores).
> >
> > Thank you for checking on N1. Did you happen to check on V1 too, to
> > reproduce the perf results I had? Any other experiments/tests I can do
> > to help on this filing? Thanks again for the debug/fix.
>
> I ran the SPEC cpu2017 fprate 1-copy benchmark built with the patch
> reverted, using option 'neoverse-n1', on the Graviton3 processor (which
> has support for SVE). Performance was up by 0.4%, the primary contributor
> being 519.lbm_r, which was up 13%.

I'm seeing about 1.5% gain on Neoverse V1 and 0.5% loss on Neoverse N1. I'll
post a patch that allows per-CPU settings for FMA reassociation, so you'll
get good performance with -mcpu=native. However, reassociation really needs
to be taught about the existence of FMAs.
--- Comment #9 from Rama Malladi ---

(In reply to Rama Malladi from comment #8)
> (In reply to Wilco from comment #7)
> > The revert results in about 0.5% loss on Neoverse N1, so it looks like
> > the reassociation pass is still splitting FMAs into separate MUL and ADD
> > (which is bad for narrow cores).
>
> Thank you for checking on N1. Did you happen to check on V1 too, to
> reproduce the perf results I had? Any other experiments/tests I can do to
> help on this filing? Thanks again for the debug/fix.

I ran the SPEC cpu2017 fprate 1-copy benchmark built with the patch reverted,
using option 'neoverse-n1', on the Graviton3 processor (which has support for
SVE). Performance was up by 0.4%, the primary contributor being 519.lbm_r,
which was up 13%.
--- Comment #8 from Rama Malladi ---

(In reply to Wilco from comment #7)
> The revert results in about 0.5% loss on Neoverse N1, so it looks like the
> reassociation pass is still splitting FMAs into separate MUL and ADD
> (which is bad for narrow cores).

Thank you for checking on N1. Did you happen to check on V1 too, to reproduce
the perf results I had? Any other experiments/tests I can do to help on this
filing? Thanks again for the debug/fix.
--- Comment #7 from Wilco ---

(In reply to Rama Malladi from comment #5)
> So, looks like we aren't impacted much with this commit revert.
>
> I haven't yet tried fp_reassoc_width. Will try shortly.

The revert results in about 0.5% loss on Neoverse N1, so it looks like the
reassociation pass is still splitting FMAs into separate MUL and ADD (which
is bad for narrow cores).