Re: [TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.
On 10/11/21 13:05, Maxim Kuvyrkov wrote:
>> On 8 Oct 2021, at 13:22, Martin Jambor wrote:
>>
>> Hi,
>>
>> On Fri, Oct 01 2021, Gerald Pfeifer wrote:
>>> On Wed, 29 Sep 2021, Maxim Kuvyrkov via Gcc wrote:
>>>> Configurations that track master branches have 3-day intervals.
>>>> Configurations that track release branches — 6 days. If a regression is
>>>> detected it is narrowed down to component first — binutils, gcc or glibc
>>>> — and then the commit range of the component is bisected down to a
>>>> specific commit. All. Done. Automatically.
>>>>
>>>> I will make a presentation on this CI at the next GNU Tools Cauldron.
>>>
>>> Yes, please! :-)
>>>
>>> On Fri, 1 Oct 2021, Maxim Kuvyrkov via Gcc wrote:
>>>> It’s our next big improvement — to provide a dashboard with current
>>>> performance numbers and historical stats.
>>>
>>> Awesome. And then we can even link from gcc.gnu.org.
>>>
>>
>> You all are aware of the openSUSE LNT periodic SPEC benchmarker, right?
>> Martin may explain better how to move around it, but the two most
>> interesting result pages are:
>>
>> - https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report and
>> - https://lnt.opensuse.org/db_default/v4/SPEC/spec_report/branch
>>
>
> Hi Martin,
>
> The novel part of TCWG CI is that it bisects “regressions” down to a single commit, thus pin-pointing the interesting commit, and can send out notifications to patch authors.

Hello Maxim.

> We do generate a fair amount of benchmarking data for AArch64 and AArch32, and I want to have them plotted somewhere. I have started to put together an LNT instance to do that, but after a couple of days I couldn't figure out the setup. Could you share the configuration of your LNT instance? Or, perhaps, make it open to the community so that others can upload the results?

Sure, I would be more than happy to share our LNT configuration. Note that we don't use the vanilla version, because it does not support git revisions (so we use $timestamp.$hash instead), and our modified LNT GUI can interpret that.
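The `$timestamp.$hash` scheme presumably exists because a bare git hash carries no ordering, while a numeric prefix does. A minimal Python sketch of ordering such revisions, assuming the part before the first dot is an integer timestamp (the thread does not specify its format); `Revision`, `parse_revision` and `sort_revisions` are hypothetical helpers, not part of LNT or of the openSUSE modifications:

```python
from typing import NamedTuple


class Revision(NamedTuple):
    """Parsed form of a '$timestamp.$hash' revision string (hypothetical helper)."""

    timestamp: int
    githash: str


def parse_revision(rev: str) -> Revision:
    # Split on the first dot only: the part before it is assumed to be an
    # integer timestamp, the part after it the (possibly abbreviated) git hash.
    ts, sep, githash = rev.partition(".")
    if not sep or not ts.isdigit() or not githash:
        raise ValueError(f"not a '$timestamp.$hash' revision: {rev!r}")
    return Revision(int(ts), githash)


def sort_revisions(revs: list[str]) -> list[str]:
    # Compare timestamps numerically (a plain string sort would put
    # "999..." after "1700..."); the hash only breaks ties.
    return sorted(revs, key=parse_revision)


if __name__ == "__main__":
    # Made-up revision strings, for illustration only.
    print(sort_revisions(["1700000000.bbb222", "999999999.aaa111"]))
    # ['999999999.aaa111', '1700000000.bbb222']
```

Parsing the timestamp as an integer, rather than sorting the raw strings, keeps the order correct even when timestamps have different numbers of digits.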
As Martin mentioned, I upstreamed the useful latest_runs_report page:
https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report

And these pages:
https://lnt.opensuse.org/db_default/v4/SPEC/spec_report/branch
https://lnt.opensuse.org/db_default/v4/SPEC/spec_report/options
https://lnt.opensuse.org/db_default/v4/SPEC/spec_report/tuning

rely on a special naming scheme for machines, e.g. benzen.spec2006.gcc-10.Ofast_generic, and are generated by a custom modification of LNT. I can share it with you as well.

@Maxim: Please write me a private email and I can share all the details you need.

As for making our LNT instance public, we are probably not willing to do that right now.

Cheers,
Martin

> Thanks,
>
> --
> Maxim Kuvyrkov
> https://www.linaro.org

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-toolchain
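The custom report pages group runs by fields encoded in the machine name. A minimal sketch of such parsing, assuming the layout is `host.suite.compiler.<options>_<tuning>` with no dots inside any field; this reading of `benzen.spec2006.gcc-10.Ofast_generic` is a guess, not the documented openSUSE scheme, and `MachineName`, `parse_machine_name` and `group_by` are hypothetical:

```python
from typing import NamedTuple


class MachineName(NamedTuple):
    host: str      # e.g. "benzen"
    suite: str     # e.g. "spec2006"
    compiler: str  # e.g. "gcc-10" (assumed to name the compiler branch)
    options: str   # e.g. "Ofast" (assumed optimization level)
    tuning: str    # e.g. "generic" (assumed tuning flavour)


def parse_machine_name(name: str) -> MachineName:
    # Assumed layout: host.suite.compiler.<options>_<tuning>.
    parts = name.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dot-separated fields: {name!r}")
    host, suite, compiler, config = parts
    options, sep, tuning = config.partition("_")
    if not sep:
        raise ValueError(f"expected <options>_<tuning>: {config!r}")
    return MachineName(host, suite, compiler, options, tuning)


def group_by(machines: list[str], field: str) -> dict[str, list[str]]:
    # Group machine names by one parsed field, e.g. "compiler" or "options".
    groups: dict[str, list[str]] = {}
    for machine in machines:
        key = getattr(parse_machine_name(machine), field)
        groups.setdefault(key, []).append(machine)
    return groups


if __name__ == "__main__":
    print(parse_machine_name("benzen.spec2006.gcc-10.Ofast_generic"))
    # MachineName(host='benzen', suite='spec2006', compiler='gcc-10', options='Ofast', tuning='generic')
```

If the guess about the layout holds, grouping by `compiler` would roughly match a per-branch view, and grouping by `options` or `tuning` the other two report pages.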
Re: [TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.
> On 8 Oct 2021, at 13:22, Martin Jambor wrote:
>
> Hi,
>
> On Fri, Oct 01 2021, Gerald Pfeifer wrote:
>> On Wed, 29 Sep 2021, Maxim Kuvyrkov via Gcc wrote:
>>> Configurations that track master branches have 3-day intervals.
>>> Configurations that track release branches — 6 days. If a regression is
>>> detected it is narrowed down to component first — binutils, gcc or glibc
>>> — and then the commit range of the component is bisected down to a
>>> specific commit. All. Done. Automatically.
>>>
>>> I will make a presentation on this CI at the next GNU Tools Cauldron.
>>
>> Yes, please! :-)
>>
>> On Fri, 1 Oct 2021, Maxim Kuvyrkov via Gcc wrote:
>>> It’s our next big improvement — to provide a dashboard with current
>>> performance numbers and historical stats.
>>
>> Awesome. And then we can even link from gcc.gnu.org.
>>
>
> You all are aware of the openSUSE LNT periodic SPEC benchmarker, right?
> Martin may explain better how to move around it, but the two most
> interesting result pages are:
>
> - https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report and
> - https://lnt.opensuse.org/db_default/v4/SPEC/spec_report/branch
>

Hi Martin,

The novel part of TCWG CI is that it bisects “regressions” down to a single commit, thus pin-pointing the interesting commit, and can send out notifications to patch authors.

We do generate a fair amount of benchmarking data for AArch64 and AArch32, and I want to have them plotted somewhere. I have started to put together an LNT instance to do that, but after a couple of days I couldn't figure out the setup. Could you share the configuration of your LNT instance? Or, perhaps, make it open to the community so that others can upload the results?

Thanks,

--
Maxim Kuvyrkov
https://www.linaro.org
Re: [TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.
Hi,

On Fri, Oct 01 2021, Gerald Pfeifer wrote:
> On Wed, 29 Sep 2021, Maxim Kuvyrkov via Gcc wrote:
>> Configurations that track master branches have 3-day intervals.
>> Configurations that track release branches — 6 days. If a regression is
>> detected it is narrowed down to component first — binutils, gcc or glibc
>> — and then the commit range of the component is bisected down to a
>> specific commit. All. Done. Automatically.
>>
>> I will make a presentation on this CI at the next GNU Tools Cauldron.
>
> Yes, please! :-)
>
> On Fri, 1 Oct 2021, Maxim Kuvyrkov via Gcc wrote:
>> It’s our next big improvement — to provide a dashboard with current
>> performance numbers and historical stats.
>
> Awesome. And then we can even link from gcc.gnu.org.
>

You all are aware of the openSUSE LNT periodic SPEC benchmarker, right?
Martin may explain better how to move around it, but the two most
interesting result pages are:

- https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report and
- https://lnt.opensuse.org/db_default/v4/SPEC/spec_report/branch

Martin
Re: [TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.
On 9/27/2021 7:52 AM, Aldy Hernandez wrote:
> [CCing Jeff and list for broader audience]
>
> On 9/27/21 2:53 PM, Maxim Kuvyrkov wrote:
>> Hi Aldy,
>> Your patch seems to slow down 471.omnetpp by 8% at -O3. Could you please
>> take a look if this is something that could be easily fixed?
>
> First of all, thanks for chasing this down. It's incredibly useful to have
> these types of bug reports.
>
> Jeff and I have been discussing the repercussions of adjusting the loop
> crossing restrictions in the various threaders. He's seen some regressions
> in embedded targets where disallowing certain corner cases of loop-crossing
> threads causes all sorts of grief.
>
> Out of curiosity, does the attached (untested) patch fix the regression?

And just a note: that patch doesn't seem to fix the regressions on visium or rl78. I haven't checked any of the other regressing targets yet.

jeff
Re: [TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.
On Wed, 29 Sep 2021, Maxim Kuvyrkov via Gcc wrote:
> Configurations that track master branches have 3-day intervals.
> Configurations that track release branches — 6 days. If a regression is
> detected it is narrowed down to component first — binutils, gcc or glibc
> — and then the commit range of the component is bisected down to a
> specific commit. All. Done. Automatically.
>
> I will make a presentation on this CI at the next GNU Tools Cauldron.

Yes, please! :-)

On Fri, 1 Oct 2021, Maxim Kuvyrkov via Gcc wrote:
> It’s our next big improvement — to provide a dashboard with current
> performance numbers and historical stats.

Awesome. And then we can even link from gcc.gnu.org.

Gerald
Re: [TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.
> On 29 Sep 2021, at 21:21, Andrew MacLeod wrote:
>
> On 9/29/21 7:59 AM, Maxim Kuvyrkov wrote:
>>
>>> Does it run like once a day/some-time-period, and if you note a
>>> regression, narrow it down?
>> Configurations that track master branches have 3-day intervals.
>> Configurations that track release branches — 6 days. If a regression is
>> detected it is narrowed down to component first — binutils, gcc or glibc —
>> and then the commit range of the component is bisected down to a specific
>> commit. All. Done. Automatically.
>>
>> I will make a presentation on this CI at the next GNU Tools Cauldron.
>>
>>> Regardless, I think it could be very useful to be able to see the results
>>> of anything you do run at whatever frequency it happens.
>> Thanks!
>>
>> --
>
> One more follow-on question... is this information/summary of the results
> from every 3-day interval of master published anywhere, i.e., to a web page
> or posted somewhere? That seems like it could be useful, especially with a
> +/- differential from the previous run (which you obviously calculate to
> determine if there is a regression).

It’s our next big improvement — to provide a dashboard with current performance numbers and historical stats. Performance summary information is publicly available as artifacts in Jenkins jobs (e.g., [1]), but one needs to know exactly where to look. We plan to implement the dashboard before the end of the year.

We also have raw perf.data files and benchmark executables stashed for detailed inspection. I /think/ we can publish these for SPEC CPU2xxx benchmarks — they are all based on open-source software. For other benchmarks (EEMBC, CoreMark Pro) we can’t publish much beyond time/size metrics.

[1] https://ci.linaro.org/view/tcwg_bmk_ci_gnu/job/tcwg_bmk_ci_gnu-build-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O2/237/artifact/artifacts/11-check_regression/results.csv/*view*/

Regards,

--
Maxim Kuvyrkov
https://www.linaro.org
Re: [TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.
On 9/27/21 11:39 AM, Maxim Kuvyrkov via Gcc wrote:
>> On 27 Sep 2021, at 16:52, Aldy Hernandez wrote:
>>
>> [CCing Jeff and list for broader audience]
>>
>> On 9/27/21 2:53 PM, Maxim Kuvyrkov wrote:
>>> Hi Aldy,
>>> Your patch seems to slow down 471.omnetpp by 8% at -O3. Could you please
>>> take a look if this is something that could be easily fixed?
>> First of all, thanks for chasing this down. It's incredibly useful to have
>> these types of bug reports.
> Thanks, Aldy, this is music to my ears :-).
>
> We have built this automated benchmarking CI that bisects code-speed and
> code-size regressions down to a single commit. It is still
> work-in-progress, and I’m forwarding these reports to patch authors whose
> patches caused regressions. If the GCC community finds these useful, we can
> also set up posting to one of GCC’s mailing lists.

I second that this sort of thing is incredibly useful. I don't suppose it's easy to do the reverse?... let patch authors know when they've caused a significant improvement? :-) That would be much less common I suspect, so perhaps not worth it :-)

It's certainly very useful when we are making a wholesale change to a pass which we think is beneficial, but aren't sure.

And a follow-up question... Sometimes we have no good way of determining the widespread run-time effects of a change. You seem to be running SPEC/other things continuously then? Does it run like once a day/some-time-period, and if you note a regression, narrow it down? Regardless, I think it could be very useful to be able to see the results of anything you do run at whatever frequency it happens.
Re: [TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.
> On 27 Sep 2021, at 16:52, Aldy Hernandez wrote:
>
> [CCing Jeff and list for broader audience]
>
> On 9/27/21 2:53 PM, Maxim Kuvyrkov wrote:
>> Hi Aldy,
>> Your patch seems to slow down 471.omnetpp by 8% at -O3. Could you please
>> take a look if this is something that could be easily fixed?
>
> First of all, thanks for chasing this down. It's incredibly useful to have
> these types of bug reports.

Thanks, Aldy, this is music to my ears :-).

We have built this automated benchmarking CI that bisects code-speed and code-size regressions down to a single commit. It is still work-in-progress, and I’m forwarding these reports to patch authors whose patches caused regressions. If the GCC community finds these useful, we can also set up posting to one of GCC’s mailing lists.

>
> Jeff and I have been discussing the repercussions of adjusting the loop
> crossing restrictions in the various threaders. He's seen some regressions
> in embedded targets where disallowing certain corner cases of loop-crossing
> threads causes all sorts of grief.
>
> Out of curiosity, does the attached (untested) patch fix the regression?

I’ll test the patch and will follow up.

Regards,

--
Maxim Kuvyrkov
https://www.linaro.org
Re: [TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.
> On 27 Sep 2021, at 19:02, Andrew MacLeod wrote:
>
> On 9/27/21 11:39 AM, Maxim Kuvyrkov via Gcc wrote:
>>> On 27 Sep 2021, at 16:52, Aldy Hernandez wrote:
>>>
>>> [CCing Jeff and list for broader audience]
>>>
>>> On 9/27/21 2:53 PM, Maxim Kuvyrkov wrote:
>>>> Hi Aldy,
>>>> Your patch seems to slow down 471.omnetpp by 8% at -O3. Could you please
>>>> take a look if this is something that could be easily fixed?
>>> First of all, thanks for chasing this down. It's incredibly useful to have
>>> these types of bug reports.
>> Thanks, Aldy, this is music to my ears :-).
>>
>> We have built this automated benchmarking CI that bisects code-speed and
>> code-size regressions down to a single commit. It is still
>> work-in-progress, and I’m forwarding these reports to patch authors whose
>> patches caused regressions. If the GCC community finds these useful, we can
>> also set up posting to one of GCC’s mailing lists.
>
> I second that this sort of thing is incredibly useful. I don't suppose it's
> easy to do the reverse?... let patch authors know when they've caused a
> significant improvement? :-) That would be much less common I suspect, so
> perhaps not worth it :-)

We do this occasionally, when identifying a regression in a patch revert commit :-). Seriously, though, it’s an easy enough code change to the metric, but we are maxing out our benchmarking capacity with the current configuration matrix.

>
> It's certainly very useful when we are making a wholesale change to a pass
> which we think is beneficial, but aren't sure.
>
> And a follow-up question... Sometimes we have no good way of determining the
> widespread run-time effects of a change. You seem to be running SPEC/other
> things continuously then?

We continuously run SPEC CPU2006 on the {arm,aarch64}-{-Os/-O2/-O3}-{no LTO/LTO} matrix for GNU and LLVM toolchains.
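For a sense of scale, the matrix described above can be enumerated directly. This sketch only counts the stated axes (toolchain, target, optimization level, LTO); it does not model the separate release-branch tracking, whose configuration set the message does not spell out:

```python
from itertools import product

# Axes of the benchmarking matrix as stated in the message.
TOOLCHAINS = ("GNU", "LLVM")
TARGETS = ("arm", "aarch64")
OPT_LEVELS = ("-Os", "-O2", "-O3")
LTO_MODES = ("no LTO", "LTO")


def config_matrix() -> list[tuple[str, str, str, str]]:
    """Return every (toolchain, target, optimization level, LTO mode) combination."""
    return list(product(TOOLCHAINS, TARGETS, OPT_LEVELS, LTO_MODES))


if __name__ == "__main__":
    configs = config_matrix()
    for cfg in configs:
        print(" / ".join(cfg))
    # 2 toolchains x 2 targets x 3 optimization levels x 2 LTO modes
    print(len(configs), "configurations")  # 24 configurations
```

Each cell is presumably a full SPEC CPU2006 run, and every detected change triggers further runs during bisection, which is consistent with the remark about benchmarking capacity.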
In the GNU toolchain we track master branches and latest-release branches of Binutils, GCC and Glibc — and detect code-speed and code-size regressions across all toolchain components.

> Does it run like once a day/some-time-period, and if you note a regression,
> narrow it down?

Configurations that track master branches have 3-day intervals. Configurations that track release branches — 6 days. If a regression is detected it is narrowed down to component first — binutils, gcc or glibc — and then the commit range of the component is bisected down to a specific commit. All. Done. Automatically.

I will make a presentation on this CI at the next GNU Tools Cauldron.

> Regardless, I think it could be very useful to be able to see the results of
> anything you do run at whatever frequency it happens.

Thanks!

--
Maxim Kuvyrkov
https://www.linaro.org
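The two-stage narrowing described above (component first, then a commit range) can be sketched as follows. This is an illustrative model, not the TCWG CI implementation: it assumes a single component is responsible, that the regression persists once introduced, and that a callback can rebuild and benchmark any combination of component revisions. All names here are hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical model: a toolchain build maps each component to a commit id.
Build = dict[str, str]


def find_component(good: Build, bad: Build,
                   is_regressed: Callable[[Build], bool]) -> str:
    """Step 1: swap one component at a time from last_good to first_bad."""
    for component in good:
        if good[component] == bad[component]:
            continue  # an unchanged component cannot be the culprit
        trial = dict(good)
        trial[component] = bad[component]
        if is_regressed(trial):
            return component
    raise RuntimeError("no single component reproduces the regression")


def bisect_commits(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Step 2: binary search for the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and that the regression
    persists once introduced (the same assumption git bisect makes).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


def bisect_regression(good: Build, bad: Build,
                      history: dict[str, Sequence[str]],
                      is_regressed: Callable[[Build], bool]) -> tuple[str, str]:
    """Narrow a regression to a component, then to one commit in it."""
    component = find_component(good, bad, is_regressed)

    def is_bad(commit: str) -> bool:
        trial = dict(good)
        trial[component] = commit
        return is_regressed(trial)

    # history[component] is assumed to list commits from good to bad, inclusive.
    return component, bisect_commits(history[component], is_bad)


if __name__ == "__main__":
    # Toy data: gcc commits c0..c9, with the slowdown introduced at c6.
    good = {"binutils": "b0", "gcc": "c0", "glibc": "g0"}
    bad = {"binutils": "b1", "gcc": "c9", "glibc": "g1"}
    history = {"gcc": [f"c{i}" for i in range(10)]}

    def slow(build: Build) -> bool:
        return int(build["gcc"][1:]) >= 6

    print(bisect_regression(good, bad, history, slow))  # ('gcc', 'c6')
```

Swapping one component at a time would miss a regression that only appears when two components change together, so a real system would need a fallback for that case.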
Re: [TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.
Also, it slightly increases code size of 450.soplex at -Os -flto:
https://lists.linaro.org/pipermail/linaro-toolchain/2021-September/007883.html

--
Maxim Kuvyrkov
https://www.linaro.org

> On 27 Sep 2021, at 15:53, Maxim Kuvyrkov wrote:
>
> Hi Aldy,
>
> Your patch seems to slow down 471.omnetpp by 8% at -O3. Could you please
> take a look if this is something that could be easily fixed?
>
> Regards,
>
> --
> Maxim Kuvyrkov
> https://www.linaro.org
Re: [TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.
Hi Aldy,

Your patch seems to slow down 471.omnetpp by 8% at -O3. Could you please take a look if this is something that could be easily fixed?

Regards,

--
Maxim Kuvyrkov
https://www.linaro.org

> On 27 Sep 2021, at 02:52, ci_not...@linaro.org wrote:
>
> After gcc commit 4a960d548b7d7d942f316c5295f6d849b74214f5
> Author: Aldy Hernandez
>
>     Avoid invalid loop transformations in jump threading registry.
>
> the following benchmarks slowed down by more than 2%:
> - 471.omnetpp slowed down by 8% from 6348 to 6828 perf samples
>
> The reproducer instructions below can be used to rebuild both the "first_bad"
> and "last_good" cross-toolchains used in this bisection. Naturally, the
> scripts will fail when triggering benchmarking jobs if you don't have access
> to Linaro TCWG CI.
>
> For your convenience, we have uploaded tarballs with pre-processed source and
> assembly files at:
> - First_bad save-temps:
> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-4a960d548b7d7d942f316c5295f6d849b74214f5/save-temps/
> - Last_good save-temps:
> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-29c92857039d0a105281be61c10c9e851aaeea4a/save-temps/
> - Baseline save-temps:
> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-baseline/save-temps/
>
> Configuration:
> - Benchmark: SPEC CPU2006
> - Toolchain: GCC + Glibc + GNU Linker
> - Version: all components were built from their tip of trunk
> - Target: arm-linux-gnueabihf
> - Compiler flags: -O3 -marm
> - Hardware: NVidia TK1 4x Cortex-A15
>
> This benchmarking CI is work-in-progress, and we welcome feedback and
> suggestions at linaro-toolchain@lists.linaro.org . Our improvement plans
> include adding support for SPEC CPU2017 benchmarks and providing "perf
> report/annotate" data behind these reports.
>
> THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS,
> REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
>
> This commit has regressed these CI configurations:
> - tcwg_bmk_gnu_tk1/gnu-master-arm-spec2k6-O3
>
> First_bad build:
> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-4a960d548b7d7d942f316c5295f6d849b74214f5/
> Last_good build:
> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-29c92857039d0a105281be61c10c9e851aaeea4a/
> Baseline build:
> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-baseline/
> Even more details:
> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/
>
> Reproduce builds:
>
> mkdir investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5
> cd investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5
>
> # Fetch scripts
> git clone https://git.linaro.org/toolchain/jenkins-scripts
>
> # Fetch manifests and test.sh script
> mkdir -p artifacts/manifests
> curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/manifests/build-baseline.sh --fail
> curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/manifests/build-parameters.sh --fail
> curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/test.sh --fail
> chmod +x artifacts/test.sh
>
> # Reproduce the baseline build (build all pre-requisites)
> ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
>
> # Save baseline build state (which is then restored in artifacts/test.sh)
> mkdir -p ./bisect
> rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
>
> cd gcc
>
> # Reproduce first_bad build
> git checkout --detach 4a960d548b7d7d942f316c5295f6d849b74214f5
> ../artifacts/test.sh
>
> # Reproduce last_good build
> git checkout --detach 29c92857039d0a105281be61c10c9e851aaeea4a
> ../artifacts/test.sh
>
> cd ..
>
>
> Full commit (up to 1000 lines):
>
> commit 4a960d548b7d7d942f316c5295f6d849b74214f5
> Author: Aldy Hernandez
> Date: Thu Sep 23 10:59:24 2021 +0200
>
>     Avoid invalid loop transformations in jump threading registry.
>
>     My upcoming improvements to the forward jump threader make it thread
>     more aggressively. In investigating some "regressions", I noticed
>     that it has always allowed threading through empty latches and across
>     loop boundaries. As we have discussed recently, this should be avoided
>     until after loop optimizations have run their course.
>
>     Note that this wasn't much
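As a quick check of the headline number, treating perf sample counts as a proxy for run time (as the report does), the relative change is:

```latex
\frac{6828 - 6348}{6348} = \frac{480}{6348} \approx 0.0756 \approx 7.6\,\%
```

The report rounds this to 8%, well above its 2% reporting threshold.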
Re: [TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.
On 9/29/21 7:59 AM, Maxim Kuvyrkov wrote:
>> Does it run like once a day/some-time-period, and if you note a
>> regression, narrow it down?
> Configurations that track master branches have 3-day intervals.
> Configurations that track release branches — 6 days. If a regression is
> detected it is narrowed down to component first — binutils, gcc or glibc —
> and then the commit range of the component is bisected down to a specific
> commit. All. Done. Automatically.
>
> I will make a presentation on this CI at the next GNU Tools Cauldron.
>
>> Regardless, I think it could be very useful to be able to see the results
>> of anything you do run at whatever frequency it happens.
> Thanks!
>
> --

One more follow-on question... is this information/summary of the results from every 3-day interval of master published anywhere, i.e., to a web page or posted somewhere? That seems like it could be useful, especially with a +/- differential from the previous run (which you obviously calculate to determine if there is a regression).

Anyway, I like it!

Andrew
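A sketch of the per-run differential suggested above: it compares one metric per benchmark between two runs (for example perf samples or code size, where lower is better) and keeps changes beyond the 2% threshold used in the CI report. The dictionary input is an assumption; the actual results.csv layout is not described in the thread, and the function names are hypothetical.

```python
def relative_change(old: float, new: float) -> float:
    # Positive means the metric grew, i.e. slower or larger for lower-is-better metrics.
    if old == 0:
        raise ValueError("baseline value must be non-zero")
    return (new - old) / old


def diff_runs(previous: dict[str, float], current: dict[str, float],
              threshold: float = 0.02) -> dict[str, float]:
    """Return {benchmark: relative change} for changes larger than threshold."""
    changes: dict[str, float] = {}
    # Only benchmarks present in both runs can be compared.
    for bench in sorted(previous.keys() & current.keys()):
        delta = relative_change(previous[bench], current[bench])
        if abs(delta) > threshold:
            changes[bench] = delta
    return changes


if __name__ == "__main__":
    # 471.omnetpp perf sample counts from the CI report in this thread.
    report = diff_runs({"471.omnetpp": 6348}, {"471.omnetpp": 6828})
    print({bench: f"{delta:+.1%}" for bench, delta in report.items()})
    # {'471.omnetpp': '+7.6%'}
```

Negative values would flag improvements, which would cover the "reverse" notifications asked about earlier in the thread.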
Re: [TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.
[CCing Jeff and list for broader audience]

On 9/27/21 2:53 PM, Maxim Kuvyrkov wrote: Hi Aldy, Your patch seems to slow down 471.omnetpp by 8% at -O3. Could you please take a look and see if this is something that could be easily fixed?

First of all, thanks for chasing this down. It's incredibly useful to have these types of bug reports.

Jeff and I have been discussing the repercussions of adjusting the loop-crossing restrictions in the various threaders. He's seen some regressions on embedded targets, where disallowing certain corner cases of loop-crossing threads causes all sorts of grief.

Out of curiosity, does the attached (untested) patch fix the regression?

Aldy

Regards, -- Maxim Kuvyrkov https://www.linaro.org

On 27 Sep 2021, at 02:52, ci_not...@linaro.org wrote: [...]
[TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.
After gcc commit 4a960d548b7d7d942f316c5295f6d849b74214f5
Author: Aldy Hernandez

Avoid invalid loop transformations in jump threading registry.

the following benchmarks slowed down by more than 2%:
- 471.omnetpp slowed down by 8% from 6348 to 6828 perf samples

The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-4a960d548b7d7d942f316c5295f6d849b74214f5/save-temps/
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-29c92857039d0a105281be61c10c9e851aaeea4a/save-temps/
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-baseline/save-temps/

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: GCC + Glibc + GNU Linker
- Version: all components were built from their tip of trunk
- Target: arm-linux-gnueabihf
- Compiler flags: -O3 -marm
- Hardware: NVidia TK1 4x Cortex-A15

This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain@lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
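The headline figure in the report above is a relative change in perf sample counts: (6828 - 6348) / 6348 is about 7.6%, which the report rounds to 8%, against a "more than 2%" reporting threshold. A small sketch of that check, in the same shell the reproducer uses, might look like the following. The function names are hypothetical and not taken from the Linaro jenkins-scripts; the sketch treats a higher sample count as slower, as the report does.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the Linaro jenkins-scripts.

# pct_change BASE CUR: print the relative change from BASE to CUR in percent,
# rounded to one decimal place (awk does the floating-point arithmetic).
pct_change() {
  awk -v base="$1" -v cur="$2" 'BEGIN { printf "%.1f\n", (cur - base) * 100 / base }'
}

# is_slowdown BASE CUR THRESHOLD: exit 0 if CUR is more than THRESHOLD percent
# above BASE. More perf samples means the benchmark ran slower.
is_slowdown() {
  awk -v base="$1" -v cur="$2" -v thr="$3" 'BEGIN { exit !((cur - base) * 100 / base > thr) }'
}

pct_change 6348 6828               # prints 7.6 (reported as 8%)
if is_slowdown 6348 6828 2; then
  echo "471.omnetpp: regression"   # printed, since 7.6% > 2%
fi
```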
This commit has regressed these CI configurations:
- tcwg_bmk_gnu_tk1/gnu-master-arm-spec2k6-O3

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-4a960d548b7d7d942f316c5295f6d849b74214f5/
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-29c92857039d0a105281be61c10c9e851aaeea4a/
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-baseline/
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/

Reproduce builds:

mkdir investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5
cd investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/manifests/build-baseline.sh --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/manifests/build-parameters.sh --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/test.sh --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 4a960d548b7d7d942f316c5295f6d849b74214f5
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 29c92857039d0a105281be61c10c9e851aaeea4a
../artifacts/test.sh

cd ..

Full commit (up to 1000 lines):

commit 4a960d548b7d7d942f316c5295f6d849b74214f5
Author: Aldy Hernandez
Date: Thu Sep 23 10:59:24 2021 +0200

Avoid invalid loop transformations in jump threading registry.

My upcoming improvements to the forward jump threader make it thread more aggressively. In investigating some "regressions", I noticed that it has always allowed threading through empty latches and across loop boundaries. As we have discussed recently, this should be avoided until after loop optimizations have run their course.

Note that this wasn't much of a problem before because DOM/VRP couldn't find these opportunities, but with a smarter solver, we trip over them more easily.

Because the forward threader doesn't have an independent localized cost model like the new threader (profitable_path_p), it is difficult to catch these things at discovery. However, we can catch them at registration time, with the added benefit that all the threaders (forward and backward)