Hi Arthur,

The pre-processed source is in the save-temps tarballs linked below; S_regmatch() is in regexec.i.
The save-temps also include the .s assembly files from before and after your patch, and the only code-gen difference is in the S_reginclass() function; see the attached screenshot #1.

Looking at the profile of S_regmatch(), some of the extra cycles come from the hot loop starting with "cbz w19,..." becoming misaligned: before your patch it started at "2bce10", and after the patch it starts at "2bce6c". Maybe the added instructions in S_reginclass() pushed the loop in S_regmatch() to an unfortunate address?

--
Maxim Kuvyrkov
https://www.linaro.org

> On 27 Sep 2021, at 20:05, Arthur Eubanks <aeuba...@google.com> wrote:
>
> Could I get the source file with S_regmatch()?
>
> On Mon, Sep 27, 2021 at 6:07 AM Maxim Kuvyrkov <maxim.kuvyr...@linaro.org>
> wrote:
> Hi Arthur,
>
> Your patch seems to be slowing down 400.perlbench by 6%, due to a slowdown of
> its hot function S_regmatch() by 14%.
>
> Could you take a look if this is easily fixable, please?
>
> Regards,
>
> --
> Maxim Kuvyrkov
> https://www.linaro.org
>
> > On 24 Sep 2021, at 15:07, ci_not...@linaro.org wrote:
> >
> > After llvm commit e7249e4acf3cf9438d6d9e02edecebd5b622a4dc
> > Author: Arthur Eubanks <aeuba...@google.com>
> >
> >     [SimplifyCFG] Ignore free instructions when computing cost for folding
> >     branch to common dest
> >
> > the following benchmarks slowed down by more than 2%:
> > - 400.perlbench slowed down by 6% from 9730 to 10312 perf samples
> >   - 400.perlbench:[.] S_regmatch slowed down by 14% from 3660 to 4188 perf
> >     samples
> >
> > Below reproducer instructions can be used to re-build both "first_bad" and
> > "last_good" cross-toolchains used in this bisection. Naturally, the
> > scripts will fail when triggering benchmarking jobs if you don't have
> > access to Linaro TCWG CI.
> >
> > For your convenience, we have uploaded tarballs with pre-processed source
> > and assembly files at:
> > - First_bad save-temps:
> > https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/23/artifact/artifacts/build-e7249e4acf3cf9438d6d9e02edecebd5b622a4dc/save-temps/
> > - Last_good save-temps:
> > https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/23/artifact/artifacts/build-32a50078657dd8beead327a3478ede4e9d730432/save-temps/
> > - Baseline save-temps:
> > https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/23/artifact/artifacts/build-baseline/save-temps/
> >
> > Configuration:
> > - Benchmark: SPEC CPU2006
> > - Toolchain: Clang + Glibc + LLVM Linker
> > - Version: all components were built from their tip of trunk
> > - Target: aarch64-linux-gnu
> > - Compiler flags: -O3
> > - Hardware: NVidia TX1 4x Cortex-A57
> >
> > This benchmarking CI is work-in-progress, and we welcome feedback and
> > suggestions at linaro-toolchain@lists.linaro.org. Our improvement plans
> > include adding support for SPEC CPU2017 benchmarks and providing "perf
> > report/annotate" data behind these reports.
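
For anyone wanting to sanity-check the alignment theory from the top of this thread, here is a minimal sketch in Python. It takes the two loop-start addresses quoted above (2bce10 and 2bce6c, read as hexadecimal) and reports how far each sits past a few power-of-two boundaries. The helper name alignment_info and the choice of 16/32/64-byte boundaries are illustrative assumptions; which boundary actually matters for fetch on the Cortex-A57 is not established in this thread.

```python
# Hypothetical helper, not part of any toolchain: reports the byte offset of
# an address past each candidate alignment boundary.
def alignment_info(addr, boundaries=(16, 32, 64)):
    return {b: addr % b for b in boundaries}

# Loop-start addresses as quoted in the message (assumed to be hex).
before = 0x2bce10
after = 0x2bce6c

for label, addr in (("before", before), ("after", after)):
    offsets = alignment_info(addr)
    summary = ", ".join(f"+{off} past {b}B" for b, off in offsets.items())
    print(f"{label}: {addr:#x} -> {summary}")

# AArch64 instructions are a fixed 4 bytes, so the shift can also be
# expressed as an instruction count.
shift_bytes = after - before
print(f"shift: {shift_bytes} bytes = {shift_bytes // 4} instructions")
```

By this check the loop head was on a 16-byte boundary before the patch and is 12 bytes past one afterwards, which is at least consistent with the misalignment guess; it does not by itself show that alignment accounts for the slowdown.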