Hi Arthur, Thanks for looking into this!

The flags to compile regexec.c were:
  -O3 --target=aarch64-linux-gnu -fgnu89-inline

Clang was configured with (on x86_64-linux-gnu host):
  cmake -G Ninja ../llvm/llvm '-DLLVM_ENABLE_PROJECTS=clang;lld' -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=True -DCMAKE_INSTALL_PREFIX=../llvm-install -DLLVM_TARGETS_TO_BUILD=AArch64

Please let me know if the above doesn’t work for you.

Regards,

--
Maxim Kuvyrkov
https://www.linaro.org

> On 29 Sep 2021, at 20:47, Arthur Eubanks <aeuba...@google.com> wrote:
>
> Do you know the flags passed to Clang to compile the sources? I tried
> compiling the preprocessed sources but ran into the below, and couldn't find
> the flags in any of the logs.
>
> In file included from regexec.c:93:
> In file included from ./perl.h:384:
> In file included from /home/tcwg-buildslave/workspace/tcwg_bmk_0/abe/builds/destdir/x86_64-pc-linux-gnu/aarch64-linux-gnu/libc/usr/include/sys/types.h:144:
> /home/tcwg-buildslave/workspace/tcwg_bmk_0/llvm-install/lib/clang/14.0.0/include/stddef.h:46:27: error: typedef redefinition with different types ('unsigned long' vs 'unsigned long long')
> typedef long unsigned int size_t;
>                           ^
> 1 error generated.
>
> And yeah just moving the code around could cause major performance
> regressions, I've had other patches do the same for various benchmarks,
> there's not much we can do about that if that's actually the root cause. If I
> can compile the file I can check if the optimization actually created worse
> IR or not.
>
> On Wed, Sep 29, 2021 at 5:59 AM Maxim Kuvyrkov <maxim.kuvyr...@linaro.org> wrote:
> Hi Arthur,
>
> Pre-processed source is in the save-temps tarballs linked below; S_regmatch()
> is in regexec.i .
>
> The save-temps also have .s assembly file for before and after your patch,
> and the only code-gen difference is in S_reginclass() function; see the
> attached screenshot #1.
>
> Looking into profile of S_regmatch(), some of the extra cycles come from hot
> loop starting with "cbz w19,..." getting misaligned: before your patch it
> was starting at "2bce10", and after it starts at "2bce6c".
>
> Maybe the added instructions in S_reginclass() pushed the loop in
> S_regmatch() in an unfortunate way?
>
> --
> Maxim Kuvyrkov
> https://www.linaro.org
>
>> On 27 Sep 2021, at 20:05, Arthur Eubanks <aeuba...@google.com> wrote:
>>
>> Could I get the source file with S_regmatch()?
>>
>> On Mon, Sep 27, 2021 at 6:07 AM Maxim Kuvyrkov <maxim.kuvyr...@linaro.org> wrote:
>> Hi Arthur,
>>
>> Your patch seems to be slowing down 400.perlbench by 6%, due to slow down
>> of its hot function S_regmatch() by 14%.
>>
>> Could you take a look if this is easily fixable, please?
>>
>> Regards,
>>
>> --
>> Maxim Kuvyrkov
>> https://www.linaro.org
>>
>> > On 24 Sep 2021, at 15:07, ci_not...@linaro.org wrote:
>> >
>> > After llvm commit e7249e4acf3cf9438d6d9e02edecebd5b622a4dc
>> > Author: Arthur Eubanks <aeuba...@google.com>
>> >
>> >     [SimplifyCFG] Ignore free instructions when computing cost for folding branch to common dest
>> >
>> > the following benchmarks slowed down by more than 2%:
>> > - 400.perlbench slowed down by 6% from 9730 to 10312 perf samples
>> > - 400.perlbench:[.] S_regmatch slowed down by 14% from 3660 to 4188 perf samples
>> >
>> > Below reproducer instructions can be used to re-build both "first_bad" and
>> > "last_good" cross-toolchains used in this bisection. Naturally, the
>> > scripts will fail when triggering benchmarking jobs if you don't have
>> > access to Linaro TCWG CI.
>> >
>> > For your convenience, we have uploaded tarballs with pre-processed source
>> > and assembly files at:
>> > - First_bad save-temps:
>> > https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/23/artifact/artifacts/build-e7249e4acf3cf9438d6d9e02edecebd5b622a4dc/save-temps/
>> > - Last_good save-temps:
>> > https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/23/artifact/artifacts/build-32a50078657dd8beead327a3478ede4e9d730432/save-temps/
>> > - Baseline save-temps:
>> > https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/23/artifact/artifacts/build-baseline/save-temps/
>> >
>> > Configuration:
>> > - Benchmark: SPEC CPU2006
>> > - Toolchain: Clang + Glibc + LLVM Linker
>> > - Version: all components were built from their tip of trunk
>> > - Target: aarch64-linux-gnu
>> > - Compiler flags: -O3
>> > - Hardware: NVidia TX1 4x Cortex-A57
>> >
>> > This benchmarking CI is work-in-progress, and we welcome feedback and
>> > suggestions at linaro-toolchain@lists.linaro.org . Our improvement plans
>> > include adding support for SPEC CPU2017 benchmarks and providing "perf
>> > report/annotate" data behind these reports.
>
> <2021-09-29_15-44-27.png><2021-09-29_15-53-20.png>

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-toolchain
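
The alignment observation in the thread (hot loop at "2bce10" before the patch, "2bce6c" after) can be checked with a little arithmetic. The following is only a minimal sketch: the 16-byte and 64-byte boundaries are assumptions chosen for illustration and are not stated anywhere in the thread, and the helper name offset_within is hypothetical. Only the two addresses come from the discussion above.

```python
def offset_within(addr: int, boundary: int) -> int:
    """Return how many bytes addr lies past the previous boundary-aligned address."""
    return addr % boundary


# Loop start addresses quoted in the thread (hexadecimal).
before = 0x2BCE10
after = 0x2BCE6C

# Assumed boundaries (not from the thread): 16-byte block, 64-byte line.
for name, addr in (("before", before), ("after", after)):
    print(
        f"{name}: {addr:#x} "
        f"offset in 16-byte block = {offset_within(addr, 16)}, "
        f"offset in 64-byte line = {offset_within(addr, 64)}"
    )

# A64 instructions are 4 bytes each, so the shift can be read as an instruction count.
shift_bytes = after - before
print(f"loop start moved by {shift_bytes} bytes = {shift_bytes // 4} A64 instructions")
```

Under those assumed boundaries, the loop start goes from 16-byte aligned to 12 bytes past a 16-byte boundary. That is consistent with the suggestion that code added earlier in the file pushed the loop, though it does not by itself prove that alignment is the cause of the slowdown.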