Hi Arthur,

The pre-processed source is in the save-temps tarballs linked below; 
S_regmatch() is in regexec.i.

The save-temps also include the .s assembly files from before and after your 
patch, and the only code-gen difference is in the S_reginclass() function (see 
the attached screenshot #1).

Looking at the profile of S_regmatch(), some of the extra cycles come from the 
hot loop starting with "cbz w19,..." becoming misaligned: before your patch it 
started at "2bce10", and after it starts at "2bce6c".
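
As a quick sanity check on the alignment point, here is a minimal sketch that computes where each loop head sits relative to a few power-of-two boundaries. It assumes the two hex values quoted above are the loop-head addresses as they appear in the disassembly; the 16/32/64-byte boundaries and the helper names are illustrative choices of mine, not something measured on the TX1.

```python
# Loop-head addresses quoted above (hex values from the disassembly).
BEFORE = 0x2bce10  # loop head before the patch
AFTER = 0x2bce6c   # loop head after the patch


def offset_within(addr: int, boundary: int) -> int:
    """Bytes by which addr lies past the previous boundary-aligned address."""
    return addr % boundary


def alignment_report(addr: int, boundaries=(16, 32, 64)) -> dict:
    """Map each boundary size (hypothetical choices) to the offset of addr within it."""
    return {b: offset_within(addr, b) for b in boundaries}


if __name__ == "__main__":
    for label, addr in (("before", BEFORE), ("after", AFTER)):
        print(label, hex(addr), alignment_report(addr))
    # Distance the loop head moved, in bytes.
    print("shift in bytes:", AFTER - BEFORE)
```

With these values the old loop head is 16-byte aligned (offset 0), while the new one sits 12 bytes past a 16-byte boundary. The loop head moved by 92 bytes, which is 23 four-byte AArch64 instruction slots.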

Maybe the added instructions in S_reginclass() shifted the loop in S_regmatch() 
to an unfortunate address?

--
Maxim Kuvyrkov
https://www.linaro.org

> On 27 Sep 2021, at 20:05, Arthur Eubanks <aeuba...@google.com> wrote:
> 
> Could I get the source file with S_regmatch()?
> 
> On Mon, Sep 27, 2021 at 6:07 AM Maxim Kuvyrkov <maxim.kuvyr...@linaro.org> 
> wrote:
> Hi Arthur,
> 
> Your patch seems to be slowing down 400.perlbench by 6%, due to a 14% 
> slowdown of its hot function S_regmatch().
> 
> Could you take a look if this is easily fixable, please?
> 
> Regards,
> 
> --
> Maxim Kuvyrkov
> https://www.linaro.org
> 
> > On 24 Sep 2021, at 15:07, ci_not...@linaro.org wrote:
> > 
> > After llvm commit e7249e4acf3cf9438d6d9e02edecebd5b622a4dc
> > Author: Arthur Eubanks <aeuba...@google.com>
> > 
> >    [SimplifyCFG] Ignore free instructions when computing cost for folding 
> > branch to common dest
> > 
> > the following benchmarks slowed down by more than 2%:
> > - 400.perlbench slowed down by 6% from 9730 to 10312 perf samples
> >  - 400.perlbench:[.] S_regmatch slowed down by 14% from 3660 to 4188 perf 
> > samples
> > 
> > The reproducer instructions below can be used to re-build both the 
> > "first_bad" and "last_good" cross-toolchains used in this bisection.  
> > Naturally, the scripts will fail when triggering benchmarking jobs if you 
> > don't have access to Linaro TCWG CI.
> > 
> > For your convenience, we have uploaded tarballs with pre-processed source 
> > and assembly files at:
> > - First_bad save-temps: 
> > https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/23/artifact/artifacts/build-e7249e4acf3cf9438d6d9e02edecebd5b622a4dc/save-temps/
> > - Last_good save-temps: 
> > https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/23/artifact/artifacts/build-32a50078657dd8beead327a3478ede4e9d730432/save-temps/
> > - Baseline save-temps: 
> > https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/23/artifact/artifacts/build-baseline/save-temps/
> > 
> > Configuration:
> > - Benchmark: SPEC CPU2006
> > - Toolchain: Clang + Glibc + LLVM Linker
> > - Version: all components were built from their tip of trunk
> > - Target: aarch64-linux-gnu
> > - Compiler flags: -O3
> > - Hardware: NVidia TX1 4x Cortex-A57
> > 
> > This benchmarking CI is a work in progress, and we welcome feedback and 
> > suggestions at linaro-toolchain@lists.linaro.org.  Our improvement plans 
> > include adding support for SPEC CPU2017 benchmarks and providing "perf 
> > report/annotate" data behind these reports.

