Hi Arthur,

Thanks for looking into this!

The flags to compile regexec.c were:
-O3 --target=aarch64-linux-gnu -fgnu89-inline

Clang was configured with (on x86_64-linux-gnu host):
cmake -G Ninja ../llvm/llvm '-DLLVM_ENABLE_PROJECTS=clang;lld' 
-DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=True 
-DCMAKE_INSTALL_PREFIX=../llvm-install -DLLVM_TARGETS_TO_BUILD=AArch64

Please let me know if the above doesn’t work for you.

Regards,

--
Maxim Kuvyrkov
https://www.linaro.org

> On 29 Sep 2021, at 20:47, Arthur Eubanks <aeuba...@google.com> wrote:
> 
> Do you know the flags passed to Clang to compile the sources? I tried 
> compiling the preprocessed sources but ran into the below, and couldn't find 
> the flags in any of the logs.
> 
> In file included from regexec.c:93:
> In file included from ./perl.h:384:
> In file included from 
> /home/tcwg-buildslave/workspace/tcwg_bmk_0/abe/builds/destdir/x86_64-pc-linux-gnu/aarch64-linux-gnu/libc/usr/include/sys/types.h:144:
> /home/tcwg-buildslave/workspace/tcwg_bmk_0/llvm-install/lib/clang/14.0.0/include/stddef.h:46:27:
>  error: typedef redefinition with different types ('unsigned long' vs 
> 'unsigned long long')
> typedef long unsigned int size_t;
>                           ^
> 1 error generated.
> 
> 
> 
> And yeah just moving the code around could cause major performance 
> regressions, I've had other patches do the same for various benchmarks, 
> there's not much we can do about that if that's actually the root cause. If I 
> can compile the file I can check if the optimization actually created worse 
> IR or not.
> 
> 
> On Wed, Sep 29, 2021 at 5:59 AM Maxim Kuvyrkov <maxim.kuvyr...@linaro.org> 
> wrote:
> Hi Arthur,
> 
> Pre-processed source is in the save-temps tarballs linked below; S_regmatch() 
> is in regexec.i .
> 
> The save-temps also have .s assembly file for before and after your patch, 
> and the only code-gen difference is in S_reginclass() function — see the 
> attached screenshot #1.
> 
> Looking into profile of S_regmatch(), some of the extra cycles come from hot 
> loop starting with “cbz w19,...” getting misaligned — before your patch it 
> was starting at "2bce10", and after it starts at "2bce6c”.
> 
> Maybe the added instructions in S_reginclass() pushed the loop in 
> S_regmatch() in an unfortunate way?
> 
> --
> Maxim Kuvyrkov
> https://www.linaro.org
> 
>> On 27 Sep 2021, at 20:05, Arthur Eubanks <aeuba...@google.com> wrote:
>> 
>> Could I get the source file with S_regmatch()?
>> 
>> On Mon, Sep 27, 2021 at 6:07 AM Maxim Kuvyrkov <maxim.kuvyr...@linaro.org> 
>> wrote:
>> Hi Arthur,
>> 
>> Your patch seems to be slowing down 400.perlbench by 6% — due to slow down 
>> of its hot function S_regmatch() by 14%.
>> 
>> Could you take a look if this is easily fixable, please?
>> 
>> Regards,
>> 
>> --
>> Maxim Kuvyrkov
>> https://www.linaro.org
>> 
>> > On 24 Sep 2021, at 15:07, ci_not...@linaro.org wrote:
>> > 
>> > After llvm commit e7249e4acf3cf9438d6d9e02edecebd5b622a4dc
>> > Author: Arthur Eubanks <aeuba...@google.com>
>> > 
>> >    [SimplifyCFG] Ignore free instructions when computing cost for folding 
>> > branch to common dest
>> > 
>> > the following benchmarks slowed down by more than 2%:
>> > - 400.perlbench slowed down by 6% from 9730 to 10312 perf samples
>> >  - 400.perlbench:[.] S_regmatch slowed down by 14% from 3660 to 4188 perf 
>> > samples
>> > 
>> > Below reproducer instructions can be used to re-build both "first_bad" and 
>> > "last_good" cross-toolchains used in this bisection.  Naturally, the 
>> > scripts will fail when triggerring benchmarking jobs if you don't have 
>> > access to Linaro TCWG CI.
>> > 
>> > For your convenience, we have uploaded tarballs with pre-processed source 
>> > and assembly files at:
>> > - First_bad save-temps: 
>> > https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/23/artifact/artifacts/build-e7249e4acf3cf9438d6d9e02edecebd5b622a4dc/save-temps/
>> > - Last_good save-temps: 
>> > https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/23/artifact/artifacts/build-32a50078657dd8beead327a3478ede4e9d730432/save-temps/
>> > - Baseline save-temps: 
>> > https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/23/artifact/artifacts/build-baseline/save-temps/
>> > 
>> > Configuration:
>> > - Benchmark: SPEC CPU2006
>> > - Toolchain: Clang + Glibc + LLVM Linker
>> > - Version: all components were built from their tip of trunk
>> > - Target: aarch64-linux-gnu
>> > - Compiler flags: -O3
>> > - Hardware: NVidia TX1 4x Cortex-A57
>> > 
>> > This benchmarking CI is work-in-progress, and we welcome feedback and 
>> > suggestions at linaro-toolchain@lists.linaro.org .  In our improvement 
>> > plans is to add support for SPEC CPU2017 benchmarks and provide "perf 
>> > report/annotate" data behind these reports.
> 
> <2021-09-29_15-44-27.png><2021-09-29_15-53-20.png>

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-toolchain

Reply via email to