On Fri, Oct 1, 2021 at 1:53 PM Maxim Kuvyrkov <maxim.kuvyr...@linaro.org>
wrote:

> > On 1 Oct 2021, at 23:37, Arthur Eubanks <aeuba...@google.com> wrote:
> >
> > Thanks for the flags, I can now reproduce.
> >
> > I've basically come to the same conclusion as you. There's only one new
> instance of this optimization triggering throughout the whole file, in
> S_reginclass(). It doesn't look out of place,
>
> I was looking at assembly of S_reginclass() and couldn’t figure out the
> purpose of the extra tst/b.ne instructions.  Sure it’s a tiny code-size
> increase, but their appearance seems silly (or, quite likely, I just don’t
> understand something).
>
> Do you know why they are now generated?
>
It's duplicated from .LBB11_29 due to the new instance of the branch
folding that happens with my patch. We might end up doing one less branch
if we end up getting to .LBB11_36, so the new instructions don't seem
useless.

before my patch
.LBB11_13:
  ldr
  str
  b 11_29
.LBB11_29:
  tst
  b.eq 11_36
  ldrb
  tst
  b.eq 11_33

after my patch
.LBB11_13:
  tst
  str
  b.ne 11_30
  b 11_36
.LBB11_29:
  tst
  b.eq 11_36
  ; fallthrough to 11_30
.LBB11_30:
  ldrb
  tst
  b.eq 11_33

>
> > and S_regmatch() is identical before and after the patch. So it's likely
> alignment as you've already mentioned.
> >
> > There's not much I can do with that information.
> > To further confirm if this is the case, you could try passing -mllvm
> -align-all-functions=8 (or -align-all-blocks=4, or something along those
> lines) at head vs with my patch reverted and see how the performance is.
>
> Thanks,
>
> --
> Maxim Kuvyrkov
> https://www.linaro.org
>
>
> >
> > On Fri, Oct 1, 2021 at 2:41 AM Maxim Kuvyrkov <maxim.kuvyr...@linaro.org>
> wrote:
> > Hi Arthur,
> >
> > Thanks for looking into this!
> >
> > The flags to compile regexec.c were:
> > -O3 --target=aarch64-linux-gnu -fgnu89-inline
> >
> > Clang was configured with (on x86_64-linux-gnu host):
> > cmake -G Ninja ../llvm/llvm '-DLLVM_ENABLE_PROJECTS=clang;lld'
> -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=True
> -DCMAKE_INSTALL_PREFIX=../llvm-install -DLLVM_TARGETS_TO_BUILD=AArch64
> >
> > Please let me know if the above doesn’t work for you.
> >
> > Regards,
> >
> > --
> > Maxim Kuvyrkov
> > https://www.linaro.org
> >
> > > On 29 Sep 2021, at 20:47, Arthur Eubanks <aeuba...@google.com> wrote:
> > >
> > > Do you know the flags passed to Clang to compile the sources? I tried
> compiling the preprocessed sources but ran into the below, and couldn't
> find the flags in any of the logs.
> > >
> > > In file included from regexec.c:93:
> > > In file included from ./perl.h:384:
> > > In file included from
> /home/tcwg-buildslave/workspace/tcwg_bmk_0/abe/builds/destdir/x86_64-pc-linux-gnu/aarch64-linux-gnu/libc/usr/include/sys/types.h:144:
> > >
> /home/tcwg-buildslave/workspace/tcwg_bmk_0/llvm-install/lib/clang/14.0.0/include/stddef.h:46:27:
> error: typedef redefinition with different types ('unsigned long' vs
> 'unsigned long long')
> > > typedef long unsigned int size_t;
> > >                           ^
> > > 1 error generated.
> > >
> > >
> > >
> > > And yeah just moving the code around could cause major performance
> regressions, I've had other patches do the same for various benchmarks,
> there's not much we can do about that if that's actually the root cause. If
> I can compile the file I can check if the optimization actually created
> worse IR or not.
> > >
> > >
> > > On Wed, Sep 29, 2021 at 5:59 AM Maxim Kuvyrkov <
> maxim.kuvyr...@linaro.org> wrote:
> > > Hi Arthur,
> > >
> > > Pre-processed source is in the save-temps tarballs linked below;
> S_regmatch() is in regexec.i .
> > >
> > > The save-temps also have .s assembly file for before and after your
> patch, and the only code-gen difference is in S_reginclass() function — see
> the attached screenshot #1.
> > >
> > > Looking into profile of S_regmatch(), some of the extra cycles come
> from hot loop starting with “cbz w19,...” getting misaligned — before your
> patch it was starting at "2bce10", and after it starts at "2bce6c”.
> > >
> > > Maybe the added instructions in S_reginclass() pushed the loop in
> S_regmatch() in an unfortunate way?
> > >
> > > --
> > > Maxim Kuvyrkov
> > > https://www.linaro.org
> > >
> > >> On 27 Sep 2021, at 20:05, Arthur Eubanks <aeuba...@google.com> wrote:
> > >>
> > >> Could I get the source file with S_regmatch()?
> > >>
> > >> On Mon, Sep 27, 2021 at 6:07 AM Maxim Kuvyrkov <
> maxim.kuvyr...@linaro.org> wrote:
> > >> Hi Arthur,
> > >>
> > >> Your patch seems to be slowing down 400.perlbench by 6% — due to slow
> down of its hot function S_regmatch() by 14%.
> > >>
> > >> Could you take a look if this is easily fixable, please?
> > >>
> > >> Regards,
> > >>
> > >> --
> > >> Maxim Kuvyrkov
> > >> https://www.linaro.org
> > >>
> > >> > On 24 Sep 2021, at 15:07, ci_not...@linaro.org wrote:
> > >> >
> > >> > After llvm commit e7249e4acf3cf9438d6d9e02edecebd5b622a4dc
> > >> > Author: Arthur Eubanks <aeuba...@google.com>
> > >> >
> > >> >    [SimplifyCFG] Ignore free instructions when computing cost for
> folding branch to common dest
> > >> >
> > >> > the following benchmarks slowed down by more than 2%:
> > >> > - 400.perlbench slowed down by 6% from 9730 to 10312 perf samples
> > >> >  - 400.perlbench:[.] S_regmatch slowed down by 14% from 3660 to
> 4188 perf samples
> > >> >
> > >> > Below reproducer instructions can be used to re-build both
> "first_bad" and "last_good" cross-toolchains used in this bisection.
> Naturally, the scripts will fail when triggerring benchmarking jobs if you
> don't have access to Linaro TCWG CI.
> > >> >
> > >> > For your convenience, we have uploaded tarballs with pre-processed
> source and assembly files at:
> > >> > - First_bad save-temps:
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/23/artifact/artifacts/build-e7249e4acf3cf9438d6d9e02edecebd5b622a4dc/save-temps/
> > >> > - Last_good save-temps:
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/23/artifact/artifacts/build-32a50078657dd8beead327a3478ede4e9d730432/save-temps/
> > >> > - Baseline save-temps:
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/23/artifact/artifacts/build-baseline/save-temps/
> > >> >
> > >> > Configuration:
> > >> > - Benchmark: SPEC CPU2006
> > >> > - Toolchain: Clang + Glibc + LLVM Linker
> > >> > - Version: all components were built from their tip of trunk
> > >> > - Target: aarch64-linux-gnu
> > >> > - Compiler flags: -O3
> > >> > - Hardware: NVidia TX1 4x Cortex-A57
> > >> >
> > >> > This benchmarking CI is work-in-progress, and we welcome feedback
> and suggestions at linaro-toolchain@lists.linaro.org .  In our
> improvement plans is to add support for SPEC CPU2017 benchmarks and provide
> "perf report/annotate" data behind these reports.
> > >
> > > <2021-09-29_15-44-27.png><2021-09-29_15-53-20.png>
> >
>
>
_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-toolchain

Reply via email to