Hi Alexey,

After your patch Clang crashes while building 453.povray for aarch64-linux-gnu. 
 Apparently, this happens only with LTO enabled at -O2 and -O3.

Did you get any bug reports against this patch already?

Thanks,

--
Maxim Kuvyrkov
https://www.linaro.org

> On 5 Dec 2021, at 02:55, ci_not...@linaro.org wrote:
> 
> After llvm commit ba74bb3a226e1b4660537f274627285b1bf41ee1
> Author: Alexey Bataev <a.bat...@outlook.com>
> 
>    [SLP]Fix reused extracts cost.
> 
> the following benchmarks slowed down by more than 2%:
> - 453.povray failed to build
> 
> Below reproducer instructions can be used to re-build both "first_bad" and 
> "last_good" cross-toolchains used in this bisection.  Naturally, the scripts 
> will fail when triggerring benchmarking jobs if you don't have access to 
> Linaro TCWG CI.
> 
> For your convenience, we have uploaded tarballs with pre-processed source and 
> assembly files at:
> - First_bad save-temps: 
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/build-ba74bb3a226e1b4660537f274627285b1bf41ee1/save-temps/
> - Last_good save-temps: 
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/build-78cc133c63173a4b5b7a43750cc507d4cff683cf/save-temps/
> - Baseline save-temps: 
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/build-baseline/save-temps/
> 
> Configuration:
> - Benchmark: SPEC CPU2006
> - Toolchain: Clang + Glibc + LLVM Linker
> - Version: all components were built from their tip of trunk
> - Target: aarch64-linux-gnu
> - Compiler flags: -O3 -flto
> - Hardware: NVidia TX1 4x Cortex-A57
> 
> This benchmarking CI is work-in-progress, and we welcome feedback and 
> suggestions at linaro-toolchain@lists.linaro.org .  In our improvement plans 
> is to add support for SPEC CPU2017 benchmarks and provide "perf 
> report/annotate" data behind these reports.
> 
> THIS IS THE END OF INTERESTING STUFF.  BELOW ARE LINKS TO BUILDS, 
> REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
> 
> This commit has regressed these CI configurations:
> - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3_LTO
> 
> First_bad build: 
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/build-ba74bb3a226e1b4660537f274627285b1bf41ee1/
> Last_good build: 
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/build-78cc133c63173a4b5b7a43750cc507d4cff683cf/
> Baseline build: 
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/build-baseline/
> Even more details: 
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/
> 
> Reproduce builds:
> <cut>
> mkdir investigate-llvm-ba74bb3a226e1b4660537f274627285b1bf41ee1
> cd investigate-llvm-ba74bb3a226e1b4660537f274627285b1bf41ee1
> 
> # Fetch scripts
> git clone https://git.linaro.org/toolchain/jenkins-scripts
> 
> # Fetch manifests and test.sh script
> mkdir -p artifacts/manifests
> curl -o artifacts/manifests/build-baseline.sh 
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/manifests/build-baseline.sh
>  --fail
> curl -o artifacts/manifests/build-parameters.sh 
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/manifests/build-parameters.sh
>  --fail
> curl -o artifacts/test.sh 
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/test.sh
>  --fail
> chmod +x artifacts/test.sh
> 
> # Reproduce the baseline build (build all pre-requisites)
> ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
> 
> # Save baseline build state (which is then restored in artifacts/test.sh)
> mkdir -p ./bisect
> rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ 
> --exclude /llvm/ ./ ./bisect/baseline/
> 
> cd llvm
> 
> # Reproduce first_bad build
> git checkout --detach ba74bb3a226e1b4660537f274627285b1bf41ee1
> ../artifacts/test.sh
> 
> # Reproduce last_good build
> git checkout --detach 78cc133c63173a4b5b7a43750cc507d4cff683cf
> ../artifacts/test.sh
> 
> cd ..
> </cut>
> 
> Full commit (up to 1000 lines):
> <cut>
> commit ba74bb3a226e1b4660537f274627285b1bf41ee1
> Author: Alexey Bataev <a.bat...@outlook.com>
> Date:   Thu Dec 2 04:22:55 2021 -0800
> 
>    [SLP]Fix reused extracts cost.
> 
>    If the extractelement instruction is used multiple times in the
>    different tree entries (either vectorized, or gathered), need to
>    compensate the scalar cost of such instructions. They are completely
>    removed if all users are part of the tree but we need to compensate the
>    cost only once for each instruction.
> 
>    Differential Revision: https://reviews.llvm.org/D114958
> ---
> llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp    | 29 +++++++++++++---------
> .../X86/extractelement-multiple-uses.ll            | 23 +++++++++--------
> 2 files changed, 29 insertions(+), 23 deletions(-)
> 
> diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp 
> b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
> index 95061e9053fa..335ad6c85387 100644
> --- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
> +++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
> @@ -4287,8 +4287,8 @@ bool BoUpSLP::canReuseExtract(ArrayRef<Value *> VL, 
> Value *OpValue,
> bool BoUpSLP::areAllUsersVectorized(Instruction *I,
>                                     ArrayRef<Value *> VectorizedVals) const {
>   return (I->hasOneUse() && is_contained(VectorizedVals, I)) ||
> -         llvm::all_of(I->users(), [this](User *U) {
> -           return ScalarToTreeEntry.count(U) > 0;
> +         all_of(I->users(), [this](User *U) {
> +           return ScalarToTreeEntry.count(U) > 0 || MustGather.contains(U);
>          });
> }
> 
> @@ -4442,9 +4442,9 @@ InstructionCost BoUpSLP::getEntryCost(const TreeEntry 
> *E,
>   // FIXME: it tries to fix a problem with MSVC buildbots.
>   TargetTransformInfo &TTIRef = *TTI;
>   auto &&AdjustExtractsCost = [this, &TTIRef, CostKind, VL, VecTy,
> -                               VectorizedVals](InstructionCost &Cost,
> -                                               bool IsGather) {
> +                               VectorizedVals, E](InstructionCost &Cost) {
>     DenseMap<Value *, int> ExtractVectorsTys;
> +    SmallPtrSet<Value *, 4> CheckedExtracts;
>     for (auto *V : VL) {
>       if (isa<UndefValue>(V))
>         continue;
> @@ -4452,7 +4452,12 @@ InstructionCost BoUpSLP::getEntryCost(const TreeEntry 
> *E,
>       // instruction itself is not going to be vectorized, consider this
>       // instruction as dead and remove its cost from the final cost of the
>       // vectorized tree.
> -      if (!areAllUsersVectorized(cast<Instruction>(V), VectorizedVals))
> +      // Also, avoid adjusting the cost for extractelements with multiple 
> uses
> +      // in different graph entries.
> +      const TreeEntry *VE = getTreeEntry(V);
> +      if (!CheckedExtracts.insert(V).second ||
> +          !areAllUsersVectorized(cast<Instruction>(V), VectorizedVals) ||
> +          (VE && VE != E))
>         continue;
>       auto *EE = cast<ExtractElementInst>(V);
>       Optional<unsigned> EEIdx = getExtractIndex(EE);
> @@ -4549,11 +4554,6 @@ InstructionCost BoUpSLP::getEntryCost(const TreeEntry 
> *E,
>       }
>       return GatherCost;
>     }
> -    if (isSplat(VL)) {
> -      // Found the broadcasting of the single scalar, calculate the cost as 
> the
> -      // broadcast.
> -      return TTI->getShuffleCost(TargetTransformInfo::SK_Broadcast, VecTy);
> -    }
>     if ((E->getOpcode() == Instruction::ExtractElement ||
>          all_of(E->Scalars,
>                 [](Value *V) {
> @@ -4571,13 +4571,18 @@ InstructionCost BoUpSLP::getEntryCost(const TreeEntry 
> *E,
>         // single input vector or of 2 input vectors.
>         InstructionCost Cost =
>             computeExtractCost(VL, VecTy, *ShuffleKind, Mask, *TTI);
> -        AdjustExtractsCost(Cost, /*IsGather=*/true);
> +        AdjustExtractsCost(Cost);
>         if (NeedToShuffleReuses)
>           Cost += 
> TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,
>                                       FinalVecTy, E->ReuseShuffleIndices);
>         return Cost;
>       }
>     }
> +    if (isSplat(VL)) {
> +      // Found the broadcasting of the single scalar, calculate the cost as 
> the
> +      // broadcast.
> +      return TTI->getShuffleCost(TargetTransformInfo::SK_Broadcast, VecTy);
> +    }
>     InstructionCost ReuseShuffleCost = 0;
>     if (NeedToShuffleReuses)
>       ReuseShuffleCost = TTI->getShuffleCost(
> @@ -4755,7 +4760,7 @@ InstructionCost BoUpSLP::getEntryCost(const TreeEntry 
> *E,
>               TTI->getVectorInstrCost(Instruction::ExtractElement, VecTy, I);
>         }
>       } else {
> -        AdjustExtractsCost(CommonCost, /*IsGather=*/false);
> +        AdjustExtractsCost(CommonCost);
>       }
>       return CommonCost;
>     }
> diff --git 
> a/llvm/test/Transforms/SLPVectorizer/X86/extractelement-multiple-uses.ll 
> b/llvm/test/Transforms/SLPVectorizer/X86/extractelement-multiple-uses.ll
> index c47f255f0bfe..31696752bbb3 100644
> --- a/llvm/test/Transforms/SLPVectorizer/X86/extractelement-multiple-uses.ll
> +++ b/llvm/test/Transforms/SLPVectorizer/X86/extractelement-multiple-uses.ll
> @@ -2,24 +2,25 @@
> ; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-unknown-linux 
> -march=core-avx2 -pass-remarks-output=%t | FileCheck %s
> ; RUN: FileCheck %s --input-file=%t --check-prefix=YAML
> 
> -; YAML: --- !Missed
> +; YAML: --- !Passed
> ; YAML: Pass:            slp-vectorizer
> -; YAML: Name:            NotBeneficial
> +; YAML: Name:            VectorizedList
> ; YAML: Function:        multi_uses
> ; YAML: Args:
> -; YAML:  - String:          'List vectorization was possible but not 
> beneficial with cost '
> -; YAML:  - Cost:            '0'
> -; YAML:  - String:          ' >= '
> -; YAML:  - Treshold:        '0'
> +; YAML:  - String:          'SLP vectorized with cost '
> +; YAML:  - Cost:            '-1'
> +; YAML:  - String:          ' and with tree size '
> +; YAML:  - TreeSize:        '3'
> 
> define float @multi_uses(<2 x float> %x, <2 x float> %y) {
> ; CHECK-LABEL: @multi_uses(
> -; CHECK-NEXT:    [[X0:%.*]] = extractelement <2 x float> [[X:%.*]], i32 0
> -; CHECK-NEXT:    [[X1:%.*]] = extractelement <2 x float> [[X]], i32 1
> ; CHECK-NEXT:    [[Y1:%.*]] = extractelement <2 x float> [[Y:%.*]], i32 1
> -; CHECK-NEXT:    [[X0X0:%.*]] = fmul float [[X0]], [[Y1]]
> -; CHECK-NEXT:    [[X1X1:%.*]] = fmul float [[X1]], [[Y1]]
> -; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[X0X0]], [[X1X1]]
> +; CHECK-NEXT:    [[TMP1:%.*]] = insertelement <2 x float> poison, float 
> [[Y1]], i32 0
> +; CHECK-NEXT:    [[TMP2:%.*]] = insertelement <2 x float> [[TMP1]], float 
> [[Y1]], i32 1
> +; CHECK-NEXT:    [[TMP3:%.*]] = fmul <2 x float> [[X:%.*]], [[TMP2]]
> +; CHECK-NEXT:    [[TMP4:%.*]] = extractelement <2 x float> [[TMP3]], i32 0
> +; CHECK-NEXT:    [[TMP5:%.*]] = extractelement <2 x float> [[TMP3]], i32 1
> +; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[TMP4]], [[TMP5]]
> ; CHECK-NEXT:    ret float [[ADD]]
> ;
>   %x0 = extractelement <2 x float> %x, i32 0
> </cut>

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-toolchain

Reply via email to