Hi Alexey, After your patch Clang crashes while building 453.povray for aarch64-linux-gnu. Apparently, this happens only with LTO enabled at -O2 and -O3.
Did you get any bug reports against this patch already? Thanks, -- Maxim Kuvyrkov https://www.linaro.org > On 5 Dec 2021, at 02:55, ci_not...@linaro.org wrote: > > After llvm commit ba74bb3a226e1b4660537f274627285b1bf41ee1 > Author: Alexey Bataev <a.bat...@outlook.com> > > [SLP]Fix reused extracts cost. > > the following benchmarks slowed down by more than 2%: > - 453.povray failed to build > > Below reproducer instructions can be used to re-build both "first_bad" and > "last_good" cross-toolchains used in this bisection. Naturally, the scripts > will fail when triggerring benchmarking jobs if you don't have access to > Linaro TCWG CI. > > For your convenience, we have uploaded tarballs with pre-processed source and > assembly files at: > - First_bad save-temps: > https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/build-ba74bb3a226e1b4660537f274627285b1bf41ee1/save-temps/ > - Last_good save-temps: > https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/build-78cc133c63173a4b5b7a43750cc507d4cff683cf/save-temps/ > - Baseline save-temps: > https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/build-baseline/save-temps/ > > Configuration: > - Benchmark: SPEC CPU2006 > - Toolchain: Clang + Glibc + LLVM Linker > - Version: all components were built from their tip of trunk > - Target: aarch64-linux-gnu > - Compiler flags: -O3 -flto > - Hardware: NVidia TX1 4x Cortex-A57 > > This benchmarking CI is work-in-progress, and we welcome feedback and > suggestions at linaro-toolchain@lists.linaro.org . In our improvement plans > is to add support for SPEC CPU2017 benchmarks and provide "perf > report/annotate" data behind these reports. > > THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, > REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT. > > This commit has regressed these CI configurations: > - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3_LTO > > First_bad build: > https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/build-ba74bb3a226e1b4660537f274627285b1bf41ee1/ > Last_good build: > https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/build-78cc133c63173a4b5b7a43750cc507d4cff683cf/ > Baseline build: > https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/build-baseline/ > Even more details: > https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/ > > Reproduce builds: > <cut> > mkdir investigate-llvm-ba74bb3a226e1b4660537f274627285b1bf41ee1 > cd investigate-llvm-ba74bb3a226e1b4660537f274627285b1bf41ee1 > > # Fetch scripts > git clone https://git.linaro.org/toolchain/jenkins-scripts > > # Fetch manifests and test.sh script > mkdir -p artifacts/manifests > curl -o artifacts/manifests/build-baseline.sh > https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/manifests/build-baseline.sh > --fail > curl -o artifacts/manifests/build-parameters.sh > https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/manifests/build-parameters.sh > --fail > curl -o artifacts/test.sh > https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/test.sh > --fail > chmod +x artifacts/test.sh > > # Reproduce the baseline build (build all pre-requisites) > ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh > > # Save baseline build state (which is then restored in artifacts/test.sh) > mkdir -p ./bisect > rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ > --exclude /llvm/ ./ ./bisect/baseline/ > > cd llvm > > # Reproduce first_bad build > git checkout --detach ba74bb3a226e1b4660537f274627285b1bf41ee1 > ../artifacts/test.sh > > # Reproduce last_good build > git checkout --detach 78cc133c63173a4b5b7a43750cc507d4cff683cf > ../artifacts/test.sh > > cd .. > </cut> > > Full commit (up to 1000 lines): > <cut> > commit ba74bb3a226e1b4660537f274627285b1bf41ee1 > Author: Alexey Bataev <a.bat...@outlook.com> > Date: Thu Dec 2 04:22:55 2021 -0800 > > [SLP]Fix reused extracts cost. > > If the extractelement instruction is used multiple times in the > different tree entries (either vectorized, or gathered), need to > compensate the scalar cost of such instructions. They are completely > removed if all users are part of the tree but we need to compensate the > cost only once for each instruction. > > Differential Revision: https://reviews.llvm.org/D114958 > --- > llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | 29 +++++++++++++--------- > .../X86/extractelement-multiple-uses.ll | 23 +++++++++-------- > 2 files changed, 29 insertions(+), 23 deletions(-) > > diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp > b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp > index 95061e9053fa..335ad6c85387 100644 > --- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp > +++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp > @@ -4287,8 +4287,8 @@ bool BoUpSLP::canReuseExtract(ArrayRef<Value *> VL, > Value *OpValue, > bool BoUpSLP::areAllUsersVectorized(Instruction *I, > ArrayRef<Value *> VectorizedVals) const { > return (I->hasOneUse() && is_contained(VectorizedVals, I)) || > - llvm::all_of(I->users(), [this](User *U) { > - return ScalarToTreeEntry.count(U) > 0; > + all_of(I->users(), [this](User *U) { > + return ScalarToTreeEntry.count(U) > 0 || MustGather.contains(U); > }); > } > > @@ -4442,9 +4442,9 @@ InstructionCost BoUpSLP::getEntryCost(const TreeEntry > *E, > // FIXME: it tries to fix a problem with MSVC buildbots. > TargetTransformInfo &TTIRef = *TTI; > auto &&AdjustExtractsCost = [this, &TTIRef, CostKind, VL, VecTy, > - VectorizedVals](InstructionCost &Cost, > - bool IsGather) { > + VectorizedVals, E](InstructionCost &Cost) { > DenseMap<Value *, int> ExtractVectorsTys; > + SmallPtrSet<Value *, 4> CheckedExtracts; > for (auto *V : VL) { > if (isa<UndefValue>(V)) > continue; > @@ -4452,7 +4452,12 @@ InstructionCost BoUpSLP::getEntryCost(const TreeEntry > *E, > // instruction itself is not going to be vectorized, consider this > // instruction as dead and remove its cost from the final cost of the > // vectorized tree. > - if (!areAllUsersVectorized(cast<Instruction>(V), VectorizedVals)) > + // Also, avoid adjusting the cost for extractelements with multiple > uses > + // in different graph entries. > + const TreeEntry *VE = getTreeEntry(V); > + if (!CheckedExtracts.insert(V).second || > + !areAllUsersVectorized(cast<Instruction>(V), VectorizedVals) || > + (VE && VE != E)) > continue; > auto *EE = cast<ExtractElementInst>(V); > Optional<unsigned> EEIdx = getExtractIndex(EE); > @@ -4549,11 +4554,6 @@ InstructionCost BoUpSLP::getEntryCost(const TreeEntry > *E, > } > return GatherCost; > } > - if (isSplat(VL)) { > - // Found the broadcasting of the single scalar, calculate the cost as > the > - // broadcast. > - return TTI->getShuffleCost(TargetTransformInfo::SK_Broadcast, VecTy); > - } > if ((E->getOpcode() == Instruction::ExtractElement || > all_of(E->Scalars, > [](Value *V) { > @@ -4571,13 +4571,18 @@ InstructionCost BoUpSLP::getEntryCost(const TreeEntry > *E, > // single input vector or of 2 input vectors. > InstructionCost Cost = > computeExtractCost(VL, VecTy, *ShuffleKind, Mask, *TTI); > - AdjustExtractsCost(Cost, /*IsGather=*/true); > + AdjustExtractsCost(Cost); > if (NeedToShuffleReuses) > Cost += > TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc, > FinalVecTy, E->ReuseShuffleIndices); > return Cost; > } > } > + if (isSplat(VL)) { > + // Found the broadcasting of the single scalar, calculate the cost as > the > + // broadcast. > + return TTI->getShuffleCost(TargetTransformInfo::SK_Broadcast, VecTy); > + } > InstructionCost ReuseShuffleCost = 0; > if (NeedToShuffleReuses) > ReuseShuffleCost = TTI->getShuffleCost( > @@ -4755,7 +4760,7 @@ InstructionCost BoUpSLP::getEntryCost(const TreeEntry > *E, > TTI->getVectorInstrCost(Instruction::ExtractElement, VecTy, I); > } > } else { > - AdjustExtractsCost(CommonCost, /*IsGather=*/false); > + AdjustExtractsCost(CommonCost); > } > return CommonCost; > } > diff --git > a/llvm/test/Transforms/SLPVectorizer/X86/extractelement-multiple-uses.ll > b/llvm/test/Transforms/SLPVectorizer/X86/extractelement-multiple-uses.ll > index c47f255f0bfe..31696752bbb3 100644 > --- a/llvm/test/Transforms/SLPVectorizer/X86/extractelement-multiple-uses.ll > +++ b/llvm/test/Transforms/SLPVectorizer/X86/extractelement-multiple-uses.ll > @@ -2,24 +2,25 @@ > ; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-unknown-linux > -march=core-avx2 -pass-remarks-output=%t | FileCheck %s > ; RUN: FileCheck %s --input-file=%t --check-prefix=YAML > > -; YAML: --- !Missed > +; YAML: --- !Passed > ; YAML: Pass: slp-vectorizer > -; YAML: Name: NotBeneficial > +; YAML: Name: VectorizedList > ; YAML: Function: multi_uses > ; YAML: Args: > -; YAML: - String: 'List vectorization was possible but not > beneficial with cost ' > -; YAML: - Cost: '0' > -; YAML: - String: ' >= ' > -; YAML: - Treshold: '0' > +; YAML: - String: 'SLP vectorized with cost ' > +; YAML: - Cost: '-1' > +; YAML: - String: ' and with tree size ' > +; YAML: - TreeSize: '3' > > define float @multi_uses(<2 x float> %x, <2 x float> %y) { > ; CHECK-LABEL: @multi_uses( > -; CHECK-NEXT: [[X0:%.*]] = extractelement <2 x float> [[X:%.*]], i32 0 > -; CHECK-NEXT: [[X1:%.*]] = extractelement <2 x float> [[X]], i32 1 > ; CHECK-NEXT: [[Y1:%.*]] = extractelement <2 x float> [[Y:%.*]], i32 1 > -; CHECK-NEXT: [[X0X0:%.*]] = fmul float [[X0]], [[Y1]] > -; CHECK-NEXT: [[X1X1:%.*]] = fmul float [[X1]], [[Y1]] > -; CHECK-NEXT: [[ADD:%.*]] = fadd float [[X0X0]], [[X1X1]] > +; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x float> poison, float > [[Y1]], i32 0 > +; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> [[TMP1]], float > [[Y1]], i32 1 > +; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x float> [[X:%.*]], [[TMP2]] > +; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP3]], i32 0 > +; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP3]], i32 1 > +; CHECK-NEXT: [[ADD:%.*]] = fadd float [[TMP4]], [[TMP5]] > ; CHECK-NEXT: ret float [[ADD]] > ; > %x0 = extractelement <2 x float> %x, i32 0 > </cut> _______________________________________________ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org https://lists.linaro.org/mailman/listinfo/linaro-toolchain