[clang] [analyzer] Fix performance of getTaintedSymbolsImpl() (PR #89606)
https://github.com/NagyDonat created https://github.com/llvm/llvm-project/pull/89606 Previously the function ``` std::vector taint::getTaintedSymbolsImpl(ProgramStateRef State, const MemRegion *Reg, TaintTagType K, bool returnFirstOnly) ``` (one of the 8 overloaded variants under this name) was handling element regions in a highly inefficent manner: it performed the "also examine the super-region" step twice. (Once in the branch for element regions, and once in the more general branch for all `SubRegion`s -- note that `ElementRegion` is a subclass of `SubRegion`.) As pointer arithmetic produces `ElementRegion`s, it's not too difficult to get a chain of N nested element regions where this inefficient recursion would proudce 2^N calls. I suspect that this issue might be behind https://github.com/llvm/llvm-project/issues/89045 (note that `sheervideo.c` does very complex pointer arithmetic). This commit is essentially NFC, apart from the performance improvements and the removal of (probably irrelevant) duplicate entries from the return value of `getTaintedSymbols()` calls. >From 2581f4858c02a51b88e15f657fc99d035c066bca Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Don=C3=A1t=20Nagy?= Date: Mon, 22 Apr 2024 15:32:27 +0200 Subject: [PATCH] [analyzer] Fix performance of getTaintedSymbolsImpl() Previously the function ``` std::vector taint::getTaintedSymbolsImpl(ProgramStateRef State, const MemRegion *Reg, TaintTagType K, bool returnFirstOnly) ``` (one of the 8 overloaded variants under this name) was handling element regions in a highly inefficent manner: it performed the "also examine the super-region" step twice. (Once in the branch for element regions, and once in the more general branch for all `SubRegion`s -- note that `ElementRegion` is a subclass of `SubRegion`.) As pointer arithmetic produces `ElementRegion`s, it's not too difficult to get a chain of N nested element regions where this inefficient recursion would proudce 2^N calls. I suspect that this issue might be behind https://github.com/llvm/llvm-project/issues/89045 (note that `sheervideo.c` does very complex pointer arithmetic). This commit is essentially NFC, apart from the performance improvements and the removal of (probably irrelevant) duplicate entries from the return value of `getTaintedSymbols()` calls. --- clang/lib/StaticAnalyzer/Checkers/Taint.cpp | 14 ++ 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/clang/lib/StaticAnalyzer/Checkers/Taint.cpp b/clang/lib/StaticAnalyzer/Checkers/Taint.cpp index 4edb671753bf45..6362c82b009d72 100644 --- a/clang/lib/StaticAnalyzer/Checkers/Taint.cpp +++ b/clang/lib/StaticAnalyzer/Checkers/Taint.cpp @@ -216,21 +216,17 @@ std::vector taint::getTaintedSymbolsImpl(ProgramStateRef State, std::vector TaintedSymbols; if (!Reg) return TaintedSymbols; - // Element region (array element) is tainted if either the base or the offset - // are tainted. + + // Element region (array element) is tainted if the offset is tainted. if (const ElementRegion *ER = dyn_cast(Reg)) { std::vector TaintedIndex = getTaintedSymbolsImpl(State, ER->getIndex(), K, returnFirstOnly); llvm::append_range(TaintedSymbols, TaintedIndex); if (returnFirstOnly && !TaintedSymbols.empty()) return TaintedSymbols; // return early if needed -std::vector TaintedSuperRegion = -getTaintedSymbolsImpl(State, ER->getSuperRegion(), K, returnFirstOnly); -llvm::append_range(TaintedSymbols, TaintedSuperRegion); -if (returnFirstOnly && !TaintedSymbols.empty()) - return TaintedSymbols; // return early if needed } + // Symbolic region is tainted if the corresponding symbol is tainted. if (const SymbolicRegion *SR = dyn_cast(Reg)) { std::vector TaintedRegions = getTaintedSymbolsImpl(State, SR->getSymbol(), K, returnFirstOnly); @@ -239,6 +235,8 @@ std::vector taint::getTaintedSymbolsImpl(ProgramStateRef State, return TaintedSymbols; // return early if needed } + // Any subregion (including Element and Symbolic regions) is tainted if its + // super-region is tainted. if (const SubRegion *ER = dyn_cast(Reg)) { std::vector TaintedSubRegions = getTaintedSymbolsImpl(State, ER->getSuperRegion(), K, returnFirstOnly); @@ -318,4 +316,4 @@ std::vector taint::getTaintedSymbolsImpl(ProgramStateRef State, } } return TaintedSymbols; -} \ No newline at end of file +} ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer] Fix performance of getTaintedSymbolsImpl() (PR #89606)
llvmbot wrote: @llvm/pr-subscribers-clang Author: None (NagyDonat) Changes Previously the function ``` std::vectortaint::getTaintedSymbolsImpl(ProgramStateRef State, const MemRegion *Reg, TaintTagType K, bool returnFirstOnly) ``` (one of the 8 overloaded variants under this name) was handling element regions in a highly inefficent manner: it performed the "also examine the super-region" step twice. (Once in the branch for element regions, and once in the more general branch for all `SubRegion`s -- note that `ElementRegion` is a subclass of `SubRegion`.) As pointer arithmetic produces `ElementRegion`s, it's not too difficult to get a chain of N nested element regions where this inefficient recursion would proudce 2^N calls. I suspect that this issue might be behind https://github.com/llvm/llvm-project/issues/89045 (note that `sheervideo.c` does very complex pointer arithmetic). This commit is essentially NFC, apart from the performance improvements and the removal of (probably irrelevant) duplicate entries from the return value of `getTaintedSymbols()` calls. --- Full diff: https://github.com/llvm/llvm-project/pull/89606.diff 1 Files Affected: - (modified) clang/lib/StaticAnalyzer/Checkers/Taint.cpp (+6-8) ``diff diff --git a/clang/lib/StaticAnalyzer/Checkers/Taint.cpp b/clang/lib/StaticAnalyzer/Checkers/Taint.cpp index 4edb671753bf45..6362c82b009d72 100644 --- a/clang/lib/StaticAnalyzer/Checkers/Taint.cpp +++ b/clang/lib/StaticAnalyzer/Checkers/Taint.cpp @@ -216,21 +216,17 @@ std::vector taint::getTaintedSymbolsImpl(ProgramStateRef State, std::vector TaintedSymbols; if (!Reg) return TaintedSymbols; - // Element region (array element) is tainted if either the base or the offset - // are tainted. + + // Element region (array element) is tainted if the offset is tainted. if (const ElementRegion *ER = dyn_cast(Reg)) { std::vector TaintedIndex = getTaintedSymbolsImpl(State, ER->getIndex(), K, returnFirstOnly); llvm::append_range(TaintedSymbols, TaintedIndex); if (returnFirstOnly && !TaintedSymbols.empty()) return TaintedSymbols; // return early if needed -std::vector TaintedSuperRegion = -getTaintedSymbolsImpl(State, ER->getSuperRegion(), K, returnFirstOnly); -llvm::append_range(TaintedSymbols, TaintedSuperRegion); -if (returnFirstOnly && !TaintedSymbols.empty()) - return TaintedSymbols; // return early if needed } + // Symbolic region is tainted if the corresponding symbol is tainted. if (const SymbolicRegion *SR = dyn_cast(Reg)) { std::vector TaintedRegions = getTaintedSymbolsImpl(State, SR->getSymbol(), K, returnFirstOnly); @@ -239,6 +235,8 @@ std::vector taint::getTaintedSymbolsImpl(ProgramStateRef State, return TaintedSymbols; // return early if needed } + // Any subregion (including Element and Symbolic regions) is tainted if its + // super-region is tainted. if (const SubRegion *ER = dyn_cast(Reg)) { std::vector TaintedSubRegions = getTaintedSymbolsImpl(State, ER->getSuperRegion(), K, returnFirstOnly); @@ -318,4 +316,4 @@ std::vector taint::getTaintedSymbolsImpl(ProgramStateRef State, } } return TaintedSymbols; -} \ No newline at end of file +} `` https://github.com/llvm/llvm-project/pull/89606 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer] Fix performance of getTaintedSymbolsImpl() (PR #89606)
https://github.com/dkrupp approved this pull request. The suggested change make a lot of sense. Thanks. LGTM. https://github.com/llvm/llvm-project/pull/89606 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer] Fix performance of getTaintedSymbolsImpl() (PR #89606)
https://github.com/steakhal edited https://github.com/llvm/llvm-project/pull/89606 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer] Fix performance of getTaintedSymbolsImpl() (PR #89606)
https://github.com/steakhal approved this pull request. Thanks! The patch makes sense. Thank you for the promptly fix. I've checked and this resolves the `FFmpeg` issue. i wonder if this is serious enough to consider hangs as a crash and nominate this PR for backport to `clang-18`. Leave me no doubt, we will have it downstream for sure, but upstream users might wanna also have this. @Xazax-hun WDYT? https://github.com/llvm/llvm-project/pull/89606 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer] Fix performance of getTaintedSymbolsImpl() (PR #89606)
steakhal wrote: Ah, I think rushed ahead of myself. I applied the patch to clang-17, where it of course we didn't have any issues even with this broken `isTainted`. Now that I applied the patch to our clang-18 based branch, the file analyzes in 1:40, which is still far off from the baseline run 32 seconds. But 1:40 is arguably a lot better than 1.23 hours. I perfed again, and it shows this: ![image](https://github.com/llvm/llvm-project/assets/6280485/1706b0c7-bd5e-4de2-88f2-e49d801d0e12) This is much better, and showcases the next slowdown bug I'm hunting for the days since I reported this one :D The remaining 1 extra minute is lost during Z3 refutation. So, yea, this PR fixes the bug I reported in #89045, so we can close that once this is merged. And stay tuned for the next slowdown bug. I can already tell you that the bitwise shift checker introduces new constraints, after which we can now refute more bugs! - but also spends more time for finding a valid bug report in an EQClass , while previously we just picked the first one as we found those path constraints `sat`. But I'll come back to that once I have something concrete to share. https://github.com/llvm/llvm-project/pull/89606 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer] Fix performance of getTaintedSymbolsImpl() (PR #89606)
Xazax-hun wrote: > @Xazax-hun WDYT? If it really takes over an hour to analyze a file in clang-18, I'd agree that this has similar severity to a crash, and I support backporting the fix. https://github.com/llvm/llvm-project/pull/89606 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer] Fix performance of getTaintedSymbolsImpl() (PR #89606)
https://github.com/NagyDonat closed https://github.com/llvm/llvm-project/pull/89606 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits