[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)
DonatNagyE wrote: I'd like to abstain from deciding this question. Personally I don't like the idea that we add yet another hack that'll remain in the codebase forever and slows down all other development efforts in this area (as contributors who want to understand this logic will need to study some extra logic); but if a more experienced reviewer (e.g @Xazax-hun or @martong) accepts this PR, the I won't oppose it. https://github.com/llvm/llvm-project/pull/71284 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)
steakhal wrote: > I'm not opposed to landing this PR in llvm-18 (we have it, and it improves > the user experience), but I wouldn't like to have it in llvm-19 and later. To clarify my intentions, I wanted to land this as it resolves user complaints (as fixing an open issue), at a likely reasonable runtime cost. I don't plan to work on the constraint manager for llvm-19 or later. Given this, if we merge this, it will likely remain like that for a long time. If we don't merge this, we have the option for merging it for the next release cycle, but the questions and problems still remain, so why did we postpone this? Personally, I'd say if there are no serious drawbacks for merging this, we should merge it. We would still have plenty of time to revert if we wanted to. Should we merge it? https://github.com/llvm/llvm-project/pull/71284 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)
DonatNagyE wrote: Hmm, I would prefer a cleaner, more "theoretical" improvement of the equivalence class handling instead of this "add yet another patch that covers many, but not all cases" approach. I'm not opposed to landing this PR in llvm-18 (we have it, and it improves the user experience), but I wouldn't like to have it in llvm-19 and later. https://github.com/llvm/llvm-project/pull/71284 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)
steakhal wrote: Any objections for landing this PR? Or should we postpone this for llvm-19? https://github.com/llvm/llvm-project/pull/71284 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)
martong wrote: > it'd be good to "strengthen" the equality classes and ensure that we assign > the same RangeSet to each member of an EQClass (or, in other words, we assign > the RangeSet to an EQClass instead of an individual symbol). RangeSets (i.e. constraints) are assigned to EQClasses, checkout the State's `ConstraintRange`. So this is something that we have since EQClasses have been introduced. Now, you wonder why do we have then this miserable confusion? The problem starts at how we search for a constraint. We build a symbol and then we try to find it's EQClass, and via the class we will get the Range. The crux of the problem lies in building up that symbol in SValBuilder. ``` 1 if (x != y) return; // x == y 2 if (z <= x) return; // z > x 3 if (z >= y) return; // z < y ``` At 3, we build `z < y` which is a completely new symbol, never seen before, we won't find an EQClass for it. Had we built `z < x` , we still would not find the EQClass. This is the reason for checking for reversed comparisons (see `getRangeForComparisonSymbol`). So, an alternative solution could be to patch this search (i.e the RangeInferrer) to be able to find the EQClass. This would have it's own complexity (would you want to search for reversed comparisoins as well for each EQClass member, etc). Also, there is the subtle problem of the eager simplification in SValBuilder: once a symbol is simplified, there is no easy way to build the original non-simplified symbol which has the constraint associated to it. E.g. after having an `assert(z == 1)` in the running example, you cannot query the range anymore for `z < y` , but only for `1 < y`. https://github.com/llvm/llvm-project/pull/71284 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)
Xazax-hun wrote: > So to be explicit, no other component would know what canonicalization > happens within the range-based solver. Oh, I see! Thanks for the clarification, this sounds good to me. https://github.com/llvm/llvm-project/pull/71284 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)
steakhal wrote: > > So, if I understand you correctly, at the 3rd if statement, we should > > canonicalize the symbol we are constraining by walking every sub-symbol and > > substituting it with its equivalent counterpart (if any), by basically with > > its eqclass' representative symbol. > > I am not 100% sure if I'd go that far (doing canonicalization). Symbols carry > a lot of information, like provenance. Bug reporters, or maybe even some > custom checker state might rely on this. I am afraid that replacing/rewriting > like that might have unexpected consequences, nothing we cannot solve, but I > am not sure whether we want to solve them. What I'd suggest is more like > always adding all the constraints to the representative, and lazily > propagating those constraints to the other members of the equivalence > classes, only when we mention them in a constraint. > > This might also be challenging for bug reporters to explain in some > scenarios, but at least we preserve the provenance information. > > @steakhal It wasn't actually what I had in mind. I wanted to assign constraints to canonicalized symexprs instead of to the symexpr the API gets from the assume call. And for lookups, we would again do the same thing, canonicalize and only then do the lookup - or let the lookup (rather infer do the canonicalization on the fly). So to be explicit, no other component would know what canonicalization happens within the range-based solver. https://github.com/llvm/llvm-project/pull/71284 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)
Xazax-hun wrote: > So, if I understand you correctly, at the 3rd if statement, we should > canonicalize the symbol we are constraining by walking every sub-symbol and > substituting it with its equivalent counterpart (if any), by basically with > its eqclass' representative symbol. I am not 100% sure if I'd go that far (doing canonicalization). Symbols carry a lot of information, like provenance. Bug reporters, or maybe even some custom checker state might rely on this. I am afraid that replacing/rewriting like that might have unexpected consequences, nothing we cannot solve, but I am not sure whether we want to solve them. What I'd suggest is more like always adding all the constraints to the representative, and lazily propagating those constraints to the other members of the equivalence classes, only when we mention them in a constraint. This might also be challenging for bug reporters to explain in some scenarios, but at least we preserve the provenance information. @steakhal https://github.com/llvm/llvm-project/pull/71284 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)
DonatNagyE wrote: > I think we haven't discussed yet the approach of applying the constraint to > every eqclass member. That would feel like defeating the purpose of eqclasses > at all. I only mentioned it because from a high-level perspective it's equivalent to applying the constraint on the eqclass representative (and acts as a "simplified description" of that approach); I don't think that we should use it as an actual implementation. > To me, our best option so far is to keep the representative symbols alive and > canonicalize (replace) all sub-symbols in constraints with representatives > whenever we merge eqclasses or add constraints. Yes, that seems to be the best way forward. It would probably imply that we introduce an "undead" (=no longer reachable, but kept as a representative) state for symbols which seems to be inelegant, but I think that it won't cause actual problems during the implementation. https://github.com/llvm/llvm-project/pull/71284 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)
https://github.com/steakhal updated https://github.com/llvm/llvm-project/pull/71284 >From 92ece501b340c3a2a52b5a4614ddb70bb3e35c93 Mon Sep 17 00:00:00 2001 From: Balazs Benics Date: Sat, 4 Nov 2023 13:44:28 +0100 Subject: [PATCH 1/3] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions The idea is that if we see a `X RELOP Y` being constrained to a RangeSet `S`, then check the eqclasses of X and Y respectively and for `X' RELOP Y'` SymSymExprs and try to infer their ranges. If there is no contradiction with any of the equivalent alternatives, then intersecting all these RangeSets should never be empty - aka. there should be a value satisfying the constraints we have. It costs around `|eqclass(X)| + |eqclass(y)|`. The approach has its limitations, as demonstrated by `gh_62215_contradicting_nested_right_equivalent`, where we would need to apply the same logic, but on a sub-expression of a direct operand. Before the patch, line 90, 100, and 112 would be reachable; and become unreachable after this. Line 127 will remain still reachable, but keep in mind that when cross-checking with Z3 (aka. Z3 refutation), then all 4 reports would be eliminated. The idea comes from https://github.com/llvm/llvm-project/issues/62215#issuecomment-1792878474 Fixes #62215 --- .../Core/RangeConstraintManager.cpp | 53 +++ clang/test/Analysis/constraint-assignor.c | 48 + 2 files changed, 101 insertions(+) diff --git a/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp b/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp index 5de99384449a4c8..d631369e895d3a9 100644 --- a/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp +++ b/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp @@ -2067,6 +2067,12 @@ class ConstraintAssignor : public ConstraintAssignorBase { return Assignor.assign(CoS, NewConstraint); } + /// Check if using an equivalent operand alternative would lead to + /// contradiction. + /// If a contradiction is witnessed, returns false; otherwise returns true. + bool handleEquivalentAlternativeSymOperands(const SymSymExpr *SymSym, + RangeSet Constraint); + /// Handle expressions like: a % b != 0. template bool handleRemainderOp(const SymT *Sym, RangeSet Constraint) { @@ -2218,11 +2224,58 @@ bool ConstraintAssignor::assignSymExprToConst(const SymExpr *Sym, return true; } +bool ConstraintAssignor::handleEquivalentAlternativeSymOperands( +const SymSymExpr *SymSym, RangeSet Constraint) { + SymbolRef LHS = SymSym->getLHS(); + SymbolRef RHS = SymSym->getRHS(); + EquivalenceClass LHSClass = EquivalenceClass::find(State, LHS); + EquivalenceClass RHSClass = EquivalenceClass::find(State, RHS); + SymbolSet SymbolsEqWithLHS = LHSClass.getClassMembers(State); + SymbolSet SymbolsEqWithRHS = RHSClass.getClassMembers(State); + llvm::SmallVector AlternativeSymSyms; + + // Gather left alternatives. + for (SymbolRef AlternativeLHS : SymbolsEqWithLHS) { +if (AlternativeLHS == LHS) + continue; +AlternativeSymSyms.emplace_back(AlternativeLHS, SymSym->getOpcode(), RHS, +SymSym->getType()); + } + + // Gather right alternatives. + for (SymbolRef AlternativeRHS : SymbolsEqWithRHS) { +if (AlternativeRHS == RHS) + continue; +AlternativeSymSyms.emplace_back(LHS, SymSym->getOpcode(), AlternativeRHS, +SymSym->getType()); + } + + // Crosscheck the inferred ranges. + for (SymSymExpr AltSymSym : AlternativeSymSyms) { +RangeSet AltSymSymConstrant = +SymbolicRangeInferrer::inferRange(RangeFactory, State, ); +Constraint = intersect(RangeFactory, Constraint, AltSymSymConstrant); + +// Check if we witnessed a contradiction with the equivalent alternative +// operand. +if (Constraint.isEmpty()) { + State = nullptr; + return false; +} + } + return true; +} + bool ConstraintAssignor::assignSymSymExprToRangeSet(const SymSymExpr *Sym, RangeSet Constraint) { if (!handleRemainderOp(Sym, Constraint)) return false; + if (const auto *SymSym = dyn_cast(Sym); + SymSym && !handleEquivalentAlternativeSymOperands(SymSym, Constraint)) { +return false; + } + std::optional ConstraintAsBool = interpreteAsBool(Constraint); if (!ConstraintAsBool) diff --git a/clang/test/Analysis/constraint-assignor.c b/clang/test/Analysis/constraint-assignor.c index 8210e576c98bd21..d5078b406e12601 100644 --- a/clang/test/Analysis/constraint-assignor.c +++ b/clang/test/Analysis/constraint-assignor.c @@ -82,3 +82,51 @@ void remainder_with_adjustment_of_composit_lhs(int x, int y) { clang_analyzer_eval(x + y != -1);// expected-warning{{TRUE}} (void)(x * y); // keep the constraints alive. } + +void gh_62215(int x, int y, int z) { + if (x != y) return; // x == y + if
[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)
steakhal wrote: > @DonatNagyE The most straightforward issue that I see is that (if I > understand the code correctly) the intersected constraint (the value of the > variable `Constraint` at the end of > `handleEquivalentAlternativeSymOperands()`) is just discarded after checking > that it's not empty. It's plausible that this discarded information will be > re-calculated by this process in the follow-up range calculations and don't > have a concrete testcase where it's relevant that we "leave food on the > table" (I can try to find one if you'd find it helpful) but I think that it > may be useful to think about "preserving" this `Constraint` that we > calculated. (This is not a blocking issue: I can support merging the current > logic if you don't see a reasonably simple improvement in this area.) I tried adding this hunk at the end of the function, but the assertion never fired: ```c++ // If we learned something, save it. if (Constraint != OriginalConstraint) { assert(false); State = assign(EquivalenceClass::find(State, SymSym), Constraint); return static_cast(State); } ``` > There is also the choice that you limited the runtime to `|eqclass(X)| + > |eqclass(Y)|` by only checking the alternative constraints where one of the > two representatives is the original one. I think that here it's a good > decision to ensure a reasonable runtime by potentially discarding some > information, but I feel that it'd be good to document this decision with a > testcase. For example, I think > > ``` > void gh_62215_left_and_right(int x, int y, int z, int w) { > if (x != y) return; // x == y > if (z != w) return; // z == w > if (z <= x) return; // z > x > if (w >= y) return; // w < y > } > ``` Fair point. Actually, originally I wanted a full cross-product of the alternatives, which I refined to a half-cross-product by following Gábor's comments. We shall come back here and do something more refined if there is a practical need. I added a FIXME comment explaining our options, along with the test case you proposed. Thanks. > Finally there is the "why do we need this trickery at all?" question that was > also raised by @Xazax-hun. I agree with him that on a longer term, it'd be > good to "strengthen" the equality classes and ensure that we assign the same > RangeSet to each member of an EQClass (or, in other words, we assign the > RangeSet to an EQClass instead of an individual symbol). This would > potentially improve the runtime (although I'm not sure that performance is > especially relevant here) and handle the left-and-right testcase adequately. I think we haven't discussed yet the approach of applying the constraint to every eqclass member. That would feel like defeating the purpose of eqclasses at all. To me, our best option so far is to keep the representative symbols alive and canonicalize (replace) all sub-symbols in constraints with representatives whenever we merge eqclasses or add constraints. Thanks for the reviews folks! https://github.com/llvm/llvm-project/pull/71284 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)
@@ -2218,11 +2224,58 @@ bool ConstraintAssignor::assignSymExprToConst(const SymExpr *Sym, return true; } +bool ConstraintAssignor::handleEquivalentAlternativeSymOperands( +const SymSymExpr *SymSym, RangeSet Constraint) { + SymbolRef LHS = SymSym->getLHS(); + SymbolRef RHS = SymSym->getRHS(); + EquivalenceClass LHSClass = EquivalenceClass::find(State, LHS); + EquivalenceClass RHSClass = EquivalenceClass::find(State, RHS); + SymbolSet SymbolsEqWithLHS = LHSClass.getClassMembers(State); + SymbolSet SymbolsEqWithRHS = RHSClass.getClassMembers(State); + llvm::SmallVector AlternativeSymSyms; + + // Gather left alternatives. + for (SymbolRef AlternativeLHS : SymbolsEqWithLHS) { +if (AlternativeLHS == LHS) + continue; +AlternativeSymSyms.emplace_back(AlternativeLHS, SymSym->getOpcode(), RHS, +SymSym->getType()); + } + + // Gather right alternatives. + for (SymbolRef AlternativeRHS : SymbolsEqWithRHS) { +if (AlternativeRHS == RHS) + continue; +AlternativeSymSyms.emplace_back(LHS, SymSym->getOpcode(), AlternativeRHS, +SymSym->getType()); + } + + // Crosscheck the inferred ranges. + for (SymSymExpr AltSymSym : AlternativeSymSyms) { +RangeSet AltSymSymConstrant = DonatNagyE wrote: Nitpick: "constraint" is misspelled in this variable name. https://github.com/llvm/llvm-project/pull/71284 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)
https://github.com/DonatNagyE edited https://github.com/llvm/llvm-project/pull/71284 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)
https://github.com/DonatNagyE commented: This is a good and important improvement of the analysis results, so I support merging it (with some very minor changes), but I feel that it's a "practical, but incomplete" band-aid instead of a systemic improvement that fits into the architecture. The most straightforward issue that I see is that (if I understand the code correctly) the intersected constraint (the value of the variable `Constraint` at the end of `handleEquivalentAlternativeSymOperands()`) is just discarded after checking that it's not empty. It's plausible that this discarded information will be re-calculated by this process in the follow-up range calculations and don't have a concrete testcase where it's relevant that we "leave food on the table" (I can try to find one if you'd find it helpful) but I think that it may be useful to think about "preserving" this `Constraint` that we calculated. (This is not a blocking issue: I can support merging the current logic if you don't see a reasonably simple improvement in this area.) There is also the choice that you limited the runtime to `|eqclass(X)| + |eqclass(Y)|` by only checking the alternative constraints where one of the two representatives is the original one. I think that here it's a good decision to ensure a reasonable runtime by potentially discarding some information, but I feel that it'd be good to document this decision with a testcase. For example, I think ``` void gh_62215_left_and_right(int x, int y, int z, int w) { if (x != y) return; // x == y if (z != w) return; // z == w if (z <= x) return; // z > x if (w >= y) return; // w < y } ``` would lead to a state that's unreachable, but this fact is not recognized by the current improvement. Finally there is the "why do we need this trickery at all?" question that was also raised by @Xazax-hun. I agree with him that on a longer term, it'd be good to "strengthen" the equality classes and ensure that we assign the same RangeSet to each member of an EQClass (or, in other words, we assign the RangeSet to an EQClass instead of an individual symbol). This would potentially improve the runtime (although I'm not sure that performance is especially relevant here) and handle the left-and-right testcase adequately. However, I don't know how difficult would it be to maintain the invariant that the same RangeSet is assigned to each symbol in an EqClass (we'd need to intersect and update range sets whenever an equality is deduced, handle the situations where this reveals that equality is infeasible and handle any problems arising from the case when the representative becomes dead) -- so I can support merging this working code now instead of waiting indefinitely for that more through solution. (Perhaps consider adding a FIXME at the beginning of `handleEquivalentAlternativeSymOperands()` which mentions this longer-term alternative.) https://github.com/llvm/llvm-project/pull/71284 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)
steakhal wrote: > I think every time we need to iterate over all member of an equivalence > class, we might do something wrong. The point of the equivalence class would > be to make sure those elements are equivalent. One way to avoid iteration > would be to always use the representative of the equivalence class. E.g., > each time we record a new constraint to a member of the class, we add this > information to the representative element. Every time we do a query, we first > look up the representative element which already should have all the info > from the class and use it instead of the original symbol. > > Would something like this work or am I missing something? I had to think about it for a little while, and here are my thoughts: In an example like here: ```c++ void gh_62215(int x, int y, int z) { if (x != y) return; // x == y if (z <= x) return; // z > x if (z >= y) return; // z < y clang_analyzer_warnIfReached(); // no-warning: This should be dead code. (void)(x + y + z); // keep the constraints alive. } ``` Here, after the first if, we would have an eqclass of `{x,y}`; with a single constraint `x != y: [0,0]`. After the second if, we would have notionally 3 eqclasses: `{x,y}`, and two trivial ones `x` and `y`. We would also have 3 constraints: `x != y: [0,0]`, `z <= x: [0,0]`, `z <= y: [0,0]`. Note that the keys in the constraints map (symbols) can be 'arbitrarily' complex and refer to already dead symbols. In this example, I keep these symbols artificially alive to make the case slightly simpler to discuss. So, if I understand you correctly, at the 3rd if statement, we should canonicalize the symbol we are constraining by walking every sub-symbol and substituting it with its equivalent counterpart (if any), by basically with its eqclass' representative symbol. In this example, it would mean that instead of adding a constraint `z >= y: [0,0]`, we would instead `z >= x: [0,0]` (assuming that `x` is the representative symbol of eqclass `{x,y}`. I believe this canonicalization could work (and would solve the other demonstrative limitation I added in this PR), but would also incur overhead. And from a design point of view, an eqclass would need to keep the representative symbol alive, because otherwise, we can't substitute a symbol with the representative symbol. (Note that for this reason, we don't actually have a representative symbol, but rather a unique integral value as an ID for the eqclass - which happens to be equal to the `SymbolRef` pointer value of the representative symbol if that's still alive. Here is an example where the representative symbol `x` of the eqclass `{x,y}` would die, but remain the ID of the eqclass: ```c++ // same example, but with: (void)(y + z); // X is not mentioned!, thus reclaimed after the 2nd if statement. ``` https://github.com/llvm/llvm-project/pull/71284 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)
Xazax-hun wrote: I think every time we need to iterate over all member of an equivalence class, we might do something wrong. The point of the equivalence class would be to make sure those elements are equivalent. One way to avoid iteration would be to always use the representative of the equivalence class. E.g., each time we record a new constraint to a member of the class, we add this information to the representative element. Every time we do a query, we first look up the representative element which already should have all the info from the class and use it instead of the original symbol. Would something like this work or am I missing something? https://github.com/llvm/llvm-project/pull/71284 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)
steakhal wrote: For crossreference: I raised some related questions around having void casts artificially keeping constraints and symbols alive at discuss: https://discourse.llvm.org/t/range-based-solver-and-eager-symbol-garbage-collection/74670 https://github.com/llvm/llvm-project/pull/71284 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)
llvmbot wrote: @llvm/pr-subscribers-clang-static-analyzer-1 Author: Balazs Benics (steakhal) Changes The idea is that if we see a `X RELOP Y` being constrained to a RangeSet `S`, then check the eqclasses of X and Y respectively and for `X' RELOP Y'` SymSymExprs and try to infer their ranges. If there is no contradiction with any of the equivalent alternatives, then intersecting all these RangeSets should never be empty - aka. there should be a value satisfying the constraints we have. It costs around `|eqclass(X)| + |eqclass(y)|`. The approach has its limitations, as demonstrated by `gh_62215_contradicting_nested_right_equivalent`, where we would need to apply the same logic, but on a sub-expression of a direct operand. Before the patch, line 90, 100, and 112 would be reachable; and become unreachable after this. Line 127 will remain still reachable, but keep in mind that when cross-checking with Z3 (aka. Z3 refutation), then all 4 reports would be eliminated. The idea comes from https://github.com/llvm/llvm-project/issues/62215#issuecomment-1792878474 Fixes #62215 --- Full diff: https://github.com/llvm/llvm-project/pull/71284.diff 2 Files Affected: - (modified) clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp (+53) - (modified) clang/test/Analysis/constraint-assignor.c (+48) ``diff diff --git a/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp b/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp index 5de99384449a4c8..d631369e895d3a9 100644 --- a/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp +++ b/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp @@ -2067,6 +2067,12 @@ class ConstraintAssignor : public ConstraintAssignorBase { return Assignor.assign(CoS, NewConstraint); } + /// Check if using an equivalent operand alternative would lead to + /// contradiction. + /// If a contradiction is witnessed, returns false; otherwise returns true. + bool handleEquivalentAlternativeSymOperands(const SymSymExpr *SymSym, + RangeSet Constraint); + /// Handle expressions like: a % b != 0. template bool handleRemainderOp(const SymT *Sym, RangeSet Constraint) { @@ -2218,11 +2224,58 @@ bool ConstraintAssignor::assignSymExprToConst(const SymExpr *Sym, return true; } +bool ConstraintAssignor::handleEquivalentAlternativeSymOperands( +const SymSymExpr *SymSym, RangeSet Constraint) { + SymbolRef LHS = SymSym->getLHS(); + SymbolRef RHS = SymSym->getRHS(); + EquivalenceClass LHSClass = EquivalenceClass::find(State, LHS); + EquivalenceClass RHSClass = EquivalenceClass::find(State, RHS); + SymbolSet SymbolsEqWithLHS = LHSClass.getClassMembers(State); + SymbolSet SymbolsEqWithRHS = RHSClass.getClassMembers(State); + llvm::SmallVector AlternativeSymSyms; + + // Gather left alternatives. + for (SymbolRef AlternativeLHS : SymbolsEqWithLHS) { +if (AlternativeLHS == LHS) + continue; +AlternativeSymSyms.emplace_back(AlternativeLHS, SymSym->getOpcode(), RHS, +SymSym->getType()); + } + + // Gather right alternatives. + for (SymbolRef AlternativeRHS : SymbolsEqWithRHS) { +if (AlternativeRHS == RHS) + continue; +AlternativeSymSyms.emplace_back(LHS, SymSym->getOpcode(), AlternativeRHS, +SymSym->getType()); + } + + // Crosscheck the inferred ranges. + for (SymSymExpr AltSymSym : AlternativeSymSyms) { +RangeSet AltSymSymConstrant = +SymbolicRangeInferrer::inferRange(RangeFactory, State, ); +Constraint = intersect(RangeFactory, Constraint, AltSymSymConstrant); + +// Check if we witnessed a contradiction with the equivalent alternative +// operand. +if (Constraint.isEmpty()) { + State = nullptr; + return false; +} + } + return true; +} + bool ConstraintAssignor::assignSymSymExprToRangeSet(const SymSymExpr *Sym, RangeSet Constraint) { if (!handleRemainderOp(Sym, Constraint)) return false; + if (const auto *SymSym = dyn_cast(Sym); + SymSym && !handleEquivalentAlternativeSymOperands(SymSym, Constraint)) { +return false; + } + std::optional ConstraintAsBool = interpreteAsBool(Constraint); if (!ConstraintAsBool) diff --git a/clang/test/Analysis/constraint-assignor.c b/clang/test/Analysis/constraint-assignor.c index 8210e576c98bd21..d5078b406e12601 100644 --- a/clang/test/Analysis/constraint-assignor.c +++ b/clang/test/Analysis/constraint-assignor.c @@ -82,3 +82,51 @@ void remainder_with_adjustment_of_composit_lhs(int x, int y) { clang_analyzer_eval(x + y != -1);// expected-warning{{TRUE}} (void)(x * y); // keep the constraints alive. } + +void gh_62215(int x, int y, int z) { + if (x != y) return; // x == y + if (z <= x) return; // z > x + if (z >= y) return; // z < y + clang_analyzer_warnIfReached(); // no-warning: This should be dead code. +
[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)
https://github.com/steakhal created https://github.com/llvm/llvm-project/pull/71284 The idea is that if we see a `X RELOP Y` being constrained to a RangeSet `S`, then check the eqclasses of X and Y respectively and for `X' RELOP Y'` SymSymExprs and try to infer their ranges. If there is no contradiction with any of the equivalent alternatives, then intersecting all these RangeSets should never be empty - aka. there should be a value satisfying the constraints we have. It costs around `|eqclass(X)| + |eqclass(y)|`. The approach has its limitations, as demonstrated by `gh_62215_contradicting_nested_right_equivalent`, where we would need to apply the same logic, but on a sub-expression of a direct operand. Before the patch, line 90, 100, and 112 would be reachable; and become unreachable after this. Line 127 will remain still reachable, but keep in mind that when cross-checking with Z3 (aka. Z3 refutation), then all 4 reports would be eliminated. The idea comes from https://github.com/llvm/llvm-project/issues/62215#issuecomment-1792878474 Fixes #62215 >From 92ece501b340c3a2a52b5a4614ddb70bb3e35c93 Mon Sep 17 00:00:00 2001 From: Balazs Benics Date: Sat, 4 Nov 2023 13:44:28 +0100 Subject: [PATCH] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions The idea is that if we see a `X RELOP Y` being constrained to a RangeSet `S`, then check the eqclasses of X and Y respectively and for `X' RELOP Y'` SymSymExprs and try to infer their ranges. If there is no contradiction with any of the equivalent alternatives, then intersecting all these RangeSets should never be empty - aka. there should be a value satisfying the constraints we have. It costs around `|eqclass(X)| + |eqclass(y)|`. The approach has its limitations, as demonstrated by `gh_62215_contradicting_nested_right_equivalent`, where we would need to apply the same logic, but on a sub-expression of a direct operand. Before the patch, line 90, 100, and 112 would be reachable; and become unreachable after this. Line 127 will remain still reachable, but keep in mind that when cross-checking with Z3 (aka. Z3 refutation), then all 4 reports would be eliminated. The idea comes from https://github.com/llvm/llvm-project/issues/62215#issuecomment-1792878474 Fixes #62215 --- .../Core/RangeConstraintManager.cpp | 53 +++ clang/test/Analysis/constraint-assignor.c | 48 + 2 files changed, 101 insertions(+) diff --git a/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp b/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp index 5de99384449a4c8..d631369e895d3a9 100644 --- a/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp +++ b/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp @@ -2067,6 +2067,12 @@ class ConstraintAssignor : public ConstraintAssignorBase { return Assignor.assign(CoS, NewConstraint); } + /// Check if using an equivalent operand alternative would lead to + /// contradiction. + /// If a contradiction is witnessed, returns false; otherwise returns true. + bool handleEquivalentAlternativeSymOperands(const SymSymExpr *SymSym, + RangeSet Constraint); + /// Handle expressions like: a % b != 0. template bool handleRemainderOp(const SymT *Sym, RangeSet Constraint) { @@ -2218,11 +2224,58 @@ bool ConstraintAssignor::assignSymExprToConst(const SymExpr *Sym, return true; } +bool ConstraintAssignor::handleEquivalentAlternativeSymOperands( +const SymSymExpr *SymSym, RangeSet Constraint) { + SymbolRef LHS = SymSym->getLHS(); + SymbolRef RHS = SymSym->getRHS(); + EquivalenceClass LHSClass = EquivalenceClass::find(State, LHS); + EquivalenceClass RHSClass = EquivalenceClass::find(State, RHS); + SymbolSet SymbolsEqWithLHS = LHSClass.getClassMembers(State); + SymbolSet SymbolsEqWithRHS = RHSClass.getClassMembers(State); + llvm::SmallVector AlternativeSymSyms; + + // Gather left alternatives. + for (SymbolRef AlternativeLHS : SymbolsEqWithLHS) { +if (AlternativeLHS == LHS) + continue; +AlternativeSymSyms.emplace_back(AlternativeLHS, SymSym->getOpcode(), RHS, +SymSym->getType()); + } + + // Gather right alternatives. + for (SymbolRef AlternativeRHS : SymbolsEqWithRHS) { +if (AlternativeRHS == RHS) + continue; +AlternativeSymSyms.emplace_back(LHS, SymSym->getOpcode(), AlternativeRHS, +SymSym->getType()); + } + + // Crosscheck the inferred ranges. + for (SymSymExpr AltSymSym : AlternativeSymSyms) { +RangeSet AltSymSymConstrant = +SymbolicRangeInferrer::inferRange(RangeFactory, State, ); +Constraint = intersect(RangeFactory, Constraint, AltSymSymConstrant); + +// Check if we witnessed a contradiction with the equivalent alternative +// operand. +if (Constraint.isEmpty()) { + State = nullptr; + return false; +} + } + return