[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)

2023-12-28 Thread via cfe-commits

DonatNagyE wrote:

I'd like to abstain from deciding this question.

Personally I don't like the idea that we add yet another hack that'll remain in 
the codebase forever and slows down all other development efforts in this area 
(as contributors who want to understand this logic will need to study some 
extra logic); but if a more experienced reviewer (e.g @Xazax-hun or @martong) 
accepts this PR, the I won't oppose it. 

https://github.com/llvm/llvm-project/pull/71284
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)

2023-12-28 Thread Balazs Benics via cfe-commits

steakhal wrote:

> I'm not opposed to landing this PR in llvm-18 (we have it, and it improves 
> the user experience), but I wouldn't like to have it in llvm-19 and later.

To clarify my intentions, I wanted to land this as it resolves user complaints 
(as fixing an open issue), at a likely reasonable runtime cost. I don't plan to 
work on the constraint manager for llvm-19 or later.
Given this, if we merge this, it will likely remain like that for a long time. 
If we don't merge this, we have the option for merging it for the next release 
cycle, but the questions and problems still remain, so why did we postpone this?

Personally, I'd say if there are no serious drawbacks for merging this, we 
should merge it.
We would still have plenty of time to revert if we wanted to.

Should we merge it?

https://github.com/llvm/llvm-project/pull/71284
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)

2023-12-28 Thread via cfe-commits

DonatNagyE wrote:

Hmm, I would prefer a cleaner, more "theoretical" improvement of the 
equivalence class handling instead of this "add yet another patch that covers 
many, but not all cases" approach.

I'm not opposed to landing this PR in llvm-18 (we have it, and it improves the 
user experience), but I wouldn't like to have it in llvm-19 and later.

https://github.com/llvm/llvm-project/pull/71284
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)

2023-12-28 Thread Balazs Benics via cfe-commits

steakhal wrote:

Any objections for landing this PR? Or should we postpone this for llvm-19?

https://github.com/llvm/llvm-project/pull/71284
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)

2023-11-07 Thread Gabor Marton via cfe-commits

martong wrote:

> it'd be good to "strengthen" the equality classes and ensure that we assign 
> the same RangeSet to each member of an EQClass (or, in other words, we assign 
> the RangeSet to an EQClass instead of an individual symbol).

RangeSets (i.e. constraints) are assigned to EQClasses, checkout the State's 
`ConstraintRange`. So this is something that we have since EQClasses have been 
introduced. Now, you wonder why do we have then this miserable confusion? The 
problem starts at how we search for a constraint. We build a symbol and then we 
try to find it's EQClass, and via the class we will get the Range. The crux of 
the problem lies in building up that symbol in SValBuilder.
```
1  if (x != y) return; // x == y
2  if (z <= x) return; // z > x
3  if (z >= y) return; // z < y
```
At 3, we build `z < y` which is a completely new symbol, never seen before, we 
won't find an EQClass for it. Had we built `z < x` , we still would not find 
the EQClass. This is the reason for checking for reversed comparisons (see 
`getRangeForComparisonSymbol`). So, an alternative solution could be to patch 
this search (i.e the RangeInferrer) to be able to find the EQClass. This would 
have it's own complexity (would you want to search for reversed comparisoins as 
well for each EQClass member, etc). Also, there is the subtle problem of the 
eager simplification in SValBuilder: once a symbol is simplified, there is no 
easy way to build the original non-simplified symbol which has the constraint 
associated to it. E.g. after having an `assert(z == 1)` in the running example, 
you cannot query the range anymore for `z < y` , but only for `1 < y`.

https://github.com/llvm/llvm-project/pull/71284
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)

2023-11-06 Thread Gábor Horváth via cfe-commits

Xazax-hun wrote:

> So to be explicit, no other component would know what canonicalization 
> happens within the range-based solver.

Oh, I see! Thanks for the clarification, this sounds good to me.

https://github.com/llvm/llvm-project/pull/71284
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)

2023-11-06 Thread Balazs Benics via cfe-commits

steakhal wrote:

> > So, if I understand you correctly, at the 3rd if statement, we should 
> > canonicalize the symbol we are constraining by walking every sub-symbol and 
> > substituting it with its equivalent counterpart (if any), by basically with 
> > its eqclass' representative symbol.
> 
> I am not 100% sure if I'd go that far (doing canonicalization). Symbols carry 
> a lot of information, like provenance. Bug reporters, or maybe even some 
> custom checker state might rely on this. I am afraid that replacing/rewriting 
> like that might have unexpected consequences, nothing we cannot solve, but I 
> am not sure whether we want to solve them. What I'd suggest is more like 
> always adding all the constraints to the representative, and lazily 
> propagating those constraints to the other members of the equivalence 
> classes, only when we mention them in a constraint.
> 
> This might also be challenging for bug reporters to explain in some 
> scenarios, but at least we preserve the provenance information.
> 
> @steakhal

It wasn't actually what I had in mind. I wanted to assign constraints to 
canonicalized symexprs instead of to the symexpr the API gets from the assume 
call. And for lookups, we would again do the same thing, canonicalize and only 
then do the lookup - or let the lookup (rather infer do the canonicalization on 
the fly). So to be explicit, no other component would know what 
canonicalization happens within the range-based solver.

https://github.com/llvm/llvm-project/pull/71284
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)

2023-11-06 Thread Gábor Horváth via cfe-commits

Xazax-hun wrote:

> So, if I understand you correctly, at the 3rd if statement, we should 
> canonicalize the symbol we are constraining by walking every sub-symbol and 
> substituting it with its equivalent counterpart (if any), by basically with 
> its eqclass' representative symbol.

I am not 100% sure if I'd go that far (doing canonicalization). Symbols carry a 
lot of information, like provenance. Bug reporters, or maybe even some custom 
checker state might rely on this. I am afraid that replacing/rewriting like 
that might have unexpected consequences, nothing we cannot solve, but I am not 
sure whether we want to solve them. What I'd suggest is more like always adding 
all the constraints to the representative, and lazily propagating those 
constraints to the other members of the equivalence classes, only when we 
mention them in a constraint.

This might also be challenging for bug reporters to explain in some scenarios, 
but at least we preserve the provenance information. 

@steakhal

https://github.com/llvm/llvm-project/pull/71284
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)

2023-11-06 Thread via cfe-commits

DonatNagyE wrote:

> I think we haven't discussed yet the approach of applying the constraint to 
> every eqclass member. That would feel like defeating the purpose of eqclasses 
> at all.

I only mentioned it because from a high-level perspective it's equivalent to 
applying the constraint on the eqclass representative (and acts as a 
"simplified description" of that approach); I don't think that we should use it 
as an actual implementation.

> To me, our best option so far is to keep the representative symbols alive and 
> canonicalize (replace) all sub-symbols in constraints with representatives 
> whenever we merge eqclasses or add constraints.

Yes, that seems to be the best way forward. It would probably imply that we 
introduce an "undead" (=no longer reachable, but kept as a representative) 
state for symbols which seems to be inelegant, but I think that it won't cause 
actual problems during the implementation.

https://github.com/llvm/llvm-project/pull/71284
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)

2023-11-06 Thread Balazs Benics via cfe-commits

https://github.com/steakhal updated 
https://github.com/llvm/llvm-project/pull/71284

>From 92ece501b340c3a2a52b5a4614ddb70bb3e35c93 Mon Sep 17 00:00:00 2001
From: Balazs Benics 
Date: Sat, 4 Nov 2023 13:44:28 +0100
Subject: [PATCH 1/3] [analyzer][solver] On SymSym RelOps, check EQClass
 members for contradictions

The idea is that if we see a `X RELOP Y` being constrained to a RangeSet `S`,
then check the eqclasses of X and Y respectively and for `X' RELOP Y'`
SymSymExprs and try to infer their ranges.
If there is no contradiction with any of the equivalent alternatives,
then intersecting all these RangeSets should never be empty - aka. there
should be a value satisfying the constraints we have.

It costs around `|eqclass(X)| + |eqclass(y)|`.

The approach has its limitations, as demonstrated by
`gh_62215_contradicting_nested_right_equivalent`, where we would need to
apply the same logic, but on a sub-expression of a direct operand.

Before the patch, line 90, 100, and 112 would be reachable; and become
unreachable after this. Line 127 will remain still reachable, but keep
in mind that when cross-checking with Z3 (aka. Z3 refutation), then all
4 reports would be eliminated.

The idea comes from
https://github.com/llvm/llvm-project/issues/62215#issuecomment-1792878474

Fixes #62215
---
 .../Core/RangeConstraintManager.cpp   | 53 +++
 clang/test/Analysis/constraint-assignor.c | 48 +
 2 files changed, 101 insertions(+)

diff --git a/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp 
b/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
index 5de99384449a4c8..d631369e895d3a9 100644
--- a/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
+++ b/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
@@ -2067,6 +2067,12 @@ class ConstraintAssignor : public 
ConstraintAssignorBase {
 return Assignor.assign(CoS, NewConstraint);
   }
 
+  /// Check if using an equivalent operand alternative would lead to
+  /// contradiction.
+  /// If a contradiction is witnessed, returns false; otherwise returns true.
+  bool handleEquivalentAlternativeSymOperands(const SymSymExpr *SymSym,
+  RangeSet Constraint);
+
   /// Handle expressions like: a % b != 0.
   template 
   bool handleRemainderOp(const SymT *Sym, RangeSet Constraint) {
@@ -2218,11 +2224,58 @@ bool ConstraintAssignor::assignSymExprToConst(const 
SymExpr *Sym,
   return true;
 }
 
+bool ConstraintAssignor::handleEquivalentAlternativeSymOperands(
+const SymSymExpr *SymSym, RangeSet Constraint) {
+  SymbolRef LHS = SymSym->getLHS();
+  SymbolRef RHS = SymSym->getRHS();
+  EquivalenceClass LHSClass = EquivalenceClass::find(State, LHS);
+  EquivalenceClass RHSClass = EquivalenceClass::find(State, RHS);
+  SymbolSet SymbolsEqWithLHS = LHSClass.getClassMembers(State);
+  SymbolSet SymbolsEqWithRHS = RHSClass.getClassMembers(State);
+  llvm::SmallVector AlternativeSymSyms;
+
+  // Gather left alternatives.
+  for (SymbolRef AlternativeLHS : SymbolsEqWithLHS) {
+if (AlternativeLHS == LHS)
+  continue;
+AlternativeSymSyms.emplace_back(AlternativeLHS, SymSym->getOpcode(), RHS,
+SymSym->getType());
+  }
+
+  // Gather right alternatives.
+  for (SymbolRef AlternativeRHS : SymbolsEqWithRHS) {
+if (AlternativeRHS == RHS)
+  continue;
+AlternativeSymSyms.emplace_back(LHS, SymSym->getOpcode(), AlternativeRHS,
+SymSym->getType());
+  }
+
+  // Crosscheck the inferred ranges.
+  for (SymSymExpr AltSymSym : AlternativeSymSyms) {
+RangeSet AltSymSymConstrant =
+SymbolicRangeInferrer::inferRange(RangeFactory, State, );
+Constraint = intersect(RangeFactory, Constraint, AltSymSymConstrant);
+
+// Check if we witnessed a contradiction with the equivalent alternative
+// operand.
+if (Constraint.isEmpty()) {
+  State = nullptr;
+  return false;
+}
+  }
+  return true;
+}
+
 bool ConstraintAssignor::assignSymSymExprToRangeSet(const SymSymExpr *Sym,
 RangeSet Constraint) {
   if (!handleRemainderOp(Sym, Constraint))
 return false;
 
+  if (const auto *SymSym = dyn_cast(Sym);
+  SymSym && !handleEquivalentAlternativeSymOperands(SymSym, Constraint)) {
+return false;
+  }
+
   std::optional ConstraintAsBool = interpreteAsBool(Constraint);
 
   if (!ConstraintAsBool)
diff --git a/clang/test/Analysis/constraint-assignor.c 
b/clang/test/Analysis/constraint-assignor.c
index 8210e576c98bd21..d5078b406e12601 100644
--- a/clang/test/Analysis/constraint-assignor.c
+++ b/clang/test/Analysis/constraint-assignor.c
@@ -82,3 +82,51 @@ void remainder_with_adjustment_of_composit_lhs(int x, int y) 
{
   clang_analyzer_eval(x + y != -1);// expected-warning{{TRUE}}
   (void)(x * y); // keep the constraints alive.
 }
+
+void gh_62215(int x, int y, int z) {
+  if (x != y) return; // x == y
+  if 

[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)

2023-11-06 Thread Balazs Benics via cfe-commits

steakhal wrote:

> @DonatNagyE  The most straightforward issue that I see is that (if I 
> understand the code correctly) the intersected constraint (the value of the 
> variable `Constraint` at the end of 
> `handleEquivalentAlternativeSymOperands()`) is just discarded after checking 
> that it's not empty. It's plausible that this discarded information will be 
> re-calculated by this process in the follow-up range calculations and don't 
> have a concrete testcase where it's relevant that we "leave food on the 
> table" (I can try to find one if you'd find it helpful) but I think that it 
> may be useful to think about "preserving" this `Constraint` that we 
> calculated. (This is not a blocking issue: I can support merging the current 
> logic if you don't see a reasonably simple improvement in this area.)

I tried adding this hunk at the end of the function, but the assertion never 
fired:
```c++
  // If we learned something, save it.
  if (Constraint != OriginalConstraint) {
assert(false);
State = assign(EquivalenceClass::find(State, SymSym), Constraint);
return static_cast(State);
  }
```

> There is also the choice that you limited the runtime to `|eqclass(X)| + 
> |eqclass(Y)|` by only checking the alternative constraints where one of the 
> two representatives is the original one. I think that here it's a good 
> decision to ensure a reasonable runtime by potentially discarding some 
> information, but I feel that it'd be good to document this decision with a 
> testcase. For example, I think
> 
> ```
> void gh_62215_left_and_right(int x, int y, int z, int w) {
>   if (x != y) return; // x == y
>   if (z != w) return; // z == w
>   if (z <= x) return; // z > x
>   if (w >= y) return; // w < y
> }
> ```

Fair point. Actually, originally I wanted a full cross-product of the 
alternatives, which I refined to a half-cross-product by following Gábor's 
comments. We shall come back here and do something more refined if there is a 
practical need.
I added a FIXME comment explaining our options, along with the test case you 
proposed. Thanks.

> Finally there is the "why do we need this trickery at all?" question that was 
> also raised by @Xazax-hun. I agree with him that on a longer term, it'd be 
> good to "strengthen" the equality classes and ensure that we assign the same 
> RangeSet to each member of an EQClass (or, in other words, we assign the 
> RangeSet to an EQClass instead of an individual symbol). This would 
> potentially improve the runtime (although I'm not sure that performance is 
> especially relevant here) and handle the left-and-right testcase adequately.

I think we haven't discussed yet the approach of applying the constraint to 
every eqclass member. That would feel like defeating the purpose of eqclasses 
at all.
To me, our best option so far is to keep the representative symbols alive and 
canonicalize (replace) all sub-symbols in constraints with representatives 
whenever we merge eqclasses or add constraints.

Thanks for the reviews folks!

https://github.com/llvm/llvm-project/pull/71284
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)

2023-11-06 Thread via cfe-commits


@@ -2218,11 +2224,58 @@ bool ConstraintAssignor::assignSymExprToConst(const 
SymExpr *Sym,
   return true;
 }
 
+bool ConstraintAssignor::handleEquivalentAlternativeSymOperands(
+const SymSymExpr *SymSym, RangeSet Constraint) {
+  SymbolRef LHS = SymSym->getLHS();
+  SymbolRef RHS = SymSym->getRHS();
+  EquivalenceClass LHSClass = EquivalenceClass::find(State, LHS);
+  EquivalenceClass RHSClass = EquivalenceClass::find(State, RHS);
+  SymbolSet SymbolsEqWithLHS = LHSClass.getClassMembers(State);
+  SymbolSet SymbolsEqWithRHS = RHSClass.getClassMembers(State);
+  llvm::SmallVector AlternativeSymSyms;
+
+  // Gather left alternatives.
+  for (SymbolRef AlternativeLHS : SymbolsEqWithLHS) {
+if (AlternativeLHS == LHS)
+  continue;
+AlternativeSymSyms.emplace_back(AlternativeLHS, SymSym->getOpcode(), RHS,
+SymSym->getType());
+  }
+
+  // Gather right alternatives.
+  for (SymbolRef AlternativeRHS : SymbolsEqWithRHS) {
+if (AlternativeRHS == RHS)
+  continue;
+AlternativeSymSyms.emplace_back(LHS, SymSym->getOpcode(), AlternativeRHS,
+SymSym->getType());
+  }
+
+  // Crosscheck the inferred ranges.
+  for (SymSymExpr AltSymSym : AlternativeSymSyms) {
+RangeSet AltSymSymConstrant =

DonatNagyE wrote:

Nitpick: "constraint" is misspelled in this variable name.

https://github.com/llvm/llvm-project/pull/71284
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)

2023-11-06 Thread via cfe-commits

https://github.com/DonatNagyE edited 
https://github.com/llvm/llvm-project/pull/71284
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)

2023-11-06 Thread via cfe-commits

https://github.com/DonatNagyE commented:

This is a good and important improvement of the analysis results, so I support 
merging it (with some very minor changes), but I feel that it's a "practical, 
but incomplete" band-aid instead of a systemic improvement that fits into the 
architecture.

The most straightforward issue that I see is that (if I understand the code 
correctly) the intersected constraint (the value of the variable `Constraint` 
at the end of `handleEquivalentAlternativeSymOperands()`) is just discarded 
after checking that it's not empty. It's plausible that this discarded 
information will be re-calculated by this process in the follow-up range 
calculations and don't have a concrete testcase where it's relevant that we 
"leave food on the table" (I can try to find one if you'd find it helpful) but 
I think that it may be useful to think about "preserving" this `Constraint` 
that we calculated. (This is not a blocking issue: I can support merging the 
current logic if you don't see a reasonably simple improvement in this area.)

There is also the choice that you limited the runtime to `|eqclass(X)| + 
|eqclass(Y)|` by only checking the alternative constraints where one of the two 
representatives is the original one. I think that here it's a good decision to 
ensure a reasonable runtime by potentially discarding some information, but I 
feel that it'd be good to document this decision with a testcase. For example, 
I think
```
void gh_62215_left_and_right(int x, int y, int z, int w) {
  if (x != y) return; // x == y
  if (z != w) return; // z == w
  if (z <= x) return; // z > x
  if (w >= y) return; // w < y
}
```
would lead to a state that's unreachable, but this fact is not recognized by 
the current improvement.

Finally there is the "why do we need this trickery at all?" question that was 
also raised by @Xazax-hun. I agree with him that on a longer term, it'd be good 
to "strengthen" the equality classes and ensure that we assign the same 
RangeSet to each member of an EQClass (or, in other words, we assign the 
RangeSet to an EQClass instead of an individual symbol). This would potentially 
improve the runtime (although I'm not sure that performance is especially 
relevant here) and handle the left-and-right testcase adequately.

However, I don't know how difficult would it be to maintain the invariant that 
the same RangeSet is assigned to each symbol in an EqClass (we'd need to 
intersect and update range sets whenever an equality is deduced, handle the 
situations where this reveals that equality is infeasible and handle any 
problems arising from the case when the representative becomes dead) -- so I 
can support merging this working code now instead of waiting indefinitely for 
that more through solution. (Perhaps consider adding a FIXME at the beginning 
of `handleEquivalentAlternativeSymOperands()` which mentions this longer-term 
alternative.)

https://github.com/llvm/llvm-project/pull/71284
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)

2023-11-06 Thread Balazs Benics via cfe-commits

steakhal wrote:

> I think every time we need to iterate over all member of an equivalence 
> class, we might do something wrong. The point of the equivalence class would 
> be to make sure those elements are equivalent. One way to avoid iteration 
> would be to always use the representative of the equivalence class. E.g., 
> each time we record a new constraint to a member of the class, we add this 
> information to the representative element. Every time we do a query, we first 
> look up the representative element which already should have all the info 
> from the class and use it instead of the original symbol.
> 
> Would something like this work or am I missing something?

I had to think about it for a little while, and here are my thoughts:
In an example like here:
```c++
void gh_62215(int x, int y, int z) {
  if (x != y) return; // x == y
  if (z <= x) return; // z > x
  if (z >= y) return; // z < y
  clang_analyzer_warnIfReached(); // no-warning: This should be dead code.
  (void)(x + y + z); // keep the constraints alive.
}
```
Here, after the first if, we would have an eqclass of `{x,y}`; with a single 
constraint `x != y: [0,0]`.
After the second if, we would have notionally 3 eqclasses: `{x,y}`, and two 
trivial ones `x` and `y`.
We would also have 3 constraints: `x != y: [0,0]`, `z <= x: [0,0]`, `z <= y: 
[0,0]`.

Note that the keys in the constraints map (symbols) can be 'arbitrarily' 
complex and refer to already dead symbols.
In this example, I keep these symbols artificially alive to make the case 
slightly simpler to discuss.

So, if I understand you correctly, at the 3rd if statement, we should 
canonicalize the symbol we are constraining by walking every sub-symbol and 
substituting it with its equivalent counterpart (if any), by basically with its 
eqclass' representative symbol.
In this example, it would mean that instead of adding a constraint `z >= y: 
[0,0]`, we would instead `z >= x: [0,0]` (assuming that `x` is the 
representative symbol of eqclass `{x,y}`.
I believe this canonicalization could work (and would solve the other 
demonstrative limitation I added in this PR), but would also incur overhead.
And from a design point of view, an eqclass would need to keep the 
representative symbol alive, because otherwise, we can't substitute a symbol 
with the representative symbol.
(Note that for this reason, we don't actually have a representative symbol, but 
rather a unique integral value as an ID for the eqclass - which happens to be 
equal to the `SymbolRef` pointer value of the representative symbol if that's 
still alive.

Here is an example where the representative symbol `x` of the eqclass `{x,y}` 
would die, but remain the ID of the eqclass:
```c++
  // same example, but with:
  (void)(y + z); // X is not mentioned!, thus reclaimed after the 2nd if 
statement.
```

https://github.com/llvm/llvm-project/pull/71284
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)

2023-11-05 Thread Gábor Horváth via cfe-commits

Xazax-hun wrote:

I think every time we need to iterate over all member of an equivalence class, 
we might do something wrong. The point of the equivalence class would be to 
make sure those elements are equivalent. One way to avoid iteration would be to 
always use the representative of the equivalence class. E.g., each time we 
record a new constraint to a member of the class, we add this information to 
the representative element. Every time we do a query, we first look up the 
representative element which already should have all the info from the class 
and use it instead of the original symbol.

Would something like this work or am I missing something?

https://github.com/llvm/llvm-project/pull/71284
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)

2023-11-04 Thread Balazs Benics via cfe-commits

steakhal wrote:

For crossreference: I raised some related questions around having void casts 
artificially keeping constraints and symbols alive at discuss:
https://discourse.llvm.org/t/range-based-solver-and-eager-symbol-garbage-collection/74670

https://github.com/llvm/llvm-project/pull/71284
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)

2023-11-04 Thread via cfe-commits

llvmbot wrote:




@llvm/pr-subscribers-clang-static-analyzer-1

Author: Balazs Benics (steakhal)


Changes

The idea is that if we see a `X RELOP Y` being constrained to a RangeSet `S`, 
then check the eqclasses of X and Y respectively and for `X' RELOP Y'` 
SymSymExprs and try to infer their ranges.
If there is no contradiction with any of the equivalent alternatives, then 
intersecting all these RangeSets should never be empty - aka. there should be a 
value satisfying the constraints we have.

It costs around `|eqclass(X)| + |eqclass(y)|`.

The approach has its limitations, as demonstrated by 
`gh_62215_contradicting_nested_right_equivalent`, where we would need to apply 
the same logic, but on a sub-expression of a direct operand.

Before the patch, line 90, 100, and 112 would be reachable; and become 
unreachable after this. Line 127 will remain still reachable, but keep in mind 
that when cross-checking with Z3 (aka. Z3 refutation), then all 4 reports would 
be eliminated.

The idea comes from
https://github.com/llvm/llvm-project/issues/62215#issuecomment-1792878474

Fixes #62215

---
Full diff: https://github.com/llvm/llvm-project/pull/71284.diff


2 Files Affected:

- (modified) clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp (+53) 
- (modified) clang/test/Analysis/constraint-assignor.c (+48) 


``diff
diff --git a/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp 
b/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
index 5de99384449a4c8..d631369e895d3a9 100644
--- a/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
+++ b/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
@@ -2067,6 +2067,12 @@ class ConstraintAssignor : public 
ConstraintAssignorBase {
 return Assignor.assign(CoS, NewConstraint);
   }
 
+  /// Check if using an equivalent operand alternative would lead to
+  /// contradiction.
+  /// If a contradiction is witnessed, returns false; otherwise returns true.
+  bool handleEquivalentAlternativeSymOperands(const SymSymExpr *SymSym,
+  RangeSet Constraint);
+
   /// Handle expressions like: a % b != 0.
   template 
   bool handleRemainderOp(const SymT *Sym, RangeSet Constraint) {
@@ -2218,11 +2224,58 @@ bool ConstraintAssignor::assignSymExprToConst(const 
SymExpr *Sym,
   return true;
 }
 
+bool ConstraintAssignor::handleEquivalentAlternativeSymOperands(
+const SymSymExpr *SymSym, RangeSet Constraint) {
+  SymbolRef LHS = SymSym->getLHS();
+  SymbolRef RHS = SymSym->getRHS();
+  EquivalenceClass LHSClass = EquivalenceClass::find(State, LHS);
+  EquivalenceClass RHSClass = EquivalenceClass::find(State, RHS);
+  SymbolSet SymbolsEqWithLHS = LHSClass.getClassMembers(State);
+  SymbolSet SymbolsEqWithRHS = RHSClass.getClassMembers(State);
+  llvm::SmallVector AlternativeSymSyms;
+
+  // Gather left alternatives.
+  for (SymbolRef AlternativeLHS : SymbolsEqWithLHS) {
+if (AlternativeLHS == LHS)
+  continue;
+AlternativeSymSyms.emplace_back(AlternativeLHS, SymSym->getOpcode(), RHS,
+SymSym->getType());
+  }
+
+  // Gather right alternatives.
+  for (SymbolRef AlternativeRHS : SymbolsEqWithRHS) {
+if (AlternativeRHS == RHS)
+  continue;
+AlternativeSymSyms.emplace_back(LHS, SymSym->getOpcode(), AlternativeRHS,
+SymSym->getType());
+  }
+
+  // Crosscheck the inferred ranges.
+  for (SymSymExpr AltSymSym : AlternativeSymSyms) {
+RangeSet AltSymSymConstrant =
+SymbolicRangeInferrer::inferRange(RangeFactory, State, );
+Constraint = intersect(RangeFactory, Constraint, AltSymSymConstrant);
+
+// Check if we witnessed a contradiction with the equivalent alternative
+// operand.
+if (Constraint.isEmpty()) {
+  State = nullptr;
+  return false;
+}
+  }
+  return true;
+}
+
 bool ConstraintAssignor::assignSymSymExprToRangeSet(const SymSymExpr *Sym,
 RangeSet Constraint) {
   if (!handleRemainderOp(Sym, Constraint))
 return false;
 
+  if (const auto *SymSym = dyn_cast(Sym);
+  SymSym && !handleEquivalentAlternativeSymOperands(SymSym, Constraint)) {
+return false;
+  }
+
   std::optional ConstraintAsBool = interpreteAsBool(Constraint);
 
   if (!ConstraintAsBool)
diff --git a/clang/test/Analysis/constraint-assignor.c 
b/clang/test/Analysis/constraint-assignor.c
index 8210e576c98bd21..d5078b406e12601 100644
--- a/clang/test/Analysis/constraint-assignor.c
+++ b/clang/test/Analysis/constraint-assignor.c
@@ -82,3 +82,51 @@ void remainder_with_adjustment_of_composit_lhs(int x, int y) 
{
   clang_analyzer_eval(x + y != -1);// expected-warning{{TRUE}}
   (void)(x * y); // keep the constraints alive.
 }
+
+void gh_62215(int x, int y, int z) {
+  if (x != y) return; // x == y
+  if (z <= x) return; // z > x
+  if (z >= y) return; // z < y
+  clang_analyzer_warnIfReached(); // no-warning: This should be dead code.
+  

[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)

2023-11-04 Thread Balazs Benics via cfe-commits

https://github.com/steakhal created 
https://github.com/llvm/llvm-project/pull/71284

The idea is that if we see a `X RELOP Y` being constrained to a RangeSet `S`, 
then check the eqclasses of X and Y respectively and for `X' RELOP Y'` 
SymSymExprs and try to infer their ranges.
If there is no contradiction with any of the equivalent alternatives, then 
intersecting all these RangeSets should never be empty - aka. there should be a 
value satisfying the constraints we have.

It costs around `|eqclass(X)| + |eqclass(y)|`.

The approach has its limitations, as demonstrated by 
`gh_62215_contradicting_nested_right_equivalent`, where we would need to apply 
the same logic, but on a sub-expression of a direct operand.

Before the patch, line 90, 100, and 112 would be reachable; and become 
unreachable after this. Line 127 will remain still reachable, but keep in mind 
that when cross-checking with Z3 (aka. Z3 refutation), then all 4 reports would 
be eliminated.

The idea comes from
https://github.com/llvm/llvm-project/issues/62215#issuecomment-1792878474

Fixes #62215

>From 92ece501b340c3a2a52b5a4614ddb70bb3e35c93 Mon Sep 17 00:00:00 2001
From: Balazs Benics 
Date: Sat, 4 Nov 2023 13:44:28 +0100
Subject: [PATCH] [analyzer][solver] On SymSym RelOps, check EQClass members
 for contradictions

The idea is that if we see a `X RELOP Y` being constrained to a RangeSet `S`,
then check the eqclasses of X and Y respectively and for `X' RELOP Y'`
SymSymExprs and try to infer their ranges.
If there is no contradiction with any of the equivalent alternatives,
then intersecting all these RangeSets should never be empty - aka. there
should be a value satisfying the constraints we have.

It costs around `|eqclass(X)| + |eqclass(y)|`.

The approach has its limitations, as demonstrated by
`gh_62215_contradicting_nested_right_equivalent`, where we would need to
apply the same logic, but on a sub-expression of a direct operand.

Before the patch, line 90, 100, and 112 would be reachable; and become
unreachable after this. Line 127 will remain still reachable, but keep
in mind that when cross-checking with Z3 (aka. Z3 refutation), then all
4 reports would be eliminated.

The idea comes from
https://github.com/llvm/llvm-project/issues/62215#issuecomment-1792878474

Fixes #62215
---
 .../Core/RangeConstraintManager.cpp   | 53 +++
 clang/test/Analysis/constraint-assignor.c | 48 +
 2 files changed, 101 insertions(+)

diff --git a/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp 
b/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
index 5de99384449a4c8..d631369e895d3a9 100644
--- a/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
+++ b/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
@@ -2067,6 +2067,12 @@ class ConstraintAssignor : public 
ConstraintAssignorBase {
 return Assignor.assign(CoS, NewConstraint);
   }
 
+  /// Check if using an equivalent operand alternative would lead to
+  /// contradiction.
+  /// If a contradiction is witnessed, returns false; otherwise returns true.
+  bool handleEquivalentAlternativeSymOperands(const SymSymExpr *SymSym,
+  RangeSet Constraint);
+
   /// Handle expressions like: a % b != 0.
   template 
   bool handleRemainderOp(const SymT *Sym, RangeSet Constraint) {
@@ -2218,11 +2224,58 @@ bool ConstraintAssignor::assignSymExprToConst(const 
SymExpr *Sym,
   return true;
 }
 
+bool ConstraintAssignor::handleEquivalentAlternativeSymOperands(
+const SymSymExpr *SymSym, RangeSet Constraint) {
+  SymbolRef LHS = SymSym->getLHS();
+  SymbolRef RHS = SymSym->getRHS();
+  EquivalenceClass LHSClass = EquivalenceClass::find(State, LHS);
+  EquivalenceClass RHSClass = EquivalenceClass::find(State, RHS);
+  SymbolSet SymbolsEqWithLHS = LHSClass.getClassMembers(State);
+  SymbolSet SymbolsEqWithRHS = RHSClass.getClassMembers(State);
+  llvm::SmallVector AlternativeSymSyms;
+
+  // Gather left alternatives.
+  for (SymbolRef AlternativeLHS : SymbolsEqWithLHS) {
+if (AlternativeLHS == LHS)
+  continue;
+AlternativeSymSyms.emplace_back(AlternativeLHS, SymSym->getOpcode(), RHS,
+SymSym->getType());
+  }
+
+  // Gather right alternatives.
+  for (SymbolRef AlternativeRHS : SymbolsEqWithRHS) {
+if (AlternativeRHS == RHS)
+  continue;
+AlternativeSymSyms.emplace_back(LHS, SymSym->getOpcode(), AlternativeRHS,
+SymSym->getType());
+  }
+
+  // Crosscheck the inferred ranges.
+  for (SymSymExpr AltSymSym : AlternativeSymSyms) {
+RangeSet AltSymSymConstrant =
+SymbolicRangeInferrer::inferRange(RangeFactory, State, );
+Constraint = intersect(RangeFactory, Constraint, AltSymSymConstrant);
+
+// Check if we witnessed a contradiction with the equivalent alternative
+// operand.
+if (Constraint.isEmpty()) {
+  State = nullptr;
+  return false;
+}
+  }
+  return