martong added a subscriber: vabridgers.
martong added a comment.

First of all, thanks Denys for working on this, nice work! Here are my concerns 
and remarks.

I think this fixed set of types in `NominalTypeList` is too rigid. I don't like 
the fact that we have to iterate over all four types whenever we set a new 
constraint or when we try to infer. Also, I am thinking about downstream 
hardware architectures, where there might be integers with different bit-widths 
(@vabridgers). Also, at some point people will pursue us to support integers 
with arbitrary bitwidth (see _ExtInt 
<https://blog.llvm.org/2020/04/the-new-clang-extint-feature-provides.html>)

Thus, I am proposing an alternative approach. We should have a SymExpr -> Set 
of SymExpr mapping in the State that represents the relation of symbols that 
are connected via some cast operations (see REGISTER_MAP_WITH_PROGRAMSTATE). 
Let's call this mapping as `CastMap`. The key should be the root symbol, i.e 
the symbol that is being declared first before all cast operations.

E.g.  Let's have

  int16 a = 128;

then we have a constraint [128,128] stored for `$a`. Then

  if ((int8)a < 0)

creates a new symbol `$a2` (SymbolCast) that has a new constraint [-128,-128] 
assigned to it. And we also keep track in the State, that `$a` and `$a2` refers 
the same root symbol (`a`). We now have in the CastMap $a -> [$a2].

Now, let's say we have

  if ((_ExtInt(7))a > 64)

then we can dig up the existing contraints from CastMap to check for the 
State's validity and we can update all the constraints of $a and $a2 as needed. 
Also, CastMap is updated: $a -> [$a2, $a3].



================
Comment at: clang/lib/StaticAnalyzer/Checkers/ExprInspectionChecker.cpp:421-426
+  // Unwrap symbolic expression to skip argument casts on function call.
+  // This is useful when there is no way for overloading function in C
+  // but we need to pass different types of arguments and
+  // implicit cast occures.
+  Sym = Sym->ignoreCasts();
+
----------------
Does it really matter? I mean, why do we need this change?


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:2212
+
+bool ConstraintAssignor::assignSymExprToRangeSet(const SymExpr *Sym,
+                                                 RangeSet Constraint) {
----------------
Could you please rebase? The "simplification" code part had been merged already 
to llvm/main and it is not part of this change.


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:2227-2251
+  llvm::SmallSet<EquivalenceClass, 4> SimplifiedClasses;
+  // Iterate over all equivalence classes and try to simplify them.
+  ClassMembersTy Members = State->get<ClassMembers>();
+  for (std::pair<EquivalenceClass, SymbolSet> ClassToSymbolSet : Members) {
+    EquivalenceClass Class = ClassToSymbolSet.first;
+    State = EquivalenceClass::simplify(Builder, RangeFactory, State, Class);
+    if (!State)
----------------
I think this hunk should remain in `assignSymExprToConst`. Why did you move it?


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:2280-2290
+/// Return a symbol which is the best canidate to save it in the constraint
+/// map. We should correct symbol because in case of truncation cast we can
+/// only reason about truncated bytes but not the whole value. E.g. (char)(int
+/// x), we can store constraints for the first lower byte but we still don't
+/// know the root value. Also in case of promotion or converion we should
+/// store the root value instead of cast symbol, because we can always get
+/// a correct range using `castTo` metho. And we are not intrested in any
----------------
I think, we should definitely store the constraints as they appear in the 
analyzed code. This way we mix the infer logic into the constraint setting, 
which is bad.
I mean, we should simply store the constraint directly for the symbol as it is. 
And then only in `VisitSymbolCast` should we infer the proper value from the 
stored constraint (if we can).

(Of course, if we have related symbols (casts of the original symbol) then 
their constraints must be updated.)


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:2281
+/// Return a symbol which is the best canidate to save it in the constraint
+/// map. We should correct symbol because in case of truncation cast we can
+/// only reason about truncated bytes but not the whole value. E.g. (char)(int
----------------



================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:2284
+/// x), we can store constraints for the first lower byte but we still don't
+/// know the root value. Also in case of promotion or converion we should
+/// store the root value instead of cast symbol, because we can always get
----------------



================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:2285
+/// know the root value. Also in case of promotion or converion we should
+/// store the root value instead of cast symbol, because we can always get
+/// a correct range using `castTo` metho. And we are not intrested in any
----------------



================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:2286
+/// store the root value instead of cast symbol, because we can always get
+/// a correct range using `castTo` metho. And we are not intrested in any
+/// constraints of cast symbol but the root symbol in `if` expression
----------------



================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:2322
+  if (IsTruncated) {
+    // Trancation occurred. High bits lost. We can't reason about ranges of
+    // the original(root) operand in this case, so we should not add it to the
----------------



================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:2346-2350
+/// FIXME: Update bigger casts. We only can reason about ranges of smaller
+/// types, because it would be too complicated to update, say, the entire
+/// `int` range if you only have knowledge that its lowest byte has been
+/// changed. So we don't touch bigger casts and they may be potentially
+/// invalid. For future, for:
----------------
Instead of a noop we should be more conservative in this case. We should 
invalidate (remove) the constraints of all the symbols that have more bits than 
the currently set symbol. However, we might be clever in cases of special 
values (e.g `0` or in case of the `true` rangeSet {[MIN, -1], [1, MAX]}).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103096/new/

https://reviews.llvm.org/D103096

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to