steakhal created this revision.
steakhal added reviewers: NoQ, vsavchenko, xazax.hun, martong, Szelethus.
Herald added subscribers: cfe-commits, ASDenysPetrov, Charusso, dkrupp, 
donat.nagy, mikhail.ramalho, a.sidorin, rnkovacs, szepet, baloghadamsoftware, 
whisperity.
Herald added a project: clang.
steakhal requested review of this revision.

The `SymbolMetadata` handing was slightly flawed.
Given the following example:

  void strcat_models_dst_length_even_if_src_dies(char *src) {
    char dst[8] = "1234";
    strcat(dst, src);
    clang_analyzer_eval(strlen(dst) >= 4); // expected-warning{{TRUE}}
  }

A metadata symbol was assigned to the cstring length map, representing the new 
length of the `dst`:
'`dst` -> `meta{src} + 4`' where `meta{src}` represents the cstring length of 
the region of `src`.
Unfortunately, this information was immediately removed from the model, since 
the `src` become dead after the evaluation of calling `strcat`.
This results in the removal of all symbolic expressions (eg metadata symbols) 
that refers to the dead `src` memory region.

As of now, `MetadataSymbol`s are used only by the `CStringChecker`, so this fix 
is tightly coupled to fixing the resulting issues of that checker.
There was already a FIXME in the test suite for this - which is now resolved.

---

There was a lengthy discussion on the mailing list 
<http://lists.llvm.org/pipermail/cfe-dev/2020-August/066559.html> how to 
resolve this issue.
In that mailing, @NoQ proposed these steps, which I have implemented in this 
patch:

> 1. Remove statement, location context, block count from SymbolMetadata's 
> identity. Let it solely depend on the region.
> 2. Get rid of the metadata-in-use set. From now on SymbolMetadata, like 
> SymbolRegionValue, is live whenever its region is live.
> 3. When strlen(R) is used for the first time on a region R, produce 
> `$meta<size_t R>` as the answer, but *do not store* it in the string length 
> map. We don't need to change the state because the state didn't  in fact 
> change. On subsequent `strlen(R)` calls we'll simply return the same symbol 
> (**because** the map will remain empty), until the string is changed.
> 4. If the string is mutated, record its length as usual. But if the string is 
> invalidated (or the new length is completely unknown), **do not remove** the 
> length from the state; instead, set it to SymbolConjured. Only remove it from 
> the map when the region dies.
> 5. Keep string length symbols alive as long as they're in the map.




Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D86445

Files:
  clang/include/clang/StaticAnalyzer/Core/PathSensitive/SValBuilder.h
  clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
  clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
  clang/lib/StaticAnalyzer/Core/SValBuilder.cpp
  clang/lib/StaticAnalyzer/Core/SymbolManager.cpp
  clang/test/Analysis/bsd-string.c
  clang/test/Analysis/string.c

Index: clang/test/Analysis/string.c
===================================================================
--- clang/test/Analysis/string.c
+++ clang/test/Analysis/string.c
@@ -502,6 +502,11 @@
 	strcat(dst, src);
 }
 
+void strcat_models_dst_length_even_if_src_dies(char *src) {
+  char dst[8] = "1234";
+  strcat(dst, src);
+  clang_analyzer_eval(strlen(dst) >= 4); // expected-warning{{TRUE}}
+}
 
 //===----------------------------------------------------------------------===
 // strncpy()
@@ -1515,23 +1520,12 @@
   free(privkey);
 }
 
-//===----------------------------------------------------------------------===
-// FIXMEs
-//===----------------------------------------------------------------------===
-
-// The analyzer_eval call below should evaluate to true. We are being too
-// aggressive in marking the (length of) src symbol dead. The length of dst
-// depends on src. This could be explicitly specified in the checker or the
-// logic for handling MetadataSymbol in SymbolManager needs to change.
 void strcat_symbolic_src_length(char *src) {
-	char dst[8] = "1234";
-	strcat(dst, src);
-  clang_analyzer_eval(strlen(dst) >= 4); // expected-warning{{UNKNOWN}}
+  char dst[8] = "1234";
+  strcat(dst, src);
+  clang_analyzer_eval(strlen(dst) >= 4); // expected-warning{{TRUE}}
 }
 
-
-// The analyzer_eval call below should evaluate to true. Most likely the same
-// issue as the test above.
 void strncpy_exactly_matching_buffer2(char *y) {
 	if (strlen(y) >= 4)
 		return;
@@ -1540,9 +1534,13 @@
 	strncpy(x, y, 4); // no-warning
 
 	// This time, we know that y fits in x anyway.
-  clang_analyzer_eval(strlen(x) <= 3); // expected-warning{{UNKNOWN}}
+  clang_analyzer_eval(strlen(x) <= 3); // expected-warning{{TRUE}}
 }
 
+//===----------------------------------------------------------------------===
+// FIXMEs
+//===----------------------------------------------------------------------===
+
 void memset7_char_array_nonnull() {
   char str[5] = "abcd";
   clang_analyzer_eval(strlen(str) == 4); // expected-warning{{TRUE}}
Index: clang/test/Analysis/bsd-string.c
===================================================================
--- clang/test/Analysis/bsd-string.c
+++ clang/test/Analysis/bsd-string.c
@@ -100,6 +100,12 @@
   len = strlcpy(unknown_dst,"abbc",unknown_size);
   clang_analyzer_eval(len==4);// expected-warning{{TRUE}}
   clang_analyzer_eval(strlen(unknown_dst));// expected-warning{{UNKNOWN}}
+}
+
+// Continue the f9 test, but with a fresh 'buf' variable.
+void f9_extra(int unknown_size, char *unknown_src, char *unknown_dst) {
+  char buf[8];
+  size_t len;
 
   //src is unknown
   len = strlcpy(buf,unknown_src, sizeof(buf));
Index: clang/lib/StaticAnalyzer/Core/SymbolManager.cpp
===================================================================
--- clang/lib/StaticAnalyzer/Core/SymbolManager.cpp
+++ clang/lib/StaticAnalyzer/Core/SymbolManager.cpp
@@ -214,17 +214,16 @@
   return cast<SymbolExtent>(SD);
 }
 
-const SymbolMetadata *
-SymbolManager::getMetadataSymbol(const MemRegion* R, const Stmt *S, QualType T,
-                                 const LocationContext *LCtx,
-                                 unsigned Count, const void *SymbolTag) {
+const SymbolMetadata *SymbolManager::getMetadataSymbol(const MemRegion *R,
+                                                       QualType T,
+                                                       const void *SymbolTag) {
   llvm::FoldingSetNodeID profile;
-  SymbolMetadata::Profile(profile, R, S, T, LCtx, Count, SymbolTag);
+  SymbolMetadata::Profile(profile, R, T, SymbolTag);
   void *InsertPos;
   SymExpr *SD = DataSet.FindNodeOrInsertPos(profile, InsertPos);
   if (!SD) {
     SD = (SymExpr*) BPAlloc.Allocate<SymbolMetadata>();
-    new (SD) SymbolMetadata(SymbolCounter, R, S, T, LCtx, Count, SymbolTag);
+    new (SD) SymbolMetadata(SymbolCounter, R, T, SymbolTag);
     DataSet.InsertNode(SD, InsertPos);
     ++SymbolCounter;
   }
@@ -393,11 +392,6 @@
   }
 }
 
-void SymbolReaper::markInUse(SymbolRef sym) {
-  if (isa<SymbolMetadata>(sym))
-    MetadataInUse.insert(sym);
-}
-
 bool SymbolReaper::isLiveRegion(const MemRegion *MR) {
   // TODO: For now, liveness of a memory region is equivalent to liveness of its
   // base region. In fact we can do a bit better: say, if a particular FieldDecl
@@ -455,10 +449,7 @@
     KnownLive = isLiveRegion(cast<SymbolExtent>(sym)->getRegion());
     break;
   case SymExpr::SymbolMetadataKind:
-    KnownLive = MetadataInUse.count(sym) &&
-                isLiveRegion(cast<SymbolMetadata>(sym)->getRegion());
-    if (KnownLive)
-      MetadataInUse.erase(sym);
+    KnownLive = isLiveRegion(cast<SymbolMetadata>(sym)->getRegion());
     break;
   case SymExpr::SymIntExprKind:
     KnownLive = isLive(cast<SymIntExpr>(sym)->getLHS());
Index: clang/lib/StaticAnalyzer/Core/SValBuilder.cpp
===================================================================
--- clang/lib/StaticAnalyzer/Core/SValBuilder.cpp
+++ clang/lib/StaticAnalyzer/Core/SValBuilder.cpp
@@ -203,13 +203,10 @@
 
 DefinedSVal SValBuilder::getMetadataSymbolVal(const void *symbolTag,
                                               const MemRegion *region,
-                                              const Expr *expr, QualType type,
-                                              const LocationContext *LCtx,
-                                              unsigned count) {
+                                              QualType type) {
   assert(SymbolManager::canSymbolicate(type) && "Invalid metadata symbol type");
 
-  SymbolRef sym =
-      SymMgr.getMetadataSymbol(region, expr, type, LCtx, count, symbolTag);
+  SymbolRef sym = SymMgr.getMetadataSymbol(region, type, symbolTag);
 
   if (Loc::isLocType(type))
     return loc::MemRegionVal(MemMgr.getSymbolicRegion(sym));
Index: clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
===================================================================
--- clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
+++ clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
@@ -750,10 +750,8 @@
   // Otherwise, get a new symbol and update the state.
   SValBuilder &svalBuilder = C.getSValBuilder();
   QualType sizeTy = svalBuilder.getContext().getSizeType();
-  SVal strLength = svalBuilder.getMetadataSymbolVal(CStringChecker::getTag(),
-                                                    MR, Ex, sizeTy,
-                                                    C.getLocationContext(),
-                                                    C.blockCount());
+  SVal strLength =
+      svalBuilder.getMetadataSymbolVal(CStringChecker::getTag(), MR, sizeTy);
 
   if (!hypothetical) {
     if (Optional<NonLoc> strLn = strLength.getAs<NonLoc>()) {
@@ -1098,8 +1096,7 @@
                                svalBuilder.makeZeroVal(Ctx.getSizeType()));
     } else if (!StateNullChar && StateNonNullChar) {
       SVal NewStrLen = svalBuilder.getMetadataSymbolVal(
-          CStringChecker::getTag(), MR, DstBuffer, Ctx.getSizeType(),
-          C.getLocationContext(), C.blockCount());
+          CStringChecker::getTag(), MR, Ctx.getSizeType());
 
       // If the value of second argument is not zero, then the string length
       // is at least the size argument.
@@ -2378,14 +2375,14 @@
 
   CStringLengthTy::Factory &F = state->get_context<CStringLength>();
 
-  // Then loop over the entries in the current state.
-  for (CStringLengthTy::iterator I = Entries.begin(),
-      E = Entries.end(); I != E; ++I) {
-    const MemRegion *MR = I.getKey();
+  // Overwrite the associated cstring length values of the invalidated regions.
+  for (const auto &Entry : Entries) {
+    const MemRegion *MR = Entry.first;
 
     // Is this entry for a super-region of a changed region?
     if (SuperRegions.count(MR)) {
       Entries = F.remove(Entries, MR);
+      Entries = F.add(Entries, MR, UnknownVal());
       continue;
     }
 
@@ -2395,6 +2392,7 @@
       Super = SR->getSuperRegion();
       if (Invalidated.count(Super)) {
         Entries = F.remove(Entries, MR);
+        Entries = F.add(Entries, MR, UnknownVal());
         break;
       }
     }
@@ -2406,37 +2404,35 @@
 void CStringChecker::checkLiveSymbols(ProgramStateRef state,
     SymbolReaper &SR) const {
   // Mark all symbols in our string length map as valid.
-  CStringLengthTy Entries = state->get<CStringLength>();
-
-  for (CStringLengthTy::iterator I = Entries.begin(), E = Entries.end();
-      I != E; ++I) {
-    SVal Len = I.getData();
-
-    for (SymExpr::symbol_iterator si = Len.symbol_begin(),
-        se = Len.symbol_end(); si != se; ++si)
-      SR.markInUse(*si);
+  for (const auto &Entry : state->get<CStringLength>()) {
+    SVal AssociatedLength = Entry.second;
+    const auto SymbolRange = llvm::make_range(AssociatedLength.symbol_begin(),
+                                              AssociatedLength.symbol_end());
+    for (SymbolRef SubSym : SymbolRange)
+      if (isa<SymbolData>(SubSym))
+        SR.markLive(SubSym);
   }
 }
 
 void CStringChecker::checkDeadSymbols(SymbolReaper &SR,
     CheckerContext &C) const {
-  ProgramStateRef state = C.getState();
-  CStringLengthTy Entries = state->get<CStringLength>();
+  ProgramStateRef State = C.getState();
+  CStringLengthTy Entries = State->get<CStringLength>();
   if (Entries.isEmpty())
     return;
 
-  CStringLengthTy::Factory &F = state->get_context<CStringLength>();
-  for (CStringLengthTy::iterator I = Entries.begin(), E = Entries.end();
-      I != E; ++I) {
-    SVal Len = I.getData();
+  CStringLengthTy::Factory &F = State->get_context<CStringLength>();
+  for (const auto &Entry : Entries) {
+    const MemRegion *MR = Entry.first;
+    SVal Len = Entry.second;
     if (SymbolRef Sym = Len.getAsSymbol()) {
       if (SR.isDead(Sym))
-        Entries = F.remove(Entries, I.getKey());
+        Entries = F.remove(Entries, MR);
     }
   }
 
-  state = state->set<CStringLength>(Entries);
-  C.addTransition(state);
+  State = State->set<CStringLength>(Entries);
+  C.addTransition(State);
 }
 
 void ento::registerCStringModeling(CheckerManager &Mgr) {
Index: clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
===================================================================
--- clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
+++ clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
@@ -200,48 +200,33 @@
 ///  Intended for use by checkers.
 class SymbolMetadata : public SymbolData {
   const MemRegion* R;
-  const Stmt *S;
   QualType T;
-  const LocationContext *LCtx;
-  unsigned Count;
   const void *Tag;
 
 public:
-  SymbolMetadata(SymbolID sym, const MemRegion* r, const Stmt *s, QualType t,
-                 const LocationContext *LCtx, unsigned count, const void *tag)
-      : SymbolData(SymbolMetadataKind, sym), R(r), S(s), T(t), LCtx(LCtx),
-        Count(count), Tag(tag) {
-      assert(r);
-      assert(s);
-      assert(isValidTypeForSymbol(t));
-      assert(LCtx);
-      assert(tag);
-    }
+  SymbolMetadata(SymbolID sym, const MemRegion *r, QualType t, const void *tag)
+      : SymbolData(SymbolMetadataKind, sym), R(r), T(t), Tag(tag) {
+    assert(isValidTypeForSymbol(t));
+    assert(tag);
+  }
 
   const MemRegion *getRegion() const { return R; }
-  const Stmt *getStmt() const { return S; }
-  const LocationContext *getLocationContext() const { return LCtx; }
-  unsigned getCount() const { return Count; }
   const void *getTag() const { return Tag; }
 
   QualType getType() const override;
 
   void dumpToStream(raw_ostream &os) const override;
 
-  static void Profile(llvm::FoldingSetNodeID& profile, const MemRegion *R,
-                      const Stmt *S, QualType T, const LocationContext *LCtx,
-                      unsigned Count, const void *Tag) {
+  static void Profile(llvm::FoldingSetNodeID &profile, const MemRegion *R,
+                      QualType T, const void *Tag) {
     profile.AddInteger((unsigned) SymbolMetadataKind);
     profile.AddPointer(R);
-    profile.AddPointer(S);
     profile.Add(T);
-    profile.AddPointer(LCtx);
-    profile.AddInteger(Count);
     profile.AddPointer(Tag);
   }
 
   void Profile(llvm::FoldingSetNodeID& profile) override {
-    Profile(profile, R, S, T, LCtx, Count, Tag);
+    Profile(profile, R, T, Tag);
   }
 
   // Implement isa<T> support.
@@ -452,10 +437,7 @@
   ///
   /// VisitCount can be used to differentiate regions corresponding to
   /// different loop iterations, thus, making the symbol path-dependent.
-  const SymbolMetadata *getMetadataSymbol(const MemRegion *R, const Stmt *S,
-                                          QualType T,
-                                          const LocationContext *LCtx,
-                                          unsigned VisitCount,
+  const SymbolMetadata *getMetadataSymbol(const MemRegion *R, QualType T,
                                           const void *SymbolTag = nullptr);
 
   const SymbolCast* getCastSymbol(const SymExpr *Operand,
@@ -503,7 +485,6 @@
   using RegionSetTy = llvm::DenseSet<const MemRegion *>;
 
   SymbolMapTy TheLiving;
-  SymbolSetTy MetadataInUse;
 
   RegionSetTy RegionRoots;
 
@@ -537,6 +518,7 @@
   /// This should never be
   /// used by checkers, only by the state infrastructure such as the store and
   /// environment. Checkers should instead use metadata symbols and markInUse.
+  /// TODO: update this comment!!!
   void markLive(SymbolRef sym);
 
   /// Marks a symbol as important to a checker.
@@ -546,7 +528,8 @@
   /// live. For other symbols, this has no effect; checkers are not permitted
   /// to influence the life of other symbols. This should be used before any
   /// symbol marking has occurred, i.e. in the MarkLiveSymbols callback.
-  void markInUse(SymbolRef sym);
+  // TODO: remove this function!!!
+  // void markInUse(SymbolRef sym);
 
   using region_iterator = RegionSetTy::const_iterator;
 
Index: clang/include/clang/StaticAnalyzer/Core/PathSensitive/SValBuilder.h
===================================================================
--- clang/include/clang/StaticAnalyzer/Core/PathSensitive/SValBuilder.h
+++ clang/include/clang/StaticAnalyzer/Core/PathSensitive/SValBuilder.h
@@ -228,10 +228,7 @@
       SymbolRef parentSymbol, const TypedValueRegion *region);
 
   DefinedSVal getMetadataSymbolVal(const void *symbolTag,
-                                   const MemRegion *region,
-                                   const Expr *expr, QualType type,
-                                   const LocationContext *LCtx,
-                                   unsigned count);
+                                   const MemRegion *region, QualType type);
 
   DefinedSVal getMemberPointer(const NamedDecl *ND);
 
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to