Szelethus created this revision.
Szelethus added reviewers: NoQ, steakhal, xazax.hun, martong, ASDenysPetrov.
Herald added subscribers: manas, gamesh411, dkrupp, donat.nagy, 
mikhail.ramalho, a.sidorin, rnkovacs, szepet, baloghadamsoftware.
Szelethus requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

As of now, we have two interfaces to for defining signatures: 
`StdLibraryFunctionsChecker::Signature` and `CallDescription`. An example for 
how `Signature`s are used can be seen here:

  addToFunctionSummaryMap(
      "isprint", Signature(ArgTypes{IntTy}, RetType{IntTy}),
      Summary(EvalCallAsPure)
          .Case({ArgumentCondition(0U, WithinRange, Range(32, 126)),
                 ReturnValueCondition(OutOfRange, SingleValue(0))})
          .Case({ArgumentCondition(0U, OutOfRange, Range(32, 126)),
                 ReturnValueCondition(WithinRange, SingleValue(0))}));

The name of the function is searched for in translation unit's identifier 
table, then the `Signature` is matched against each decl `Decl` with the same 
name. Ideally, this yields a `FunctionDecl`, which is mapped to its `Summary`.

This works well for C functions, but doesn't support C++ at all.

`CallDescription` emerged with a strong emphasis on recognizing C++ functions. 
The common example brough up is `std::string`, which in some standard library 
implementations is actually called something like `std::__cxx11::basic_string`, 
but not in others. Matching this can be a nightmare for checker developers. For 
this reason, `CallDescription`s can be defined like this:

  
  InnerPointerChecker()
      : AppendFn({"std", "basic_string", "append"}),
        AssignFn({"std", "basic_string", "assign"}),
        AddressofFn({"std", "addressof"}),
        ClearFn({"std", "basic_string", "clear"}),
        CStrFn({"std", "basic_string", "c_str"}), DataFn({"std", "data"}, 1),
        DataMemberFn({"std", "basic_string", "data"}),
        EraseFn({"std", "basic_string", "erase"}),
        InsertFn({"std", "basic_string", "insert"}),
        PopBackFn({"std", "basic_string", "pop_back"}),
        PushBackFn({"std", "basic_string", "push_back"}),
        ReplaceFn({"std", "basic_string", "replace"}),
        ReserveFn({"std", "basic_string", "reserve"}),
        ResizeFn({"std", "basic_string", "resize"}),
        ShrinkToFitFn({"std", "basic_string", "shrink_to_fit"}),
        SwapFn({"std", "basic_string", "swap"}) {}

Any identifier which matches //at least// these identifiers are considered a 
match (which sometimes leads to incorrect matching, e.g. D81745 
<https://reviews.llvm.org/D81745>).

`CallDescription`s are (usually) not used for digging up `FunctionDecl`s from 
the translation unit, but rather during symbolic execution to check in a 
pre/post call event whether the called function matches the `CallDescription`:

  bool InnerPointerChecker::isInnerPointerAccessFunction(
      const CallEvent &Call) const {
    return matchesAny(Call, CStrFn, DataFn, DataMemberFn);
  }

Most of the new checkers implementing pre/post condition checks on functions 
now use `CallDescriptionMap` or `CallDescriptionSet`. Its up to debate whether 
the newer `Signature` approach is better, but its not obvious, and converting 
from one to the other may be non-trivial as well.

---

Now, onto this patch. Since `CallDescription`s can only be matched against 
`CallEvent`s that are created during symbolic execution, it was not possible to 
use it in syntactic-only contexts. For example, even though 
`InnerPointerChecker` can check with its set of `CallDescription`s whether a 
function call is interested during analysis, its unable to check without hassle 
whether a non-analyzer piece of code also calls such a function.

The patch adds the ability to use `CallDescription`s in syntactic contexts as 
well. While we already have that in `Signature`, we still want to leverage the 
ability to use dynamic information when we have it (function pointers, for 
example). This could be done with `Signature` as well 
(`StdLibraryFunctionsChecker` does it), but it makes it even less of a drop-in 
replacement.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D119004

Files:
  clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallDescription.h
  clang/lib/StaticAnalyzer/Core/CallDescription.cpp
  clang/unittests/StaticAnalyzer/CallDescriptionTest.cpp

Index: clang/unittests/StaticAnalyzer/CallDescriptionTest.cpp
===================================================================
--- clang/unittests/StaticAnalyzer/CallDescriptionTest.cpp
+++ clang/unittests/StaticAnalyzer/CallDescriptionTest.cpp
@@ -6,11 +6,18 @@
 //
 //===----------------------------------------------------------------------===//
 
+#include "CheckerRegistration.h"
 #include "Reusables.h"
 
 #include "clang/AST/ExprCXX.h"
+#include "clang/Analysis/PathDiagnostic.h"
+#include "clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h"
+#include "clang/StaticAnalyzer/Core/Checker.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/CallDescription.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
+#include "clang/StaticAnalyzer/Frontend/AnalysisConsumer.h"
+#include "clang/StaticAnalyzer/Frontend/CheckerRegistry.h"
 #include "clang/Tooling/Tooling.h"
 #include "gtest/gtest.h"
 #include <type_traits>
@@ -40,6 +47,7 @@
     // If it's a function we expected to find, remember that we've found it.
     if (Result && *Result)
       ++Found;
+
     return Result;
   }
 
@@ -489,6 +497,77 @@
       "}"));
 }
 
+//===----------------------------------------------------------------------===//
+// Testing through a checker interface.
+//
+// Above, the static analyzer isn't run properly, only the bare minimum to
+// create CallEvents. This causes CallEvents through function pointers to not
+// refer to the pointee function, but this works fine if we run
+// AnalysisASTConsumer.
+//===----------------------------------------------------------------------===//
+
+class CallDescChecker
+    : public Checker<check::PreCall, check::PreStmt<CallExpr>> {
+  CallDescriptionSet Set = {{"bar", 0}};
+
+public:
+  void checkPreCall(const CallEvent &Call, CheckerContext &C) const {
+    if (Set.contains(Call)) {
+      C.getBugReporter().EmitBasicReport(
+          Call.getDecl(), this, "CallEvent match", categories::LogicError,
+          "CallEvent match",
+          PathDiagnosticLocation{Call.getDecl(), C.getSourceManager()});
+    }
+  }
+
+  void checkPreStmt(const CallExpr *CE, CheckerContext &C) const {
+    if (Set.containsImprecise(*CE)) {
+      C.getBugReporter().EmitBasicReport(
+          CE->getCalleeDecl(), this, "CallExpr match", categories::LogicError,
+          "CallExpr match",
+          PathDiagnosticLocation{CE->getCalleeDecl(), C.getSourceManager()});
+    }
+  }
+};
+
+void addCallDescChecker(AnalysisASTConsumer &AnalysisConsumer,
+                        AnalyzerOptions &AnOpts) {
+  AnOpts.CheckersAndPackages = {{"test.CallDescChecker", true}};
+  AnalysisConsumer.AddCheckerRegistrationFn([](CheckerRegistry &Registry) {
+    Registry.addChecker<CallDescChecker>("test.CallDescChecker", "Description",
+                                         "");
+  });
+}
+
+TEST(CallDescription, CheckCallExprMatching) {
+  // Imprecise matching shouldn't catch the call to bar, because its obscured
+  // by a function pointer.
+  constexpr StringRef FnPtrCode = R"code(
+    void bar();
+    void foo() {
+      void (*fnptr)() = bar;
+      fnptr();
+    })code";
+  std::string Diags;
+  EXPECT_TRUE(runCheckerOnCode<addCallDescChecker>(FnPtrCode.str(), Diags,
+                                                   /*OnlyEmitWarnings*/ true));
+  EXPECT_EQ("test.CallDescChecker: CallEvent match\n", Diags);
+
+  // This should be caught properly by imprecise matching, as the call is done
+  // purely through syntactic means.
+  constexpr StringRef Code = R"code(
+    void bar();
+    void foo() {
+      bar();
+    })code";
+  Diags.clear();
+  EXPECT_TRUE(runCheckerOnCode<addCallDescChecker>(Code.str(), Diags,
+                                                   /*OnlyEmitWarnings*/ true));
+  EXPECT_EQ("test.CallDescChecker: CallEvent match\n"
+            "test.CallDescChecker: CallExpr match\n",
+            Diags);
+}
+
 } // namespace
 } // namespace ento
 } // namespace clang
Index: clang/lib/StaticAnalyzer/Core/CallDescription.cpp
===================================================================
--- clang/lib/StaticAnalyzer/Core/CallDescription.cpp
+++ clang/lib/StaticAnalyzer/Core/CallDescription.cpp
@@ -13,6 +13,7 @@
 //===----------------------------------------------------------------------===//
 
 #include "clang/StaticAnalyzer/Core/PathSensitive/CallDescription.h"
+#include "clang/AST/Decl.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
 #include "llvm/ADT/ArrayRef.h"
@@ -61,15 +62,32 @@
   if (!FD)
     return false;
 
+  return matchesImpl(FD, Call.getNumArgs(), Call.parameters().size());
+}
+
+bool ento::CallDescription::matchesImprecise(const CallExpr &CE) const {
+  const auto *FD = dyn_cast_or_null<FunctionDecl>(CE.getCalleeDecl());
+  if (!FD)
+    return false;
+
+  return matchesImpl(FD, CE.getNumArgs(), FD->param_size());
+}
+
+bool ento::CallDescription::matchesImpl(const FunctionDecl *Callee,
+                                        size_t ArgCount,
+                                        size_t ParamCount) const {
+  const auto *FD = Callee;
+  if (!FD)
+    return false;
+
   if (Flags & CDF_MaybeBuiltin) {
     return CheckerContext::isCLibraryFunction(FD, getFunctionName()) &&
-           (!RequiredArgs || *RequiredArgs <= Call.getNumArgs()) &&
-           (!RequiredParams || *RequiredParams <= Call.parameters().size());
+           (!RequiredArgs || *RequiredArgs <= ArgCount) &&
+           (!RequiredParams || *RequiredParams <= ParamCount);
   }
 
   if (!II.hasValue()) {
-    II = &Call.getState()->getStateManager().getContext().Idents.get(
-        getFunctionName());
+    II = &FD->getASTContext().Idents.get(getFunctionName());
   }
 
   const auto MatchNameOnly = [](const CallDescription &CD,
@@ -86,11 +104,11 @@
   };
 
   const auto ExactMatchArgAndParamCounts =
-      [](const CallEvent &Call, const CallDescription &CD) -> bool {
-    const bool ArgsMatch =
-        !CD.RequiredArgs || *CD.RequiredArgs == Call.getNumArgs();
+      [](size_t ArgCount, size_t ParamCount,
+         const CallDescription &CD) -> bool {
+    const bool ArgsMatch = !CD.RequiredArgs || *CD.RequiredArgs == ArgCount;
     const bool ParamsMatch =
-        !CD.RequiredParams || *CD.RequiredParams == Call.parameters().size();
+        !CD.RequiredParams || *CD.RequiredParams == ParamCount;
     return ArgsMatch && ParamsMatch;
   };
 
@@ -122,7 +140,7 @@
   };
 
   // Let's start matching...
-  if (!ExactMatchArgAndParamCounts(Call, *this))
+  if (!ExactMatchArgAndParamCounts(ArgCount, ParamCount, *this))
     return false;
 
   if (!MatchNameOnly(*this, FD))
@@ -144,3 +162,7 @@
 bool ento::CallDescriptionSet::contains(const CallEvent &Call) const {
   return static_cast<bool>(Impl.lookup(Call));
 }
+
+bool ento::CallDescriptionSet::containsImprecise(const CallExpr &CE) const {
+  return static_cast<bool>(Impl.lookupImprecise(CE));
+}
Index: clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallDescription.h
===================================================================
--- clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallDescription.h
+++ clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallDescription.h
@@ -99,6 +99,28 @@
   /// calls.
   bool matches(const CallEvent &Call) const;
 
+  /// Returns true if the CallEvent is a call to a function that matches
+  /// the CallDescription.
+  ///
+  /// When available, always prefer matching with a CallEvent! This function
+  /// exists only when that is not available, for example, when _only_
+  /// syntactic check is done on a piece of code.
+  ///
+  /// Also, StdLibraryFunctionsChecker::Signature is likely a better candicade
+  /// for syntactic only matching if you are writing a new checker. This is
+  /// handy if a CallDescriptionMap is already there.
+  ///
+  /// The function is imprecise because CallEvent understands the precise
+  /// argument count better (see comments for CallEvent::getNumArgs), may
+  /// know the called function if it was called through a function pointer,
+  /// and other information not available syntactically.
+  bool matchesImprecise(const CallExpr &CE) const;
+
+private:
+  bool matchesImpl(const FunctionDecl *Callee, size_t ArgCount,
+                   size_t ParamCount) const;
+
+public:
   /// Returns true whether the CallEvent matches on any of the CallDescriptions
   /// supplied.
   ///
@@ -156,6 +178,18 @@
 
     return nullptr;
   }
+
+  /// ALWAYS prefer lookup with a CallEvent, when available. See comments above
+  /// CallDescription::matchesImprecise.
+  LLVM_NODISCARD const T *lookupImprecise(const CallExpr &Call) const {
+    // Slow path: linear lookup.
+    // TODO: Implement some sort of fast path.
+    for (const std::pair<CallDescription, T> &I : LinearMap)
+      if (I.first.matchesImprecise(Call))
+        return &I.second;
+
+    return nullptr;
+  }
 };
 
 /// An immutable set of CallDescriptions.
@@ -171,6 +205,7 @@
   CallDescriptionSet &operator=(const CallDescription &) = delete;
 
   LLVM_NODISCARD bool contains(const CallEvent &Call) const;
+  LLVM_NODISCARD bool containsImprecise(const CallExpr &CE) const;
 };
 
 } // namespace ento
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to