NoQ created this revision.
NoQ added reviewers: aaron.ballman, gribozavr2, xazax.hun, jkorous, t-rasmud, 
ziqingluo-90, malavikasamak.
Herald added subscribers: steakhal, martong, rnkovacs.
Herald added a project: All.
NoQ requested review of this revision.

This patch adds more abstractions that we'll need later for emitting 
`-Wunsafe-buffer-usage` fixits. It doesn't emit any actual fixits, so there is 
no change in observed behavior, but it introduces a way to emit fixits, and 
existing tests now verify that the compiler still emits no fixits, despite 
knowing how to do so.

The purpose of our code transformation analysis is to fix variable types in the 
code from raw pointer types to C++ standard collection/view types.

The analysis has to decide on its own which specific type is the most 
appropriate for every variable. This patch introduces the `Strategy` class that 
maps variables to their most appropriate types.
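
As a rough, standalone illustration of that mapping (not the code from this 
patch), the sketch below keys the map on variable names instead of 
`const VarDecl *`. The use of `std::map` and of strings as variable 
identifiers is an assumption made only to keep the example free of Clang 
dependencies.

```cpp
#include <map>
#include <string>

// The kinds of replacement types a variable may be assigned.
enum class Kind { Wontfix, Span, Iterator, Array, Vector };

// Simplified strategy: variables are identified by name here (an assumption
// for this sketch), whereas the patch keys the map on `const VarDecl *`.
class Strategy {
  std::map<std::string, Kind> Map;

public:
  // Record the type we currently plan to use for the variable.
  void set(const std::string &Var, Kind K) { Map[Var] = K; }

  // Variables without an explicit decision are treated as "won't fix".
  Kind lookup(const std::string &Var) const {
    auto It = Map.find(Var);
    return It == Map.end() ? Kind::Wontfix : It->second;
  }
};
```

Defaulting to `Wontfix` on lookup means that a variable the analysis never 
considered is simply left alone.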

In D137348 <https://reviews.llvm.org/D137348> we've introduced the `Gadget` 
abstraction, which describes a rigid AST pattern that the analysis "fully 
understands" and may need to fix. Which specific fix is actually necessary for 
a given `Gadget`, and whether it's //necessary// at all, and whether it's 
//possible// in the first place, depends on the `Strategy`. So, this patch adds 
a virtual method which every gadget can implement in order to teach the 
analysis how to fix that gadget:

  Gadget->getFixits(Strategy)
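
The three possible outcomes of `getFixits()` (cannot fix, nothing to fix, 
concrete fixes) can be sketched without Clang as follows. Everything here is 
hypothetical: `SubscriptGadget` and `DerefGadget` are made-up examples, fixes 
are plain strings rather than `FixItHint`s, and the chosen `Kind` is passed 
directly instead of a full `Strategy`.

```cpp
#include <optional>
#include <string>
#include <vector>

enum class Kind { Wontfix, Span };

// Stand-in for a list of FixItHints (an assumption for this sketch).
using FixItList = std::vector<std::string>;

struct Gadget {
  virtual ~Gadget() = default;

  // Default: the gadget doesn't know how to fix itself, so no fix can be
  // produced for any variable that participates in it.
  virtual std::optional<FixItList> getFixits(Kind) const {
    return std::nullopt;
  }
};

// Hypothetical gadget for `p[i]`: the same syntax is valid on std::span,
// so the fix is possible but requires no edits.
struct SubscriptGadget : Gadget {
  std::optional<FixItList> getFixits(Kind K) const override {
    if (K == Kind::Span)
      return FixItList{};
    return std::nullopt;
  }
};

// Hypothetical gadget for `*p`: under the span strategy it proposes a
// single textual replacement.
struct DerefGadget : Gadget {
  std::optional<FixItList> getFixits(Kind K) const override {
    if (K == Kind::Span)
      return FixItList{"replace '*p' with 'p[0]'"};
    return std::nullopt;
  }
};
```

The distinction between an empty optional and an empty list matters: the 
former blocks the fix for the whole variable, the latter does not.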

However, even if the analysis knows how to fix every `Gadget`, that doesn't 
necessarily mean it can fix the variable. Some uses of the variable may never 
have been covered by `Gadget`s, which means that the analysis doesn't fully 
understand how the variable is used. This patch 
introduces a `Tracker` class that tracks all variable uses (i.e. 
`DeclRefExpr`s) in the function. Additionally, each `Gadget` now provides a new 
virtual method

  Gadget->getClaimedVarUseSites()

that the `Tracker` can call to see which `DeclRefExpr`s are "claimed" by the 
`Gadget`. In order to fix the variable with a certain `Strategy`, the `Tracker` 
needs to confirm that there are no unclaimed uses, and every `Gadget` has to 
provide a fix for that `Strategy`.
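
The two conditions above can be sketched for a single variable as follows. 
This is a simplified model, not the patch's implementation: uses are integers 
rather than `DeclRefExpr` pointers, fixes are strings, and `UseTracker`, 
`GadgetInfo` and `tryFix` are names invented for the example.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <vector>

using UseId = int;                          // Stand-in for `const DeclRefExpr *`.
using FixItList = std::vector<std::string>; // Stand-in for a list of FixItHints.

// Tracks the uses of one variable that no gadget has claimed yet.
class UseTracker {
  std::set<UseId> Unclaimed;

public:
  void discoverUse(UseId U) { Unclaimed.insert(U); }

  void claimUse(UseId U) {
    assert(Unclaimed.count(U) && "use not found or claimed twice");
    Unclaimed.erase(U);
  }

  bool hasUnclaimedUses() const { return !Unclaimed.empty(); }
};

// A gadget reduced to what matters here: the uses it claims and the fix it
// offers for the chosen strategy (std::nullopt means "cannot fix").
struct GadgetInfo {
  std::vector<UseId> ClaimedUses;
  std::optional<FixItList> Fixits;
};

// Returns the combined fix, or std::nullopt if some use is unclaimed or some
// gadget cannot be fixed. The tracker is taken by value so that the caller's
// copy is left untouched.
std::optional<FixItList> tryFix(UseTracker Tracker,
                                const std::vector<GadgetInfo> &Gadgets) {
  for (const GadgetInfo &G : Gadgets)
    for (UseId U : G.ClaimedUses)
      Tracker.claimUse(U);

  if (Tracker.hasUnclaimedUses())
    return std::nullopt;

  FixItList All;
  for (const GadgetInfo &G : Gadgets) {
    if (!G.Fixits)
      return std::nullopt;
    All.insert(All.end(), G.Fixits->begin(), G.Fixits->end());
  }
  return All;
}
```

In this model a single unclaimed use, or a single gadget without a fix, is 
enough to suppress the fix for the whole variable.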

This "conservative" behavior guarantees that fixes emitted by our analysis are 
correct by construction. We can now be sure that the analysis won't attempt to 
emit a fix if it doesn't understand the code. Later, as we implement more 
`getFixits()` methods in individual `Gadget` classes, we'll start progressively 
emitting more and more fixits.


Repository:
  rC Clang

https://reviews.llvm.org/D138253

Files:
  clang/include/clang/Analysis/Analyses/UnsafeBufferUsage.h
  clang/include/clang/Basic/DiagnosticGroups.td
  clang/include/clang/Basic/DiagnosticSemaKinds.td
  clang/lib/Analysis/UnsafeBufferUsage.cpp
  clang/lib/Sema/AnalysisBasedWarnings.cpp

Index: clang/lib/Sema/AnalysisBasedWarnings.cpp
===================================================================
--- clang/lib/Sema/AnalysisBasedWarnings.cpp
+++ clang/lib/Sema/AnalysisBasedWarnings.cpp
@@ -2150,7 +2150,17 @@
   UnsafeBufferUsageReporter(Sema &S) : S(S) {}
 
   void handleUnsafeOperation(const Stmt *Operation) override {
-    S.Diag(Operation->getBeginLoc(), diag::warn_unsafe_buffer_usage);
+    S.Diag(Operation->getBeginLoc(), diag::warn_unsafe_buffer_expression);
+  }
+
+  void handleFixableVariable(const VarDecl *Variable,
+                             FixItList &&Fixes) override {
+    const auto &D =
+        S.Diag(Variable->getBeginLoc(), diag::warn_unsafe_buffer_variable);
+    D << Variable;
+    for (const auto &F: Fixes) {
+      D << F;
+    }
   }
 };
 
@@ -2448,7 +2458,8 @@
         checkThrowInNonThrowingFunc(S, FD, AC);
 
   // Emit unsafe buffer usage warnings and fixits.
-  if (!Diags.isIgnored(diag::warn_unsafe_buffer_usage, D->getBeginLoc())) {
+  if (!Diags.isIgnored(diag::warn_unsafe_buffer_expression, D->getBeginLoc()) ||
+      !Diags.isIgnored(diag::warn_unsafe_buffer_variable, D->getBeginLoc())) {
     UnsafeBufferUsageReporter R(S);
     checkUnsafeBufferUsage(D, R);
   }
Index: clang/lib/Analysis/UnsafeBufferUsage.cpp
===================================================================
--- clang/lib/Analysis/UnsafeBufferUsage.cpp
+++ clang/lib/Analysis/UnsafeBufferUsage.cpp
@@ -13,6 +13,18 @@
 using namespace clang;
 using namespace ast_matchers;
 
+namespace {
+// Because the analysis revolves around variables and their types, we'll need to
+// track uses of variables (aka DeclRefExprs).
+using DeclUseList = SmallVector<const DeclRefExpr *, 1>;
+
+// Convenience typedef.
+using FixItList = UnsafeBufferUsageHandler::FixItList;
+
+// Defined below.
+class Strategy;
+} // namespace
+
 // Because we're dealing with raw pointers, let's define what we mean by that.
 static auto hasPointerType() {
   return anyOf(hasType(pointerType()),
@@ -67,6 +79,18 @@
   virtual bool isSafe() const = 0;
   virtual const Stmt *getBaseStmt() const = 0;
 
+  /// Returns the list of pointer-type variables on which this gadget performs
+  /// its operation. Typically there's only one variable. This isn't a list
+  /// of all DeclRefExprs in the gadget's AST!
+  virtual DeclUseList getClaimedVarUseSites() const = 0;
+
+  /// Returns a fixit that would fix the current gadget according to
+  /// the current strategy. Returns None if the fix cannot be produced;
+  /// returns an empty list if no fixes are necessary.
+  virtual Optional<FixItList> getFixits(const Strategy &) const {
+    return None;
+  }
+
   virtual ~Gadget() {}
 
 private:
@@ -124,6 +148,15 @@
   }
 
   const Stmt *getBaseStmt() const override { return Op; }
+
+  DeclUseList getClaimedVarUseSites() const override {
+    if (const auto *DRE =
+            dyn_cast<DeclRefExpr>(Op->getSubExpr()->IgnoreParenImpCasts())) {
+      return {DRE};
+    }
+
+    return {};
+  }
 };
 
 /// A decrement of a pointer-type value is unsafe as it may run the pointer
@@ -148,6 +181,15 @@
   }
 
   const Stmt *getBaseStmt() const override { return Op; }
+
+  DeclUseList getClaimedVarUseSites() const override {
+    if (const auto *DRE =
+            dyn_cast<DeclRefExpr>(Op->getSubExpr()->IgnoreParenImpCasts())) {
+      return {DRE};
+    }
+
+    return {};
+  }
 };
 
 /// Array subscript expressions on raw pointers as if they're arrays. Unsafe as
@@ -173,25 +215,131 @@
   }
 
   const Stmt *getBaseStmt() const override { return ASE; }
+
+  DeclUseList getClaimedVarUseSites() const override {
+    if (const auto *DRE =
+            dyn_cast<DeclRefExpr>(ASE->getBase()->IgnoreParenImpCasts())) {
+      return {DRE};
+    }
+
+    return {};
+  }
 };
 } // namespace
 
-// Scan the function and return a list of gadgets found with provided kits.
-static GadgetList findGadgets(const Decl *D) {
+namespace {
+// An auxiliary tracking facility for the fixit analysis. It helps connect
+// declarations to their uses and make sure we've covered all uses with our
+// analysis before we try to fix the declaration.
+class DeclUseTracker {
+  using UseSetTy = SmallSet<const DeclRefExpr *, 16>;
+  using DefMapTy = DenseMap<const VarDecl *, const DeclStmt *>;
+
+  // Allocate on the heap for easier move.
+  std::unique_ptr<UseSetTy> Uses{std::make_unique<UseSetTy>()};
+  DefMapTy Defs{};
 
-  class GadgetFinderCallback : public MatchFinder::MatchCallback {
-    GadgetList &Output;
+public:
+  DeclUseTracker() = default;
+  DeclUseTracker(const DeclUseTracker &) = delete; // Let's avoid copies.
+  DeclUseTracker(DeclUseTracker &&) = default;
+
+  // Start tracking a freshly discovered DRE.
+  void discoverUse(const DeclRefExpr *DRE) { Uses->insert(DRE); }
+
+  // Stop tracking the DRE as it's been fully figured out.
+  void claimUse(const DeclRefExpr *DRE) {
+    assert(Uses->count(DRE) &&
+           "DRE not found or claimed by multiple matchers!");
+    Uses->erase(DRE);
+  }
+
+  // A variable is unclaimed if at least one use is unclaimed.
+  bool hasUnclaimedUses(const VarDecl *VD) const {
+    // FIXME: Can this be less linear? Maybe maintain a map from VDs to DREs?
+    return any_of(*Uses, [VD](const DeclRefExpr *DRE) {
+      return DRE->getDecl()->getCanonicalDecl() == VD->getCanonicalDecl();
+    });
+  }
+
+  void discoverDecl(const DeclStmt *DS) {
+    for (const Decl *D: DS->decls()) {
+      if (const auto *VD = dyn_cast<VarDecl>(D)) {
+        assert(Defs.count(VD) == 0 && "Definition already discovered!");
+        Defs[VD] = DS;
+      }
+    }
+  }
 
-  public:
-    GadgetFinderCallback(GadgetList &Output) : Output(Output) {}
+  const DeclStmt *lookupDecl(const VarDecl *VD) const {
+    auto It = Defs.find(VD);
+    assert(It != Defs.end() && "Definition never discovered!");
+    return It->second;
+  }
+};
+} // namespace
+
+namespace {
+// Strategy is a map from variables to the way we plan to emit fixes for
+// these variables. It is figured out gradually by trying different fixes
+// for different variables depending on gadgets in which these variables
+// participate.
+class Strategy {
+public:
+  enum class Kind {
+    Wontfix,    // We don't plan to emit a fixit for this variable.
+    Span,       // We recommend replacing the variable with std::span.
+    Iterator,   // We recommend replacing the variable with std::span::iterator.
+    Array,      // We recommend replacing the variable with std::array.
+    Vector      // We recommend replacing the variable with std::vector.
+  };
+
+private:
+  using MapTy = llvm::DenseMap<const VarDecl *, Kind>;
+
+  MapTy Map;
+
+public:
+  Strategy() = default;
+  Strategy(const Strategy &) = delete; // Let's avoid copies.
+  Strategy(Strategy &&) = default;
+
+  void set(const VarDecl *VD, Kind K) {
+    Map[VD] = K;
+  }
+
+  Kind lookup(const VarDecl *VD) const {
+    auto I = Map.find(VD);
+    if (I == Map.end())
+      return Kind::Wontfix;
+
+    return I->second;
+  }
+};
+} // namespace
+
+/// Scan the function and return a list of gadgets found with provided kits.
+static std::pair<GadgetList, DeclUseTracker> findGadgets(const Decl *D) {
+
+  struct GadgetFinderCallback : MatchFinder::MatchCallback {
+    GadgetList Gadgets;
+    DeclUseTracker Tracker;
 
     void run(const MatchFinder::MatchResult &Result) override {
+      if (const auto *DRE = Result.Nodes.getNodeAs<DeclRefExpr>("any_dre")) {
+        Tracker.discoverUse(DRE);
+      }
+
+      if (const auto *DS = Result.Nodes.getNodeAs<DeclStmt>("any_ds")) {
+        Tracker.discoverDecl(DS);
+      }
+
       // Figure out which matcher we've found, and call the appropriate
       // subclass constructor.
       // FIXME: Can we do this more logarithmically?
 #define GADGET(x)                                                              \
       if (Result.Nodes.getNodeAs<Stmt>(#x)) {                                  \
-        Output.push_back(std::make_unique<x ## Gadget>(Result));               \
+        Gadgets.push_back(std::make_unique<x ## Gadget>(Result));              \
         return;                                                                \
       }
 #include "clang/Analysis/Analyses/UnsafeBufferUsageGadgets.def"
@@ -199,9 +347,8 @@
     }
   };
 
-  GadgetList G;
   MatchFinder M;
-  GadgetFinderCallback CB(G);
+  GadgetFinderCallback CB;
 
   // clang-format off
   M.addMatcher(
@@ -212,8 +359,13 @@
         x ## Gadget::matcher().bind(#x),
 #include "clang/Analysis/Analyses/UnsafeBufferUsageGadgets.def"
 #undef GADGET
-        // FIXME: Is there a better way to avoid hanging comma?
-        unless(stmt())
+        // In parallel, match all DeclRefExprs to find out
+        // whether there are any uncovered by gadgets.
+        declRefExpr(hasPointerType(), to(varDecl())).bind("any_dre"),
+        // Also match DeclStmts because we'll need them when fixing
+        // their underlying VarDecls that otherwise don't have
+        // any backreferences to DeclStmts.
+        declStmt().bind("any_ds")
       ))
       // FIXME: Idiomatically there should be a forCallable(equalsNode(D))
       // here, to make sure that the statement actually belongs to the
@@ -228,15 +380,99 @@
 
   M.match(*D->getBody(), D->getASTContext());
 
-  return G; // NRVO!
+  // Gadgets "claim" variables they're responsible for. Once this loop finishes,
+  // the tracker will only track DREs that weren't claimed by any gadgets,
+  // i.e. not understood by the analysis.
+  for (const auto &G : CB.Gadgets) {
+    for (const auto *DRE : G->getClaimedVarUseSites()) {
+      CB.Tracker.claimUse(DRE);
+    }
+  }
+
+  return {std::move(CB.Gadgets), std::move(CB.Tracker)};
 }
 
 void clang::checkUnsafeBufferUsage(const Decl *D,
                                    UnsafeBufferUsageHandler &Handler) {
   assert(D && D->getBody());
 
-  GadgetList Gadgets = findGadgets(D);
+  SmallSet<const VarDecl *, 8> WarnedDecls;
+
+  auto [Gadgets, Tracker] = findGadgets(D);
+
+  DenseMap<const VarDecl *, std::vector<const Gadget *>> Map;
+
+  // First, let's sort gadgets by variables. If some gadgets cover more than one
+  // variable, they'll appear more than once in the map.
   for (const auto &G : Gadgets) {
-    Handler.handleUnsafeOperation(G->getBaseStmt());
+    DeclUseList DREs = G->getClaimedVarUseSites();
+
+    // Populate the map.
+    bool Pushed = false;
+    for (const DeclRefExpr *DRE : DREs) {
+      if (const auto *VD = dyn_cast<VarDecl>(DRE->getDecl())) {
+        Map[VD].push_back(G.get());
+        Pushed = true;
+      }
+    }
+
+    if (!Pushed && !G->isSafe()) {
+      // We won't return to this gadget later. Emit the warning right away.
+      Handler.handleUnsafeOperation(G->getBaseStmt());
+      continue;
+    }
+  }
+
+  Strategy S;
+
+  for (const auto &Item : Map) {
+    const VarDecl *VD = Item.first;
+    const std::vector<const Gadget *> &VDGadgets = Item.second;
+
+    // If the variable has no unsafe gadgets, skip it entirely.
+    if (!any_of(VDGadgets, [](const Gadget *G) { return !G->isSafe(); }))
+      continue;
+
+    Optional<FixItList> Fixes = None;
+
+    // Avoid suggesting fixes if not all uses of the variable are identified
+    // as known gadgets.
+    // FIXME: Support parameter variables as well.
+    if (!Tracker.hasUnclaimedUses(VD) && VD->isLocalVarDecl()) {
+      // Choose the appropriate strategy. FIXME: We should try different
+      // strategies.
+      S.set(VD, Strategy::Kind::Span);
+
+      // Check if it works.
+      // FIXME: This isn't sufficient (or even correct) when a gadget has
+      // already produced a fixit for a different variable i.e. it was mentioned
+      // in the map twice (or more). In such case the correct thing to do is
+      // to undo the previous fix first, and then if we can't produce the new
+      // fix for both variables, revert to the old one.
+      Fixes = FixItList{};
+      for (const Gadget *G : VDGadgets) {
+        Optional<FixItList> F = G->getFixits(S);
+        if (!F) {
+          Fixes = None;
+          break;
+        }
+
+        for (auto &&Fixit: *F)
+          Fixes->push_back(std::move(Fixit));
+      }
+    }
+
+    if (Fixes) {
+      // If we reach this point, the strategy is applicable.
+      Handler.handleFixableVariable(VD, std::move(*Fixes));
+    } else {
+      // The strategy has failed. Emit the warning without the fixit.
+      S.set(VD, Strategy::Kind::Wontfix);
+      for (const Gadget *G : VDGadgets) {
+        if (!G->isSafe()) {
+          Handler.handleUnsafeOperation(G->getBaseStmt());
+        }
+      }
+    }
   }
 }
Index: clang/include/clang/Basic/DiagnosticSemaKinds.td
===================================================================
--- clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -11651,6 +11651,8 @@
   "casting from randomized structure pointer type %0 to %1">;
 
 // Unsafe buffer usage diagnostics.
-def warn_unsafe_buffer_usage : Warning<"unchecked operation on raw buffer in expression">,
-  InGroup<DiagGroup<"unsafe-buffer-usage">>, DefaultIgnore;
+def warn_unsafe_buffer_expression : Warning<"unchecked operation on raw buffer in expression">,
+  InGroup<UnsafeBufferUsage>, DefaultIgnore;
+def warn_unsafe_buffer_variable : Warning<"variable %0 participates in unchecked buffer operations">,
+  InGroup<UnsafeBufferUsage>, DefaultIgnore;
 } // end of sema component.
Index: clang/include/clang/Basic/DiagnosticGroups.td
===================================================================
--- clang/include/clang/Basic/DiagnosticGroups.td
+++ clang/include/clang/Basic/DiagnosticGroups.td
@@ -1380,3 +1380,5 @@
 // HLSL diagnostic groups
 // Warnings for HLSL Clang extensions
 def HLSLExtension : DiagGroup<"hlsl-extensions">;
+
+def UnsafeBufferUsage : DiagGroup<"unsafe-buffer-usage">;
Index: clang/include/clang/Analysis/Analyses/UnsafeBufferUsage.h
===================================================================
--- clang/include/clang/Analysis/Analyses/UnsafeBufferUsage.h
+++ clang/include/clang/Analysis/Analyses/UnsafeBufferUsage.h
@@ -25,8 +25,16 @@
   UnsafeBufferUsageHandler() = default;
   virtual ~UnsafeBufferUsageHandler() = default;
 
+  /// This analysis produces large fixits that are organized into lists
+  /// of primitive fixits (individual insertions/removals/replacements).
+  using FixItList = llvm::SmallVector<FixItHint, 2>;
+
   /// Invoked when an unsafe operation over raw pointers is found.
   virtual void handleUnsafeOperation(const Stmt *Operation) = 0;
+
+  /// Invoked when a fix is suggested against a variable.
+  virtual void handleFixableVariable(const VarDecl *Variable,
+                                     FixItList &&List) = 0;
 };
 
 // This function invokes the analysis and allows the caller to react to it