Szelethus created this revision.
Szelethus added reviewers: NoQ, xazax.hun, martong, whisperity, rnkovacs,
dcoughlin, baloghadamsoftware.
Szelethus added a project: clang.
Herald added subscribers: cfe-commits, DenisDvlp, steakhal, Charusso,
gamesh411, dkrupp, donat.nagy, mikhail.ramalho, a.sidorin, szepet, mgorny.
The following revision adds the basic infrastructure for a reaching definitions
algorithm for C++.
Short description & motivation
==============================
This is a dataflow algorithm designed to find a set of definitions (for example
assignments) to variables in a given `CFGBlock`. To demonstrate, in the
following code snippet:
int flag;
void foo();
void f() {
int *x = nullptr;
flag = 1;
foo();
if (flag)
x = new int;
foo();
if (flag)
*x = 5;
}
The CFG would look like this:
-> [B3] -> -> [B1] ->
/ \ / \
[B5 (ENTRY)] -> [B4] ------> [B2] ---> [B0 (EXIT)]
Should `foo()` change `flag`'s value on the first function call to `false`,
then `true` on the second, a null pointer dereference error would occur. Using
a reaching definitions calculator, we can retrieve that the set of ingoing
definitions to B1 <https://reviews.llvm.org/B1> is `{(flag, [B2]), (x, [B3]),
(x, [B4])}`. The set hints that `x` has a reaching definitions that would not
have caused a nullpointer dereference.
The algorithm
=============
A reaching definition for a given instruction is an earlier instruction whose
target variable can reach (be assigned to) the given one without an intervening
assignment. The similarly named reaching definitions is a data-flow analysis
which statically determines which definitions may reach a given point in the
code [1].
The set of ingoing and outgoing reaching definitions are calculated for each
basic block with set operations on GEN and KILL sets. This could be further
fine-grained so the RD sets would be a property of a statement, rather then an
entire `CFGBlock`, but I didn't want to delay the publishing of this patch any
further, and I believe that such a change wouldn't hurt the infrastructure much.
Implementation
==============
As the formal definition would suggest, this was conceived for instructions,
which is why even such terms as "variable" or "definition" can be hard to
define for C++. For this reason, the patch spares a lot of LOC for
documentation and reasoning. While the algorithm itself is simple (and is
implemented quite literally from [1]), the problematic part of this is the
generation of of GEN and KILL sets. I tried to introduce an infrastructure that
can tolerate a lot of things I have inevitable forgotten (or left for followup
patches) with the use of easy to add AST matchers.
Immediate questions to address
==============================
> We already have 2 dataflow algorithms implemented in the Analysis library:
> UninitializedObject and LiveVariables. Reaching definitions doesn't sound all
> too dissimilar. Why are we adding hundreds of LOC here? Can't we reuse some
> of it?
UninitializedObject and LiveVariables are practically the same algorithm with
minimal differences. Despite this, they take up ~750 and ~1050 LOC
respectively. Both of those algorithms can be expressed with the use of GEN and
KILL sets, yet none of them are, and they duplicate a lot of logic. It isn't
terribly obvious however how their logic could be (or really, should be) merged.
Shockingly, we have no GEN and KILL set implementations in Clang. I think that
is the main addition of this patch, even if it unfortunately duplicates some
logic. The previously mentioned two analyses however could serve as an
inspiration.
> UninitializedObject and LiveVariables uses ASTVisitors rather than
> ASTMatchers. The latter is more expensive. Why did we go with them?
Matchers are more expressive. For instance, how would you find `s.a.x.z` with
visitors? Its doable, but requires to keep track of a lot of state and would
make the implementation ugly. I don't have complete confidence in my decision
here, so I welcome alternative suggestions or counterarguments.
> What are the main things to get done?
In order:
- Finalize the infrastructure for GEN set generation.
- Make it possible to calculate RD sets for statements, not only blocks.
- Improve the interface of `ReachingDefinitionsCalculator`. What we like to
query? Most probably the set of ingoing RDs for a variable at a given statement.
- Performance: use immutable data structures and a better CFG traversal
strategy.
Further reading
===============
My GSoC project: https://szelethus.github.io/gsoc2019/ (this has a lot of
pointers to related discussions in the 'Reaching Definitions Analysis' section)
[cfe-dev] Dataflow analyses in Clang -- why do we not have GEN and KILL
sets? http://lists.llvm.org/pipermail/cfe-dev/2020-March/064893.html
[analyzer][WIP] Implement a primitive reaching definitions analysis D64991
<https://reviews.llvm.org/D64991>
References
==========
My main source was wikipedia:
[1] https://en.wikipedia.org/wiki/Reaching_definition
I read the following articles, but they didn't give me the information I needed:
Tonella, Paolo, et al. "Variable precision reaching definitions analysis for
software maintenance." Proceedings. First Euromicro Conference on Software
Maintenance and Reengineering. IEEE, 1997.
Collard, Jean-François, and Jens Knoop. "A comparative study of
reaching-definitions analyses." (1998).
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D76287
Files:
clang/include/clang/Analysis/Analyses/ReachingDefinitions.h
clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
clang/lib/Analysis/CMakeLists.txt
clang/lib/Analysis/ReachingDefinitions.cpp
clang/lib/StaticAnalyzer/Checkers/DebugCheckers.cpp
clang/test/Analysis/dump-definitions.cpp
Index: clang/test/Analysis/dump-definitions.cpp
===================================================================
--- /dev/null
+++ clang/test/Analysis/dump-definitions.cpp
@@ -0,0 +1,153 @@
+// RUN: %clang_analyze_cc1 %s \
+// RUN: -analyzer-checker=debug.DumpCFG \
+// RUN: -analyzer-checker=debug.DumpGenSets \
+// RUN: -analyzer-checker=debug.DumpKillSets \
+// RUN: -analyzer-checker=debug.DumpReachingDefinitions \
+// RUN: 2>&1 | FileCheck %s
+
+int getInt();
+int *getIntPtr();
+bool coin();
+
+void single_vardecl_in_declstmt() {
+ int *ptr = getIntPtr();
+}
+// [B2 (ENTRY)] -> [B1] -> [B0 (EXIT)]
+
+// CHECK: GEN sets: blockid (varname [blockid, elementid])
+// CHECK-NEXT: 1 (ptr, [1, 3]) <write>
+// CHECK-NEXT: KILL sets: blockid (varname [blockid, elementid])
+// CHECK-NEXT: Reaching definition sets: blockid IN/OUT (varname [blockid, elementid])
+// CHECK-NEXT: 0 IN (ptr, [1, 3]) <write>
+// CHECK-NEXT: 0 OUT (ptr, [1, 3]) <write>
+// CHECK-NEXT: 1 OUT (ptr, [1, 3]) <write>
+
+void multiple_vardecl_in_declstmt() {
+ int *ptr = getIntPtr(), i;
+}
+// [B2 (ENTRY)] -> [B1] -> [B0 (EXIT)]
+
+// CHECK: GEN sets: blockid (varname [blockid, elementid])
+// CHECK-NEXT: 1 (ptr, [1, 3]) <write>
+// CHECK-NEXT: 1 (i, [1, 4]) <write>
+// CHECK-NEXT: KILL sets: blockid (varname [blockid, elementid])
+// CHECK-NEXT: Reaching definition sets: blockid IN/OUT (varname [blockid, elementid])
+// CHECK-NEXT: 0 IN (ptr, [1, 3]) <write>
+// CHECK-NEXT: 0 IN (i, [1, 4]) <write>
+// CHECK-NEXT: 0 OUT (ptr, [1, 3]) <write>
+// CHECK-NEXT: 0 OUT (i, [1, 4]) <write>
+// CHECK-NEXT: 1 OUT (ptr, [1, 3]) <write>
+// CHECK-NEXT: 1 OUT (i, [1, 4]) <write>
+
+void function_and_vardecl_in_declstmt() {
+ int *ptr = getIntPtr(), a();
+}
+// [B2 (ENTRY)] -> [B1] -> [B0 (EXIT)]
+
+// CHECK: GEN sets: blockid (varname [blockid, elementid])
+// CHECK-NEXT: 1 (ptr, [1, 3]) <write>
+// CHECK-NEXT: KILL sets: blockid (varname [blockid, elementid])
+// CHECK-NEXT: Reaching definition sets: blockid IN/OUT (varname [blockid, elementid])
+// CHECK-NEXT: 0 IN (ptr, [1, 3]) <write>
+// CHECK-NEXT: 0 OUT (ptr, [1, 3]) <write>
+// CHECK-NEXT: 1 OUT (ptr, [1, 3]) <write>
+
+void single_def_in_same_block() {
+ int *ptr = getIntPtr();
+
+ if (coin())
+ ptr = 0;
+
+ if (!ptr)
+ *ptr = 5;
+}
+// -> [B3] -> -> [B1] ->
+// / \ / \
+// [B5 (ENTRY)] -> [B4] ------> [B2] ---> [B0 (EXIT)]
+
+// CHECK: GEN sets: blockid (varname [blockid, elementid])
+// CHECK-NEXT: 3 (ptr, [3, 3]) <write>
+// CHECK-NEXT: 4 (ptr, [4, 3]) <write>
+// CHECK-NEXT: KILL sets: blockid (varname [blockid, elementid])
+// CHECK-NEXT: 3 (ptr, [4, 3]) <write>
+// CHECK-NEXT: 4 (ptr, [3, 3]) <write>
+// CHECK-NEXT: Reaching definition sets: blockid IN/OUT (varname [blockid, elementid])
+// CHECK-NEXT: 0 IN (ptr, [3, 3]) <write>
+// CHECK-NEXT: 0 IN (ptr, [4, 3]) <write>
+// CHECK-NEXT: 0 OUT (ptr, [3, 3]) <write>
+// CHECK-NEXT: 0 OUT (ptr, [4, 3]) <write>
+// CHECK-NEXT: 1 IN (ptr, [3, 3]) <write>
+// CHECK-NEXT: 1 IN (ptr, [4, 3]) <write>
+// CHECK-NEXT: 1 OUT (ptr, [3, 3]) <write>
+// CHECK-NEXT: 1 OUT (ptr, [4, 3]) <write>
+// CHECK-NEXT: 2 IN (ptr, [3, 3]) <write>
+// CHECK-NEXT: 2 IN (ptr, [4, 3]) <write>
+// CHECK-NEXT: 2 OUT (ptr, [3, 3]) <write>
+// CHECK-NEXT: 2 OUT (ptr, [4, 3]) <write>
+// CHECK-NEXT: 3 IN (ptr, [4, 3]) <write>
+// CHECK-NEXT: 3 OUT (ptr, [3, 3]) <write>
+// CHECK-NEXT: 4 OUT (ptr, [4, 3]) <write>
+
+void different_assignments() {
+ int i = getInt();
+
+ if (coin())
+ i = 0;
+
+ i += 3;
+
+ if (!coin())
+ i -= 2;
+
+ i *= 9;
+
+ if (i = 0)
+ ;
+}
+// -> [B5] -> -> [B3] -> -> [B1] ->
+// / \ / \ / \
+// [B7 (ENTRY)] -> [B6] ------> [B4] -------> [B2] ---> [B0 (EXIT)]
+
+// CHECK: GEN sets: blockid (varname [blockid, elementid])
+// CHECK-NEXT: 2 (i, [2, 5]) <write>
+// CHECK-NEXT: 3 (i, [3, 2]) <write>
+// CHECK-NEXT: 4 (i, [4, 2]) <write>
+// CHECK-NEXT: 5 (i, [5, 2]) <write>
+// CHECK-NEXT: 6 (i, [6, 3]) <write>
+// CHECK-NEXT: KILL sets: blockid (varname [blockid, elementid])
+// CHECK-NEXT: 2 (i, [3, 2]) <write>
+// CHECK-NEXT: 2 (i, [4, 2]) <write>
+// CHECK-NEXT: 2 (i, [5, 2]) <write>
+// CHECK-NEXT: 2 (i, [6, 3]) <write>
+// CHECK-NEXT: 3 (i, [2, 5]) <write>
+// CHECK-NEXT: 3 (i, [4, 2]) <write>
+// CHECK-NEXT: 3 (i, [5, 2]) <write>
+// CHECK-NEXT: 3 (i, [6, 3]) <write>
+// CHECK-NEXT: 4 (i, [2, 5]) <write>
+// CHECK-NEXT: 4 (i, [3, 2]) <write>
+// CHECK-NEXT: 4 (i, [5, 2]) <write>
+// CHECK-NEXT: 4 (i, [6, 3]) <write>
+// CHECK-NEXT: 5 (i, [2, 5]) <write>
+// CHECK-NEXT: 5 (i, [3, 2]) <write>
+// CHECK-NEXT: 5 (i, [4, 2]) <write>
+// CHECK-NEXT: 5 (i, [6, 3]) <write>
+// CHECK-NEXT: 6 (i, [2, 5]) <write>
+// CHECK-NEXT: 6 (i, [3, 2]) <write>
+// CHECK-NEXT: 6 (i, [4, 2]) <write>
+// CHECK-NEXT: 6 (i, [5, 2]) <write>
+// CHECK-NEXT: Reaching definition sets: blockid IN/OUT (varname [blockid, elementid])
+// CHECK-NEXT: 0 IN (i, [2, 5]) <write>
+// CHECK-NEXT: 0 OUT (i, [2, 5]) <write>
+// CHECK-NEXT: 1 IN (i, [2, 5]) <write>
+// CHECK-NEXT: 1 OUT (i, [2, 5]) <write>
+// CHECK-NEXT: 2 IN (i, [3, 2]) <write>
+// CHECK-NEXT: 2 IN (i, [4, 2]) <write>
+// CHECK-NEXT: 2 OUT (i, [2, 5]) <write>
+// CHECK-NEXT: 3 IN (i, [4, 2]) <write>
+// CHECK-NEXT: 3 OUT (i, [3, 2]) <write>
+// CHECK-NEXT: 4 IN (i, [5, 2]) <write>
+// CHECK-NEXT: 4 IN (i, [6, 3]) <write>
+// CHECK-NEXT: 4 OUT (i, [4, 2]) <write>
+// CHECK-NEXT: 5 IN (i, [6, 3]) <write>
+// CHECK-NEXT: 5 OUT (i, [5, 2]) <write>
+// CHECK-NEXT: 6 OUT (i, [6, 3]) <write>
Index: clang/lib/StaticAnalyzer/Checkers/DebugCheckers.cpp
===================================================================
--- clang/lib/StaticAnalyzer/Checkers/DebugCheckers.cpp
+++ clang/lib/StaticAnalyzer/Checkers/DebugCheckers.cpp
@@ -10,12 +10,13 @@
//
//===----------------------------------------------------------------------===//
-#include "clang/StaticAnalyzer/Checkers/BuiltinCheckerRegistration.h"
#include "clang/Analysis/Analyses/Dominators.h"
#include "clang/Analysis/Analyses/LiveVariables.h"
+#include "clang/Analysis/Analyses/ReachingDefinitions.h"
#include "clang/Analysis/CallGraph.h"
-#include "clang/StaticAnalyzer/Core/Checker.h"
+#include "clang/StaticAnalyzer/Checkers/BuiltinCheckerRegistration.h"
#include "clang/StaticAnalyzer/Core/BugReporter/BugType.h"
+#include "clang/StaticAnalyzer/Core/Checker.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/AnalysisManager.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/ExplodedGraph.h"
@@ -102,6 +103,72 @@
return true;
}
+//===----------------------------------------------------------------------===//
+// GenSetDumper
+//===----------------------------------------------------------------------===//
+
+namespace {
+class GenSetDumper : public Checker<check::ASTCodeBody> {
+public:
+ void checkASTCodeBody(const Decl *D, AnalysisManager &mgr,
+ BugReporter &BR) const {
+ if (AnalysisDeclContext *AC = mgr.getAnalysisDeclContext(D))
+ AC->getAnalysis<ReachingDefinitionsCalculator>()->dumpGenSet();
+ }
+};
+} // namespace
+
+void ento::registerGenSetDumper(CheckerManager &mgr) {
+ mgr.registerChecker<GenSetDumper>();
+}
+
+bool ento::shouldRegisterGenSetDumper(const LangOptions &LO) { return true; }
+
+//===----------------------------------------------------------------------===//
+// KillSetDumper
+//===----------------------------------------------------------------------===//
+
+namespace {
+class KillSetDumper : public Checker<check::ASTCodeBody> {
+public:
+ void checkASTCodeBody(const Decl *D, AnalysisManager &mgr,
+ BugReporter &BR) const {
+ if (AnalysisDeclContext *AC = mgr.getAnalysisDeclContext(D))
+ AC->getAnalysis<ReachingDefinitionsCalculator>()->dumpKillSet();
+ }
+};
+} // namespace
+
+void ento::registerKillSetDumper(CheckerManager &mgr) {
+ mgr.registerChecker<KillSetDumper>();
+}
+
+bool ento::shouldRegisterKillSetDumper(const LangOptions &LO) { return true; }
+
+//===----------------------------------------------------------------------===//
+// ReachingDefinitionsDumper
+//===----------------------------------------------------------------------===//
+
+namespace {
+class ReachingDefinitionsDumper : public Checker<check::ASTCodeBody> {
+public:
+ void checkASTCodeBody(const Decl *D, AnalysisManager &mgr,
+ BugReporter &BR) const {
+ if (AnalysisDeclContext *AC = mgr.getAnalysisDeclContext(D))
+ AC->getAnalysis<ReachingDefinitionsCalculator>()
+ ->dumpReachingDefinitions();
+ }
+};
+} // namespace
+
+void ento::registerReachingDefinitionsDumper(CheckerManager &mgr) {
+ mgr.registerChecker<ReachingDefinitionsDumper>();
+}
+
+bool ento::shouldRegisterReachingDefinitionsDumper(const LangOptions &LO) {
+ return true;
+}
+
//===----------------------------------------------------------------------===//
// LiveVariablesDumper
//===----------------------------------------------------------------------===//
Index: clang/lib/Analysis/ReachingDefinitions.cpp
===================================================================
--- /dev/null
+++ clang/lib/Analysis/ReachingDefinitions.cpp
@@ -0,0 +1,319 @@
+//===--- ReachableDefinitions.cpp -------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Calculates reachable definitions for a variable.
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/Analysis/Analyses/ReachingDefinitions.h"
+#include "clang/AST/Decl.h"
+#include "clang/AST/DeclCXX.h"
+#include "clang/AST/DeclLookups.h"
+#include "clang/AST/Expr.h"
+#include "clang/ASTMatchers/ASTMatchFinder.h"
+#include "clang/ASTMatchers/ASTMatchers.h"
+#include "clang/ASTMatchers/ASTMatchersInternal.h"
+#include "clang/Analysis/CFG.h"
+#include "clang/Basic/LLVM.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/SVals.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/SetOperations.h"
+#include "llvm/Support/ErrorHandling.h"
+#include <memory>
+
+using namespace clang;
+using namespace ReachingDefinitionsDetail;
+using namespace ast_matchers;
+
+using DefinitionKind = Definition::DefinitionKind;
+
+//===----------------------------------------------------------------------===//
+// Utility functions.
+//===----------------------------------------------------------------------===//
+
+static bool killsVar(const Definition &Victim, const GenSet &Set) {
+ for (const Definition &Perpetrator : Set)
+ if (Victim.Var == Perpetrator.Var)
+ return true;
+ return false;
+}
+
+static StringRef describeDefinitionKind(DefinitionKind K) {
+ switch (K) {
+ case Definition::Write:
+ return "write";
+ }
+}
+
+//===----------------------------------------------------------------------===//
+// Methods of Definition.
+//===----------------------------------------------------------------------===//
+
+void Definition::dump() const {
+ llvm::errs() << "(" << Var->getNameAsString();
+
+ llvm::errs() << ", [" << getCFGBlock()->getIndexInCFG() << ", "
+ << E.getIndexInBlock() << "])"
+ << " <" << (describeDefinitionKind(Kind)) << ">";
+}
+
+//===----------------------------------------------------------------------===//
+// Matcher callbacks for constructing GEN sets for the variable finding stage.
+//===----------------------------------------------------------------------===//
+
+class VariableCollectorCB : public GenSetMatcherCallback {
+public:
+ VariableCollectorCB(GenSetBuilder &GSBuilder)
+ : GenSetMatcherCallback(GSBuilder) {}
+
+ static internal::Matcher<Stmt> getMatcher() {
+ return stmt(forEachDescendant(declRefExpr().bind("var")));
+ }
+
+ virtual void run(const MatchFinder::MatchResult &Result) override {
+ const auto *E = Result.Nodes.getNodeAs<DeclRefExpr>("var");
+ assert(E);
+ if (const VarDecl *Var = dyn_cast<VarDecl>(E->getDecl()))
+ GSBuilder.addVariable(Var);
+ }
+};
+
+//===----------------------------------------------------------------------===//
+// Matcher callbacks for constructing GEN sets for the definition finding stage.
+//===----------------------------------------------------------------------===//
+
+class AssignmentOperatorCB : public GenSetMatcherCallback {
+public:
+ AssignmentOperatorCB(GenSetBuilder &GSBuilder)
+ : GenSetMatcherCallback(GSBuilder) {}
+
+ static internal::Matcher<Stmt> getMatcher() {
+ return binaryOperator(isAssignmentOperator()).bind("assign");
+ }
+
+ virtual void run(const MatchFinder::MatchResult &Result) override {
+ const auto *BO = Result.Nodes.getNodeAs<BinaryOperator>("assign");
+ assert(BO);
+ GSBuilder.handleExpr(BO->getLHS(), DefinitionKind::Write);
+ }
+};
+
+class DeclarationCB : public GenSetMatcherCallback {
+public:
+ DeclarationCB(GenSetBuilder &GSBuilder) : GenSetMatcherCallback(GSBuilder) {}
+
+ static internal::Matcher<Stmt> getMatcher() {
+ return declStmt().bind("decls");
+ }
+
+ virtual void run(const MatchFinder::MatchResult &Result) override {
+ const auto *DS = Result.Nodes.getNodeAs<DeclStmt>("decls");
+ assert(DS);
+
+ for (const Decl *D : DS->decls()) {
+ const auto *Var = dyn_cast<VarDecl>(D);
+ if (!Var)
+ continue;
+
+ GSBuilder.insertToGenSet(Var, DefinitionKind::Write);
+ }
+ }
+};
+
+//===----------------------------------------------------------------------===//
+// Matcher callbacks for constructing GEN sets for the expression finding stage.
+//===----------------------------------------------------------------------===//
+
+class DeclRefExprCB : public GenSetMatcherCallback {
+public:
+ DeclRefExprCB(GenSetBuilder &GSBuilder) : GenSetMatcherCallback(GSBuilder) {}
+
+ static internal::Matcher<Stmt> getMatcher() {
+ return declRefExpr(to(varDecl().bind("var")));
+ }
+
+ virtual void run(const MatchFinder::MatchResult &Result) override {
+ const auto *Var = Result.Nodes.getNodeAs<VarDecl>("var");
+ assert(Var);
+ GSBuilder.insertToGenSet(Var);
+ }
+};
+
+//===----------------------------------------------------------------------===//
+// Methods of GenSetBuilder.
+//===----------------------------------------------------------------------===//
+
+GenSetBuilder::GenSetBuilder(const Decl *D)
+ : D(D), Context(&D->getASTContext()) {
+
+ // TODO: Should we match the entire TU for nested static variables?
+ addMatcher<&GenSetBuilder::VariableFinder, VariableCollectorCB>();
+
+ // TODO: Elvis operator (?:).
+ // TODO: C++ MemberExpr?
+ // TODO: ParenExpr? (((a))) = 3;
+ addMatcher<&GenSetBuilder::ExpressionFinder, DeclRefExprCB>();
+
+ // TODO: Destructor calls? Should we be THAT conservative?
+ // TODO: Regular function calls?
+ // TODO: Moving an object?
+ // TODO: Method calls?
+ // TODO: Analyzing a method?
+ // TODO: What do you do with Objective.*?
+ // TODO: Exceptions?
+ addMatcher<&GenSetBuilder::DefinitionFinder, AssignmentOperatorCB>();
+ addMatcher<&GenSetBuilder::DefinitionFinder, DeclarationCB>();
+
+ // Collect all used variables.
+ if (const auto *FD = dyn_cast<FunctionDecl>(D)) {
+ VariableFinder.match(*FD, FD->getASTContext());
+ for (const ParmVarDecl *Param : FD->parameters())
+ addVariable(Param);
+ }
+
+ // TODO: Collect all visible, non-local variables as well.
+}
+
+void GenSetBuilder::getGenSetForCFGBlock(const CFGBlock *B, GenSet &Gen) {
+ if (B->empty())
+ return;
+
+ VariableFinder.match(*D, D->getASTContext());
+
+ CurrentGenSet = &Gen;
+
+ for (CFGBlock::ConstCFGElementRef E : B->rrefs()) {
+ if (E->getKind() != CFGElement::Kind::Statement)
+ continue;
+ CurrentCFGElem = E;
+
+ const Stmt *S = E->castAs<CFGStmt>().getStmt();
+ assert(S);
+ DefinitionFinder.match(*S, D->getASTContext());
+ }
+
+ CurrentGenSet = nullptr;
+}
+
+void GenSetBuilder::handleExpr(const Expr *E, DefinitionKind Kind) {
+ CurrentKind = Kind;
+ ExpressionFinder.match(*E, *Context);
+}
+
+void GenSetBuilder::insertToGenSet(const VarDecl *Var,
+ DefinitionKind Kind) {
+ CurrentGenSet->emplace(Var, *CurrentCFGElem, Kind);
+}
+
+void GenSetBuilder::addVariable(const VarDecl *Var) {
+ AllVariables.emplace(Var);
+}
+
+//===----------------------------------------------------------------------===//
+// Methods of ReachingDefinitionsCalculator.
+//===----------------------------------------------------------------------===//
+
+ReachingDefinitionsCalculator::ReachingDefinitionsCalculator(const Decl *D,
+ const CFG *cfg)
+ : cfg(cfg), GSBuilder(D) {
+
+ for (const CFGBlock *B : *cfg)
+ GSBuilder.getGenSetForCFGBlock(B, Gen[B]);
+ // TODO: We ignore parameters! We should collect them as definitions as well.
+
+ calculate();
+}
+
+void ReachingDefinitionsCalculator::init() {
+ llvm::SmallVector<Definition, 16> AllDefinitions;
+ for (const std::pair<const CFGBlock *, GenSet> G : Gen)
+ for (const Definition &Def : G.second)
+ AllDefinitions.push_back(Def);
+
+ for (const std::pair<const CFGBlock *, GenSet> G : Gen)
+ for (const Definition &Def : AllDefinitions)
+ if (G.first != Def.getCFGBlock() && killsVar(Def, G.second))
+ Kill[G.first].insert(Def);
+}
+
+using WorklistTy = llvm::SmallVector<const CFGBlock *, 5>;
+
+void ReachingDefinitionsCalculator::calculate() {
+ init();
+
+ // TODO: Use ForwardDataflowWorklist.
+ for (const std::pair<const CFGBlock *, GenSet> G : Gen)
+ Out[G.first] = {G.second.begin(), G.second.end()};
+
+ WorklistTy Worklist({cfg->begin(), cfg->end()});
+
+ while (!Worklist.empty()) {
+ const CFGBlock *N = Worklist.pop_back_val();
+
+ for (const CFGBlock *Pred : N->preds())
+ llvm::set_union(In[N], Out[Pred]);
+
+ bool HasChanged =
+ llvm::set_union(Out[N], llvm::set_difference(In[N], Kill[N]));
+
+ if (llvm::set_union(Out[N], Gen[N]))
+ HasChanged = true;
+
+ if (HasChanged) {
+ for (const CFGBlock *Succ : N->succs())
+ Worklist.push_back(Succ);
+ }
+ }
+}
+
+void ReachingDefinitionsCalculator::dumpGenSet() const {
+ llvm::errs() << "GEN sets: blockid (varname [blockid, elementid])\n";
+ for (const std::pair<const CFGBlock *, GenSet> D : Gen) {
+ size_t BlockID = llvm::find(*cfg, D.first) - cfg->begin();
+ for (const Definition &Def : D.second) {
+ llvm::errs() << BlockID << ' ';
+ Def.dump();
+ llvm::errs() << '\n';
+ }
+ }
+}
+
+void ReachingDefinitionsCalculator::dumpKillSet() const {
+ llvm::errs() << "KILL sets: blockid (varname [blockid, elementid])\n";
+ for (const std::pair<const CFGBlock *, DefinitionSet> D : Kill) {
+ size_t BlockID = llvm::find(*cfg, D.first) - cfg->begin();
+ for (const Definition &Def : D.second) {
+ llvm::errs() << BlockID << ' ';
+ Def.dump();
+ llvm::errs() << '\n';
+ }
+ }
+}
+
+void ReachingDefinitionsCalculator::dumpReachingDefinitions() const {
+ llvm::errs() << "Reaching definition sets: "
+ "blockid IN/OUT (varname [blockid, elementid])\n";
+ for (const CFGBlock *B : *cfg) {
+ size_t BlockID = llvm::find(*cfg, B) - cfg->begin();
+ if (In.count(B)) {
+ for (const Definition &Def : In.find(B)->second) {
+ llvm::errs() << BlockID << " IN ";
+ Def.dump();
+ llvm::errs() << '\n';
+ }
+ }
+
+ if (Out.count(B)) {
+ for (const Definition &Def : Out.find(B)->second) {
+ llvm::errs() << BlockID << " OUT ";
+ Def.dump();
+ llvm::errs() << '\n';
+ }
+ }
+ }
+}
Index: clang/lib/Analysis/CMakeLists.txt
===================================================================
--- clang/lib/Analysis/CMakeLists.txt
+++ clang/lib/Analysis/CMakeLists.txt
@@ -22,6 +22,7 @@
PostOrderCFGView.cpp
ProgramPoint.cpp
ReachableCode.cpp
+ ReachingDefinitions.cpp
RetainSummaryManager.cpp
ThreadSafety.cpp
ThreadSafetyCommon.cpp
Index: clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
===================================================================
--- clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
+++ clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
@@ -1415,8 +1415,19 @@
Dependencies<[DebugContainerModeling, IteratorModeling]>,
Documentation<NotDocumented>;
-} // end "debug"
+def GenSetDumper : Checker<"DumpGenSets">,
+ HelpText<"Dump the GEN sets for each block in a function">,
+ Documentation<NotDocumented>;
+
+def KillSetDumper : Checker<"DumpKillSets">,
+ HelpText<"Dump the KILL sets for each block in a function">,
+ Documentation<NotDocumented>;
+def ReachingDefinitionsDumper : Checker<"DumpReachingDefinitions">,
+ HelpText<"Dump the reaching definitions for each block in a function">,
+ Documentation<NotDocumented>;
+
+} // end "debug"
//===----------------------------------------------------------------------===//
// Clone Detection
Index: clang/include/clang/Analysis/Analyses/ReachingDefinitions.h
===================================================================
--- /dev/null
+++ clang/include/clang/Analysis/Analyses/ReachingDefinitions.h
@@ -0,0 +1,247 @@
+//===--- ReachingDefinitions.h ----------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Calculates reaching definitions for a CFG.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_ANALYSIS_ANALYSES_REACHABLE_DEFINITIONS_H
+#define LLVM_CLANG_ANALYSIS_ANALYSES_REACHABLE_DEFINITIONS_H
+
+#include "clang/AST/Decl.h"
+#include "clang/ASTMatchers/ASTMatchFinder.h"
+#include "clang/Analysis/AnalysisDeclContext.h"
+#include "clang/Analysis/CFG.h"
+#include "llvm/ADT/SmallVector.h"
+#include <set>
+
+namespace clang {
+
+//===----------------------------------------------------------------------===//
+// Since reaching definitions was implemented for instructions, it isn't well
+// thought out what a definition of a variable is in C/C++, nor what we really
+// mean under the term variable.
+//===----------------------------------------------------------------------===//
+
+namespace ReachingDefinitionsDetail {
+
+//===----------------------------------------------------------------------===//
+// What a 'variable' is:
+// We define a variable as a VarDecl.
+// TODO: How about records? Add support for fields.
+// TODO: How do we describe the implicit this parameter for methods?
+//===----------------------------------------------------------------------===//
+
+struct Variable {
+ const VarDecl *Var;
+
+ Variable(const VarDecl *Var) : Var(Var) {}
+};
+
+struct VariableLess {
+ // TODO: const ref? but what if we change this to immutablelist?
+ bool operator()(Variable Lhs, Variable Rhs) {
+ return Lhs.Var < Rhs.Var;
+ }
+};
+
+//===----------------------------------------------------------------------===//
+// What a 'definition of a variable' is:
+// Whatever statement that writes a 'variable', or may write a 'variable'.
+// TODO: What about global variables on function calls? We should introduce
+// invalidations as well.
+//===----------------------------------------------------------------------===//
+
+class Definition : public Variable {
+public:
+ enum DefinitionKind { Write };
+
+ CFGBlock::ConstCFGElementRef E;
+ DefinitionKind Kind;
+
+public:
+
+ Definition(const VarDecl *Var, CFGBlock::ConstCFGElementRef E,
+ DefinitionKind Kind)
+ : Variable(Var), E(E), Kind(Kind) {}
+
+ const CFGBlock *getCFGBlock() const { return E.getParent(); }
+
+ // (varname [blockid, elementid]) (reason)
+ void dump() const;
+};
+
+struct VarAndCFGElementLess {
+ bool operator()(Definition Lhs, Definition Rhs) const {
+ return std::tie(Lhs.Var, Lhs.E) <
+ std::tie(Rhs.Var, Rhs.E);
+ }
+};
+
+/// A set of definitions sorted only by the variable, so that each basic block
+/// may only emit a single definition for any single variable.
+// TODO: We need to track more then a single definition to a variable for a
+// block's GEN set. Say, the static analyzer manages to prove that a potential
+// invalidation definition (like a function call) didn't write the variable, we
+// need to retrieve the definitions up to that point in the block.
+// TODO: This is ridiculously espensive, change to ImmutableSet.
+using GenSet = std::set<Definition, VariableLess>;
+
+/// A set of definitions sorted by the variable and the location of the
+/// definition. For KILL, IN and OUT sets this is correct, because a CFGBlock
+/// may kill several definitions of the same variables from different locations.
+using DefinitionSet = std::set<Definition, VarAndCFGElementLess>;
+
+//===----------------------------------------------------------------------===//
+// Determining whether a statement modifies a variable is a challanging, and
+// requires using expensive-to-create matchers, and an easily extensible
+// interface, hence the use of a builder class outside of the main calculator.
+//===----------------------------------------------------------------------===//
+
+class GenSetBuilder;
+
+class GenSetMatcherCallback : public ast_matchers::MatchFinder::MatchCallback {
+protected:
+ GenSetBuilder &GSBuilder;
+
+ GenSetMatcherCallback(GenSetBuilder &GSBuilder) : GSBuilder(GSBuilder) {}
+};
+
+/// Responsible for building the GEN sets for each basic block.
+///
+/// Since pointer escapes or function calls in general require us to generate
+/// definitions that are invalidations, we need to gather all variables relevant
+/// for this analysis, like parameters, locals and globals. We refer to this
+/// stage as 'variable finding':
+/// * Collect all non-local, visible variables
+/// * Collect all local variables within the function that is used
+///
+/// The actual building of GEN sets has two stages:
+/// 1. For each CFGStmt in the CFGBlock, look for a statement that may be a
+/// definition of an expression. ('definition finding')
+/// 2. Search expressions for variables recursively. ('expression finding')
+class GenSetBuilder {
+ using DefinitionKind = Definition::DefinitionKind;
+
+ // Fields that shouldn't change after the construction of the builder object.
+
+ llvm::SmallVector<std::unique_ptr<GenSetMatcherCallback>, 8> Callbacks;
+
+ ast_matchers::MatchFinder VariableFinder;
+ ast_matchers::MatchFinder DefinitionFinder;
+ ast_matchers::MatchFinder ExpressionFinder;
+
+ const Decl *D;
+ ASTContext *Context = nullptr;
+
+ // TODO: This does not contain non-local visible variables.
+ std::set<Variable, VariableLess> AllVariables;
+
+ // Fields that are changed at and during each GEN set construction.
+
+ GenSet *CurrentGenSet;
+ Optional<CFGBlock::ConstCFGElementRef> CurrentCFGElem;
+
+ // This field is used for definition finder matchers to communicate with
+ // expression finder matchers what kind of a definition we need to note. This
+ // isn't too pretty, but the interaction in between matcher callbacks are
+ // inherently awkward.
+ DefinitionKind CurrentKind = Definition::Write;
+
+private:
+ // TODO: Make this public and allow custom matchers to be added?
+ template <ast_matchers::MatchFinder GenSetBuilder::*Finder,
+ class GenSetMatcherCallbackTy>
+ void addMatcher() {
+ Callbacks.emplace_back(std::make_unique<GenSetMatcherCallbackTy>(*this));
+ (this->*Finder)
+ .addMatcher(GenSetMatcherCallbackTy::getMatcher(),
+ Callbacks.back().get());
+ }
+
+public:
+ GenSetBuilder(const Decl *D);
+
+ //===--------------------------------------------------------------------===//
+ // Methods for retrieving a GEN set.
+ //===--------------------------------------------------------------------===//
+
+ void getGenSetForCFGBlock(const CFGBlock *B, GenSet &Gen);
+
+ //===--------------------------------------------------------------------===//
+ // Utility methods for building a GEN set. These are public because the
+ // callback objects will need to call them.
+ //===--------------------------------------------------------------------===//
+
+ /// When we find a definition to an *expression*, we need to see if that
+ /// expression is a variable, or some other expression that needs further
+ /// processing, like in this case:
+ /// (a, b) = 10;
+ void handleExpr(const Expr *E, DefinitionKind Kind);
+
+ /// Insert a new defintion of a variable into the current GEN set.
+ void insertToGenSet(const VarDecl *Var, DefinitionKind Kind);
+
+ void insertToGenSet(const VarDecl *Var) { insertToGenSet(Var, CurrentKind); }
+
+ /// Called during the variable finding stage.
+ void addVariable(const VarDecl *Var);
+};
+
+} // end of namespace ReachingDefinitionsDetail
+
+/// Calculates reaching definitions for each basic block. This calculation
+/// doesn't try to argue about aliasing, meaning that some definitions are
+/// definite write (we know that the variable it written), and some are a result
+/// of invalidation, like passing a variable as a non-const reference to a
+/// function.
+class ReachingDefinitionsCalculator : public ManagedAnalysis {
+ using GenSetBuilder = ReachingDefinitionsDetail::GenSetBuilder;
+
+public:
+ using GenSet = ReachingDefinitionsDetail::GenSet;
+ using DefinitionKind = ReachingDefinitionsDetail::Definition::DefinitionKind;
+ using DefinitionSet = ReachingDefinitionsDetail::DefinitionSet;
+
+ void calculate();
+
+ void dumpKillSet() const;
+ void dumpGenSet() const;
+ void dumpReachingDefinitions() const;
+
+ static ReachingDefinitionsCalculator *create(AnalysisDeclContext &Ctx) {
+ return new ReachingDefinitionsCalculator(Ctx.getDecl(), Ctx.getCFG());
+ }
+
+ static const void *getTag() {
+ static int x;
+ return &x;
+ }
+
+private:
+ ReachingDefinitionsCalculator(const Decl *D, const CFG *cfg);
+
+ void init();
+
+ const CFG *cfg;
+ GenSetBuilder GSBuilder;
+
+ // TODO: Make the GEN set public to allow clients to remove definitions, or
+ // possibly mark an invalidation as write.
+ // TODO: Should we change to ImmutableMap? Does this matter that much?
+ std::map<const CFGBlock *, GenSet> Gen;
+ std::map<const CFGBlock *, DefinitionSet> Kill;
+
+public:
+ std::map<const CFGBlock *, DefinitionSet> In;
+ std::map<const CFGBlock *, DefinitionSet> Out;
+};
+
+} // end of namespace clang
+
+#endif // LLVM_CLANG_ANALYSIS_ANALYSES_REACHABLE_DEFINITIONS_H
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits