[clang] [analyzer] Modernize FuchsiaHandleChecker (PR #111588)

2024-10-09 Thread Donát Nagy via cfe-commits


@@ -314,6 +329,193 @@ getFuchsiaHandleSymbols(QualType QT, SVal Arg, 
ProgramStateRef State) {
   return {};
 }
 
+FuchsiaHandleChecker::Note FuchsiaHandleChecker::createNote(
+SymbolRef Sym,
+std::function Message) const {
+  return [Sym, Message](BugReport &BR) -> std::string {
+auto *PathBR = static_cast(&BR);
+if (!PathBR->getInterestingnessKind(Sym))
+  return "";
+std::string SBuf;
+llvm::raw_string_ostream OS(SBuf);
+Message(OS);
+return SBuf;
+  };
+}
+
+bool FuchsiaHandleChecker::needsEval(const CallEvent &Call) const {
+  const FunctionDecl *FuncDecl = 
dyn_cast_or_null(Call.getDecl());
+
+  assert(FuncDecl && "Should have FuncDecl at this point");
+
+  if (hasFuchsiaAttr(FuncDecl) ||
+  hasFuchsiaUnownedAttr(FuncDecl))
+return true;
+
+  for (unsigned Arg = 0; Arg < Call.getNumArgs(); ++Arg) {
+if (Arg >= FuncDecl->getNumParams())
+  break;
+const ParmVarDecl *PVD = FuncDecl->getParamDecl(Arg);
+
+if (hasFuchsiaAttr(PVD) ||
+hasFuchsiaAttr(PVD) ||
+hasFuchsiaUnownedAttr(PVD))
+  return true;
+  }
+
+  return false;
+}
+
+ProgramStateRef FuchsiaHandleChecker::evalArgsAttrs(const CallEvent &Call,
+CheckerContext &C,
+ProgramStateRef State,
+NotesVec &Notes) const {
+  const FunctionDecl *FuncDecl = 
dyn_cast_or_null(Call.getDecl());
+  SymbolRef ResultSymbol = nullptr;
+
+  assert(FuncDecl && "Should have FuncDecl at this point");
+
+  if (const auto *TypeDefTy = FuncDecl->getReturnType()->getAs())
+if (TypeDefTy->getDecl()->getName() == ErrorTypeName) {
+  ResultSymbol =
+  State->getSVal(Call.getOriginExpr(), C.getLocationContext())
+  .getAsSymbol();
+  assert(ResultSymbol && "Result symbol should be there");
+}
+
+  for (unsigned Arg = 0; Arg < Call.getNumArgs(); ++Arg) {
+if (Arg >= FuncDecl->getNumParams())
+  break;
+const ParmVarDecl *PVD = FuncDecl->getParamDecl(Arg);
+unsigned ParamDiagIdx = PVD->getFunctionScopeIndex() + 1;
+SmallVector Handles =
+getFuchsiaHandleSymbols(PVD->getType(), Call.getArgSVal(Arg), State);
+
+for (SymbolRef Handle : Handles) {
+  if (hasFuchsiaAttr(PVD)) {
+Notes.push_back(
+createNote(Handle, [ParamDiagIdx](llvm::raw_string_ostream &OS) {
+  OS << "Handle released through " << ParamDiagIdx
+ << llvm::getOrdinalSuffix(ParamDiagIdx) << " parameter";
+}));

NagyDonat wrote:

Instead of using lambdas and ostream write operations, this could be 
implemented even more elegantly with `llvm::formatv`.

https://github.com/llvm/llvm-project/pull/111588
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Modernize FuchsiaHandleChecker (PR #111588)

2024-10-09 Thread Donát Nagy via cfe-commits

NagyDonat wrote:

> I just saw it in various easy checkers like `ValistChecker`, so I thought 
> it's "standard" way of reporting.

Yes, ValistChecker is another example where it's useless. By the way that's the 
first checker that I wrote (not completely alone) when I was an intern many 
years ago -- maybe when I'll have some free time I'll get rid of the visitor 
there.

https://github.com/llvm/llvm-project/pull/111588
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Modernize FuchsiaHandleChecker (PR #111588)

2024-10-09 Thread Donát Nagy via cfe-commits


@@ -314,6 +449,127 @@ getFuchsiaHandleSymbols(QualType QT, SVal Arg, 
ProgramStateRef State) {
   return {};
 }
 
+bool FuchsiaHandleChecker::needsInvalidate(const CallEvent &Call) const {
+  const FunctionDecl *FuncDecl = 
dyn_cast_or_null(Call.getDecl());
+
+  assert(FuncDecl && "Should have FuncDecl at this point");
+
+  for (unsigned Arg = 0; Arg < Call.getNumArgs(); ++Arg) {
+if (Arg >= FuncDecl->getNumParams())
+  break;
+const ParmVarDecl *PVD = FuncDecl->getParamDecl(Arg);
+
+if (PVD->getType()->isAnyPointerType() &&
+(hasFuchsiaAttr(PVD) ||
+ hasFuchsiaAttr(PVD) ||
+ hasFuchsiaUnownedAttr(PVD))) {
+  return true;
+}
+  }
+
+  return false;
+}
+
+ProgramStateRef
+FuchsiaHandleChecker::evalArgsAttrs(const CallEvent &Call, CheckerContext &C,
+ProgramStateRef State) const {
+  const FunctionDecl *FuncDecl = 
dyn_cast_or_null(Call.getDecl());
+  SymbolRef ResultSymbol = nullptr;
+
+  assert(FuncDecl && "Should have FuncDecl at this point");
+
+  if (const auto *TypeDefTy = FuncDecl->getReturnType()->getAs())
+if (TypeDefTy->getDecl()->getName() == ErrorTypeName) {
+  SValBuilder &SVB = C.getSValBuilder();
+  const LocationContext *LC = C.getLocationContext();
+  auto RetVal = SVB.conjureSymbolVal(
+  Call.getOriginExpr(), LC, FuncDecl->getReturnType(), C.blockCount());
+
+  State = State->BindExpr(Call.getOriginExpr(), LC, RetVal);
+  ResultSymbol = RetVal.getAsSymbol();
+}
+
+  for (unsigned Arg = 0; Arg < Call.getNumArgs(); ++Arg) {
+if (Arg >= FuncDecl->getNumParams())
+  break;
+const ParmVarDecl *PVD = FuncDecl->getParamDecl(Arg);
+SmallVector Handles =
+getFuchsiaHandleSymbols(PVD->getType(), Call.getArgSVal(Arg), State);
+
+for (SymbolRef Handle : Handles) {
+  if (hasFuchsiaAttr(PVD)) {
+State =
+State->set(Handle, HandleState::getReleased(Arg + 1));
+  } else if (hasFuchsiaAttr(PVD)) {
+State = State->set(
+Handle, HandleState::getMaybeAllocated(ResultSymbol, Arg + 1));
+  } else if (hasFuchsiaUnownedAttr(PVD)) {
+State = State->set(Handle, HandleState::getUnowned(Arg + 
1));
+  }
+}
+  }
+
+  return State;
+}
+
+ProgramStateRef FuchsiaHandleChecker::evalFunctionAttrs(
+const CallEvent &Call, CheckerContext &C, ProgramStateRef State) const {
+  const FunctionDecl *FuncDecl = 
dyn_cast_or_null(Call.getDecl());
+
+  assert(FuncDecl && "Should have FuncDecl at this point");
+
+  if (!hasFuchsiaAttr(FuncDecl) &&
+  !hasFuchsiaUnownedAttr(FuncDecl))
+return State;
+
+  SValBuilder &SVB = C.getSValBuilder();
+  const LocationContext *LC = C.getLocationContext();
+  auto RetVal = SVB.conjureSymbolVal(Call.getOriginExpr(), LC,
+ FuncDecl->getReturnType(), 
C.blockCount());
+  State = State->BindExpr(Call.getOriginExpr(), LC, RetVal);
+
+  SymbolRef RetSym = RetVal.getAsSymbol();
+
+  assert(RetSym && "Symbol should be there");
+
+  if (hasFuchsiaAttr(FuncDecl)) {
+State = State->set(RetSym,
+  HandleState::getMaybeAllocated(nullptr, 0));
+  } else {
+State = State->set(RetSym, HandleState::getUnowned(0));
+  }
+
+  return State;
+}
+
+bool FuchsiaHandleChecker::evalCall(const CallEvent &Call,
+CheckerContext &C) const {
+  const FunctionDecl *FuncDecl = 
dyn_cast_or_null(Call.getDecl());
+  // Checker depends on attributes attached to function definition, so there is
+  // no way to proccess futher w/o declaration.
+  if (!FuncDecl)
+return false;
+
+  ProgramStateRef State = C.getState();
+  ProgramStateRef OldState = State;
+
+  if (needsInvalidate(Call)) {
+// If checker models a call, then body won't be inlined, so
+// all pointers (including annotated ones) should be invalidated.
+//
+// This call will also create a conjured symbol for each reference,
+// which then will be obtained by getFuchsiaHandleSymbols().
+State = Call.invalidateRegions(C.blockCount(), State);
+  }
+
+  State = evalFunctionAttrs(Call, C, State);
+  State = evalArgsAttrs(Call, C, State);
+
+  C.addTransition(State);
+
+  return State != OldState;

NagyDonat wrote:

If `evalCall()` returns false, it shouldn't do anything, in particular it 
shouldn't add a transition. I didn't analyze all the possible effects, but this 
mostly harmless transition might produce nasty bugs in some corner case.

https://github.com/llvm/llvm-project/pull/111588
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Modernize FuchsiaHandleChecker (PR #111588)

2024-10-09 Thread Donát Nagy via cfe-commits


@@ -99,6 +99,7 @@
 #include "clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ExplodedGraph.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/ProgramState_Fwd.h"

NagyDonat wrote:

This seems to be redundant below `ProgramState.h`. Did some "helpful" editor 
add it automatically?

https://github.com/llvm/llvm-project/pull/111588
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Modernize FuchsiaHandleChecker (PR #111588)

2024-10-09 Thread Donát Nagy via cfe-commits


@@ -314,6 +449,127 @@ getFuchsiaHandleSymbols(QualType QT, SVal Arg, 
ProgramStateRef State) {
   return {};
 }
 
+bool FuchsiaHandleChecker::needsInvalidate(const CallEvent &Call) const {
+  const FunctionDecl *FuncDecl = 
dyn_cast_or_null(Call.getDecl());
+
+  assert(FuncDecl && "Should have FuncDecl at this point");
+
+  for (unsigned Arg = 0; Arg < Call.getNumArgs(); ++Arg) {
+if (Arg >= FuncDecl->getNumParams())
+  break;
+const ParmVarDecl *PVD = FuncDecl->getParamDecl(Arg);
+
+if (PVD->getType()->isAnyPointerType() &&
+(hasFuchsiaAttr(PVD) ||
+ hasFuchsiaAttr(PVD) ||
+ hasFuchsiaUnownedAttr(PVD))) {
+  return true;
+}
+  }
+
+  return false;
+}
+
+ProgramStateRef
+FuchsiaHandleChecker::evalArgsAttrs(const CallEvent &Call, CheckerContext &C,
+ProgramStateRef State) const {
+  const FunctionDecl *FuncDecl = 
dyn_cast_or_null(Call.getDecl());
+  SymbolRef ResultSymbol = nullptr;
+
+  assert(FuncDecl && "Should have FuncDecl at this point");
+
+  if (const auto *TypeDefTy = FuncDecl->getReturnType()->getAs())
+if (TypeDefTy->getDecl()->getName() == ErrorTypeName) {
+  SValBuilder &SVB = C.getSValBuilder();
+  const LocationContext *LC = C.getLocationContext();
+  auto RetVal = SVB.conjureSymbolVal(
+  Call.getOriginExpr(), LC, FuncDecl->getReturnType(), C.blockCount());
+
+  State = State->BindExpr(Call.getOriginExpr(), LC, RetVal);
+  ResultSymbol = RetVal.getAsSymbol();
+}
+
+  for (unsigned Arg = 0; Arg < Call.getNumArgs(); ++Arg) {
+if (Arg >= FuncDecl->getNumParams())
+  break;
+const ParmVarDecl *PVD = FuncDecl->getParamDecl(Arg);
+SmallVector Handles =
+getFuchsiaHandleSymbols(PVD->getType(), Call.getArgSVal(Arg), State);
+
+for (SymbolRef Handle : Handles) {
+  if (hasFuchsiaAttr(PVD)) {
+State =
+State->set(Handle, HandleState::getReleased(Arg + 1));
+  } else if (hasFuchsiaAttr(PVD)) {
+State = State->set(
+Handle, HandleState::getMaybeAllocated(ResultSymbol, Arg + 1));
+  } else if (hasFuchsiaUnownedAttr(PVD)) {
+State = State->set(Handle, HandleState::getUnowned(Arg + 
1));
+  }
+}
+  }
+
+  return State;
+}
+
+ProgramStateRef FuchsiaHandleChecker::evalFunctionAttrs(
+const CallEvent &Call, CheckerContext &C, ProgramStateRef State) const {
+  const FunctionDecl *FuncDecl = 
dyn_cast_or_null(Call.getDecl());
+
+  assert(FuncDecl && "Should have FuncDecl at this point");
+
+  if (!hasFuchsiaAttr(FuncDecl) &&
+  !hasFuchsiaUnownedAttr(FuncDecl))
+return State;
+
+  SValBuilder &SVB = C.getSValBuilder();
+  const LocationContext *LC = C.getLocationContext();
+  auto RetVal = SVB.conjureSymbolVal(Call.getOriginExpr(), LC,
+ FuncDecl->getReturnType(), 
C.blockCount());
+  State = State->BindExpr(Call.getOriginExpr(), LC, RetVal);

NagyDonat wrote:

This logic doesn't bind a return value to the call expression in the case when 
the function declaration doesn't have a Fuchsia attribute.

If you have a function where some arguments have Fuchsia-specific attributes 
(so `evalArgsAttrs` and the invalidation activate), but the function 
declaration itself doesn't have an attribute, then you end up with a function 
call to which no return value was bound.

I vaguely recall that this would be equivalent to binding `UnknownVal` to the 
call, which is not a fatal issue, but it'd be better to ensure that we always 
bind a conjured value when the evaluation does something.

https://github.com/llvm/llvm-project/pull/111588
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Modernize FuchsiaHandleChecker (PR #111588)

2024-10-09 Thread Donát Nagy via cfe-commits


@@ -267,12 +286,128 @@ class FuchsiaHandleSymbolVisitor final : public 
SymbolVisitor {
 private:
   SmallVector Symbols;
 };
+
+class FuchsiaBugVisitor final : public BugReporterVisitor {
+  // Handle that caused a problem.
+  SymbolRef Sym;
+
+  bool IsLeak;
+
+public:
+  FuchsiaBugVisitor(SymbolRef S, bool Leak = false) : Sym(S), IsLeak(Leak) {}
+
+  void Profile(llvm::FoldingSetNodeID &ID) const override {
+ID.AddPointer(Sym);
+  }
+
+  static inline bool isAllocated(const HandleState *RSCurr,
+ const HandleState *RSPrev, const Stmt *Stmt) {
+return isa_and_nonnull(Stmt) &&
+   ((RSCurr && (RSCurr->isAllocated() || RSCurr->maybeAllocated()) &&
+ (!RSPrev ||
+  !(RSPrev->isAllocated() || RSPrev->maybeAllocated();
+  }
+
+  static inline bool isAllocatedUnowned(const HandleState *RSCurr,
+const HandleState *RSPrev,
+const Stmt *Stmt) {
+return isa_and_nonnull(Stmt) &&
+   ((RSCurr && RSCurr->isUnowned()) &&
+(!RSPrev || !RSPrev->isUnowned()));
+  }
+
+  static inline bool isReleased(const HandleState *RSCurr,
+const HandleState *RSPrev, const Stmt *Stmt) {
+return isa_and_nonnull(Stmt) &&
+   ((RSCurr && RSCurr->isReleased()) &&
+(!RSPrev || !RSPrev->isReleased()));
+  }
+
+  PathDiagnosticPieceRef VisitNode(const ExplodedNode *N,
+   BugReporterContext &BRC,
+   PathSensitiveBugReport &BR) override;
+
+  PathDiagnosticPieceRef getEndPath(BugReporterContext &BRC,
+const ExplodedNode *EndPathNode,
+PathSensitiveBugReport &BR) override {
+if (!IsLeak)
+  return nullptr;
+
+PathDiagnosticLocation L = BR.getLocation();
+// Do not add the statement itself as a range in case of leak.
+return std::make_shared(L, BR.getDescription(),
+  false);
+  }
+};
 } // end anonymous namespace
 
+PathDiagnosticPieceRef
+FuchsiaBugVisitor::VisitNode(const ExplodedNode *N, BugReporterContext &BRC,
+ PathSensitiveBugReport &BR) {
+  ProgramStateRef State = N->getState();
+  ProgramStateRef StatePrev = N->getFirstPred()->getState();
+
+  const HandleState *RSCurr = State->get(Sym);
+  const HandleState *RSPrev = StatePrev->get(Sym);
+
+  const Stmt *S = N->getStmtForDiagnostics();
+
+  StringRef Msg;
+  SmallString<256> Buf;
+  llvm::raw_svector_ostream OS(Buf);
+
+  if (isAllocated(RSCurr, RSPrev, S)) {
+auto Index = RSCurr->getIndex();
+
+if (Index > 0)
+  OS << "Handle allocated through " << Index
+ << llvm::getOrdinalSuffix(Index) << " parameter";
+else
+  OS << "Function returns an open handle";
+
+Msg = OS.str();
+  } else if (isAllocatedUnowned(RSCurr, RSPrev, S)) {
+auto Index = RSCurr->getIndex();
+
+if (Index > 0)
+  OS << "Unowned handle allocated through " << Index
+ << llvm::getOrdinalSuffix(Index) << " parameter";
+else
+  OS << "Function returns an unowned handle";
+
+Msg = OS.str();
+  } else if (isReleased(RSCurr, RSPrev, S)) {
+auto Index = RSCurr->getIndex();
+
+assert(Index > 0);
+
+OS << "Handle released through " << Index << llvm::getOrdinalSuffix(Index)
+   << " parameter";
+Msg = OS.str();
+  }
+
+  if (Msg.empty())
+return nullptr;
+
+  PathDiagnosticLocation Pos;
+  if (!S) {
+auto PostImplCall = N->getLocation().getAs();
+if (!PostImplCall)
+  return nullptr;
+Pos = PathDiagnosticLocation(PostImplCall->getLocation(),
+ BRC.getSourceManager());
+  } else {
+Pos = PathDiagnosticLocation(S, BRC.getSourceManager(),
+ N->getLocationContext());
+  }
+
+  return std::make_shared(Pos, Msg, true);
+}
+
 /// Returns the symbols extracted from the argument or empty vector if it 
cannot
 /// be found. It is unlikely to have over 1024 symbols in one argument.
-static SmallVector
-getFuchsiaHandleSymbols(QualType QT, SVal Arg, ProgramStateRef State) {
+SmallVector getFuchsiaHandleSymbols(QualType QT, SVal Arg,

NagyDonat wrote:

```suggestion
SmallVector getFuchsiaHandleSymbols(QualType QT, SVal Arg,
```
This `1024` is extremely suspicious here, and although I'm completely 
unfamiliar with Fuchsia, I'd be very surprised to see an argument that contains 
more than a handful of handles (considering that loop modeling is limited and 
we won't fill a long array with handles).

The [LLVM Programmer's 
Manual](https://www.llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h)
 says that
> In the absence of a well-motivated choice for the number of inlined elements 
> `N`, it is recommended to use `SmallVector` (that is, omitting the `N`). 
> This wi

[clang] [analyzer] Modernize FuchsiaHandleChecker (PR #111588)

2024-10-09 Thread Donát Nagy via cfe-commits


@@ -267,12 +286,128 @@ class FuchsiaHandleSymbolVisitor final : public 
SymbolVisitor {
 private:
   SmallVector Symbols;
 };
+
+class FuchsiaBugVisitor final : public BugReporterVisitor {
+  // Handle that caused a problem.
+  SymbolRef Sym;
+
+  bool IsLeak;
+
+public:
+  FuchsiaBugVisitor(SymbolRef S, bool Leak = false) : Sym(S), IsLeak(Leak) {}
+
+  void Profile(llvm::FoldingSetNodeID &ID) const override {
+ID.AddPointer(Sym);
+  }
+
+  static inline bool isAllocated(const HandleState *RSCurr,
+ const HandleState *RSPrev, const Stmt *Stmt) {
+return isa_and_nonnull(Stmt) &&
+   ((RSCurr && (RSCurr->isAllocated() || RSCurr->maybeAllocated()) &&
+ (!RSPrev ||
+  !(RSPrev->isAllocated() || RSPrev->maybeAllocated();
+  }
+
+  static inline bool isAllocatedUnowned(const HandleState *RSCurr,
+const HandleState *RSPrev,
+const Stmt *Stmt) {
+return isa_and_nonnull(Stmt) &&
+   ((RSCurr && RSCurr->isUnowned()) &&
+(!RSPrev || !RSPrev->isUnowned()));
+  }
+
+  static inline bool isReleased(const HandleState *RSCurr,
+const HandleState *RSPrev, const Stmt *Stmt) {
+return isa_and_nonnull(Stmt) &&
+   ((RSCurr && RSCurr->isReleased()) &&
+(!RSPrev || !RSPrev->isReleased()));
+  }
+
+  PathDiagnosticPieceRef VisitNode(const ExplodedNode *N,
+   BugReporterContext &BRC,
+   PathSensitiveBugReport &BR) override;
+
+  PathDiagnosticPieceRef getEndPath(BugReporterContext &BRC,
+const ExplodedNode *EndPathNode,
+PathSensitiveBugReport &BR) override {
+if (!IsLeak)
+  return nullptr;
+
+PathDiagnosticLocation L = BR.getLocation();
+// Do not add the statement itself as a range in case of leak.
+return std::make_shared(L, BR.getDescription(),
+  false);
+  }
+};
 } // end anonymous namespace
 
+PathDiagnosticPieceRef
+FuchsiaBugVisitor::VisitNode(const ExplodedNode *N, BugReporterContext &BRC,
+ PathSensitiveBugReport &BR) {
+  ProgramStateRef State = N->getState();
+  ProgramStateRef StatePrev = N->getFirstPred()->getState();
+
+  const HandleState *RSCurr = State->get(Sym);
+  const HandleState *RSPrev = StatePrev->get(Sym);
+
+  const Stmt *S = N->getStmtForDiagnostics();
+
+  StringRef Msg;
+  SmallString<256> Buf;
+  llvm::raw_svector_ostream OS(Buf);
+
+  if (isAllocated(RSCurr, RSPrev, S)) {
+auto Index = RSCurr->getIndex();
+
+if (Index > 0)
+  OS << "Handle allocated through " << Index
+ << llvm::getOrdinalSuffix(Index) << " parameter";
+else
+  OS << "Function returns an open handle";
+
+Msg = OS.str();
+  } else if (isAllocatedUnowned(RSCurr, RSPrev, S)) {
+auto Index = RSCurr->getIndex();
+
+if (Index > 0)
+  OS << "Unowned handle allocated through " << Index
+ << llvm::getOrdinalSuffix(Index) << " parameter";
+else
+  OS << "Function returns an unowned handle";
+
+Msg = OS.str();
+  } else if (isReleased(RSCurr, RSPrev, S)) {
+auto Index = RSCurr->getIndex();
+
+assert(Index > 0);
+
+OS << "Handle released through " << Index << llvm::getOrdinalSuffix(Index)
+   << " parameter";
+Msg = OS.str();
+  }
+
+  if (Msg.empty())
+return nullptr;
+
+  PathDiagnosticLocation Pos;
+  if (!S) {
+auto PostImplCall = N->getLocation().getAs();
+if (!PostImplCall)
+  return nullptr;
+Pos = PathDiagnosticLocation(PostImplCall->getLocation(),
+ BRC.getSourceManager());

NagyDonat wrote:

What is the goal of this logic? Perhaps add a brief comment that explains this.

https://github.com/llvm/llvm-project/pull/111588
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Modernize FuchsiaHandleChecker (PR #111588)

2024-10-09 Thread Donát Nagy via cfe-commits


@@ -267,12 +286,128 @@ class FuchsiaHandleSymbolVisitor final : public 
SymbolVisitor {
 private:
   SmallVector Symbols;
 };
+
+class FuchsiaBugVisitor final : public BugReporterVisitor {
+  // Handle that caused a problem.
+  SymbolRef Sym;
+
+  bool IsLeak;
+
+public:
+  FuchsiaBugVisitor(SymbolRef S, bool Leak = false) : Sym(S), IsLeak(Leak) {}
+
+  void Profile(llvm::FoldingSetNodeID &ID) const override {
+ID.AddPointer(Sym);
+  }
+
+  static inline bool isAllocated(const HandleState *RSCurr,
+ const HandleState *RSPrev, const Stmt *Stmt) {
+return isa_and_nonnull(Stmt) &&
+   ((RSCurr && (RSCurr->isAllocated() || RSCurr->maybeAllocated()) &&
+ (!RSPrev ||
+  !(RSPrev->isAllocated() || RSPrev->maybeAllocated();
+  }
+
+  static inline bool isAllocatedUnowned(const HandleState *RSCurr,
+const HandleState *RSPrev,
+const Stmt *Stmt) {
+return isa_and_nonnull(Stmt) &&
+   ((RSCurr && RSCurr->isUnowned()) &&
+(!RSPrev || !RSPrev->isUnowned()));
+  }
+
+  static inline bool isReleased(const HandleState *RSCurr,
+const HandleState *RSPrev, const Stmt *Stmt) {
+return isa_and_nonnull(Stmt) &&
+   ((RSCurr && RSCurr->isReleased()) &&
+(!RSPrev || !RSPrev->isReleased()));
+  }
+
+  PathDiagnosticPieceRef VisitNode(const ExplodedNode *N,
+   BugReporterContext &BRC,
+   PathSensitiveBugReport &BR) override;
+
+  PathDiagnosticPieceRef getEndPath(BugReporterContext &BRC,
+const ExplodedNode *EndPathNode,
+PathSensitiveBugReport &BR) override {
+if (!IsLeak)
+  return nullptr;
+
+PathDiagnosticLocation L = BR.getLocation();
+// Do not add the statement itself as a range in case of leak.
+return std::make_shared(L, BR.getDescription(),
+  false);
+  }
+};
 } // end anonymous namespace
 
+PathDiagnosticPieceRef
+FuchsiaBugVisitor::VisitNode(const ExplodedNode *N, BugReporterContext &BRC,
+ PathSensitiveBugReport &BR) {
+  ProgramStateRef State = N->getState();
+  ProgramStateRef StatePrev = N->getFirstPred()->getState();
+
+  const HandleState *RSCurr = State->get(Sym);
+  const HandleState *RSPrev = StatePrev->get(Sym);
+
+  const Stmt *S = N->getStmtForDiagnostics();
+
+  StringRef Msg;
+  SmallString<256> Buf;
+  llvm::raw_svector_ostream OS(Buf);
+
+  if (isAllocated(RSCurr, RSPrev, S)) {
+auto Index = RSCurr->getIndex();
+
+if (Index > 0)
+  OS << "Handle allocated through " << Index
+ << llvm::getOrdinalSuffix(Index) << " parameter";
+else
+  OS << "Function returns an open handle";
+
+Msg = OS.str();

NagyDonat wrote:

Instead of repeating this, call this once after the `if` - `else if` blocks. 

https://github.com/llvm/llvm-project/pull/111588
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Modernize FuchsiaHandleChecker (PR #111588)

2024-10-09 Thread Donát Nagy via cfe-commits


@@ -336,141 +592,55 @@ void FuchsiaHandleChecker::checkPreCall(const CallEvent 
&Call,
 SmallVector Handles =
 getFuchsiaHandleSymbols(PVD->getType(), Call.getArgSVal(Arg), State);
 
-// Handled in checkPostCall.
-if (hasFuchsiaAttr(PVD) ||
-hasFuchsiaAttr(PVD))
-  continue;
-
 for (SymbolRef Handle : Handles) {
   const HandleState *HState = State->get(Handle);
-  if (!HState || HState->isEscaped())

NagyDonat wrote:

Why did you change this? In the case of `!HState` you will still skip all 
"action" in the loop because you added `HState &&` to each inner `if` -- so it 
would be clearer to keep the "`!HState` $\Rightarrow$ immediately `continue`" 
logic. (Obviously if you didn't intend to add `HState &&` everywhere in the 
loop, then fix that instead.)

https://github.com/llvm/llvm-project/pull/111588
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Modernize FuchsiaHandleChecker (PR #111588)

2024-10-09 Thread Donát Nagy via cfe-commits


@@ -127,26 +135,30 @@ class HandleState {
   bool isEscaped() const { return K == Kind::Escaped; }
   bool isUnowned() const { return K == Kind::Unowned; }
 
-  static HandleState getMaybeAllocated(SymbolRef ErrorSym) {
-return HandleState(Kind::MaybeAllocated, ErrorSym);
+  static HandleState getMaybeAllocated(SymbolRef ErrorSym, std::uint8_t Idx) {

NagyDonat wrote:

Why do you use `std::uint8_t Idx` here while `Idx` is `unsigned` elsewhere?

https://github.com/llvm/llvm-project/pull/111588
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Modernize FuchsiaHandleChecker (PR #111588)

2024-10-09 Thread Donát Nagy via cfe-commits


@@ -314,6 +449,127 @@ getFuchsiaHandleSymbols(QualType QT, SVal Arg, 
ProgramStateRef State) {
   return {};
 }
 
+bool FuchsiaHandleChecker::needsInvalidate(const CallEvent &Call) const {
+  const FunctionDecl *FuncDecl = 
dyn_cast_or_null(Call.getDecl());
+
+  assert(FuncDecl && "Should have FuncDecl at this point");
+
+  for (unsigned Arg = 0; Arg < Call.getNumArgs(); ++Arg) {
+if (Arg >= FuncDecl->getNumParams())
+  break;
+const ParmVarDecl *PVD = FuncDecl->getParamDecl(Arg);
+
+if (PVD->getType()->isAnyPointerType() &&
+(hasFuchsiaAttr(PVD) ||
+ hasFuchsiaAttr(PVD) ||
+ hasFuchsiaUnownedAttr(PVD))) {
+  return true;
+}
+  }
+
+  return false;
+}
+
+ProgramStateRef
+FuchsiaHandleChecker::evalArgsAttrs(const CallEvent &Call, CheckerContext &C,
+ProgramStateRef State) const {
+  const FunctionDecl *FuncDecl = 
dyn_cast_or_null(Call.getDecl());
+  SymbolRef ResultSymbol = nullptr;
+
+  assert(FuncDecl && "Should have FuncDecl at this point");
+
+  if (const auto *TypeDefTy = FuncDecl->getReturnType()->getAs())
+if (TypeDefTy->getDecl()->getName() == ErrorTypeName) {
+  SValBuilder &SVB = C.getSValBuilder();
+  const LocationContext *LC = C.getLocationContext();
+  auto RetVal = SVB.conjureSymbolVal(
+  Call.getOriginExpr(), LC, FuncDecl->getReturnType(), C.blockCount());
+
+  State = State->BindExpr(Call.getOriginExpr(), LC, RetVal);
+  ResultSymbol = RetVal.getAsSymbol();
+}
+
+  for (unsigned Arg = 0; Arg < Call.getNumArgs(); ++Arg) {
+if (Arg >= FuncDecl->getNumParams())
+  break;
+const ParmVarDecl *PVD = FuncDecl->getParamDecl(Arg);
+SmallVector Handles =
+getFuchsiaHandleSymbols(PVD->getType(), Call.getArgSVal(Arg), State);
+
+for (SymbolRef Handle : Handles) {
+  if (hasFuchsiaAttr(PVD)) {
+State =
+State->set(Handle, HandleState::getReleased(Arg + 1));
+  } else if (hasFuchsiaAttr(PVD)) {
+State = State->set(
+Handle, HandleState::getMaybeAllocated(ResultSymbol, Arg + 1));
+  } else if (hasFuchsiaUnownedAttr(PVD)) {
+State = State->set(Handle, HandleState::getUnowned(Arg + 
1));
+  }
+}
+  }
+
+  return State;
+}
+
+ProgramStateRef FuchsiaHandleChecker::evalFunctionAttrs(
+const CallEvent &Call, CheckerContext &C, ProgramStateRef State) const {
+  const FunctionDecl *FuncDecl = 
dyn_cast_or_null(Call.getDecl());
+
+  assert(FuncDecl && "Should have FuncDecl at this point");
+
+  if (!hasFuchsiaAttr(FuncDecl) &&
+  !hasFuchsiaUnownedAttr(FuncDecl))
+return State;
+
+  SValBuilder &SVB = C.getSValBuilder();
+  const LocationContext *LC = C.getLocationContext();
+  auto RetVal = SVB.conjureSymbolVal(Call.getOriginExpr(), LC,
+ FuncDecl->getReturnType(), 
C.blockCount());
+  State = State->BindExpr(Call.getOriginExpr(), LC, RetVal);
+
+  SymbolRef RetSym = RetVal.getAsSymbol();
+
+  assert(RetSym && "Symbol should be there");
+
+  if (hasFuchsiaAttr(FuncDecl)) {
+State = State->set(RetSym,
+  HandleState::getMaybeAllocated(nullptr, 0));
+  } else {
+State = State->set(RetSym, HandleState::getUnowned(0));
+  }
+
+  return State;
+}
+
+bool FuchsiaHandleChecker::evalCall(const CallEvent &Call,
+CheckerContext &C) const {
+  const FunctionDecl *FuncDecl = 
dyn_cast_or_null(Call.getDecl());
+  // Checker depends on attributes attached to function definition, so there is

NagyDonat wrote:

```suggestion
  // Checker depends on attributes attached to function declaration, so there is
```
Did you mean function "declaration" here (instead of definition)? (I'm just 
guessing because I don't see anythong that gets the actual definition.)

https://github.com/llvm/llvm-project/pull/111588
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Modernize FuchsiaHandleChecker (PR #111588)

2024-10-09 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat commented:

I read the patch and added my suggestions in inline comments, but I don't 
promise that I found every little corner case.

Moreover, I'm strongly opposed to introducing a `BugReporterVisitor` instead of 
directly creating the notes, because in this case you *already know* the 
locations where you want to add the notes (because that's where you update the 
`HandleState`), so it's completely pointless to delay the note creation to an 
awkward indirect post-procession step performed by a visitor. This way you 
wouldn't need to complicate the handle state by adding the (otherwise 
completely irrelevant) argument index in it.

In general, bug reporter visitors are complex tools that are good for complex 
situations where you want to find arbitrary complicated patterns in the bug 
report path, but should be avoided in simpler cases.

Admittedly, the old note creation code looks like an Obfuscated C Code Contest 
entry, but if you clear it up by introducing a few "natural" helper functions, 
you'll get an implementation that's significantly easier to understand than the 
"hide these numbers in the state, then later come back and reconstruct the 
situation" visitor.

https://github.com/llvm/llvm-project/pull/111588
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Modernize FuchsiaHandleChecker (PR #111588)

2024-10-09 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat edited 
https://github.com/llvm/llvm-project/pull/111588
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix crash when casting the result of a malformed fptr call (PR #111390)

2024-10-09 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat approved this pull request.

LGTM. Let's not crash.

https://github.com/llvm/llvm-project/pull/111390
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][analyzer] Introduce MutexModeling checker (PR #111381)

2024-10-08 Thread Donát Nagy via cfe-commits


@@ -0,0 +1,773 @@
+//===--- MutexModeling.cpp - Modeling of mutexes 
--===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// Defines modeling checker for tracking mutex states.
+//
+//===--===//
+
+#include "MutexModeling/MutexModelingAPI.h"
+#include "MutexModeling/MutexModelingDomain.h"
+#include "MutexModeling/MutexRegionExtractor.h"
+
+#include "clang/StaticAnalyzer/Core/Checker.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerHelpers.h"
+#include "clang/StaticAnalyzer/Frontend/CheckerRegistry.h"
+#include 
+
+using namespace clang;
+using namespace ento;
+using namespace mutex_modeling;
+
+namespace {
+
+// When a lock is destroyed, in some semantics(like PthreadSemantics) we are 
not

NagyDonat wrote:

```suggestion
// When a lock is destroyed, in some semantics (like PthreadSemantics) we are 
not
```
A space is missing; and then you also need to re-wrap the text because this 
overflows 80 characters.

https://github.com/llvm/llvm-project/pull/111381
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][analyzer] Introduce MutexModeling checker (PR #111381)

2024-10-08 Thread Donát Nagy via cfe-commits


@@ -0,0 +1,773 @@
+//===--- MutexModeling.cpp - Modeling of mutexes 
--===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// Defines modeling checker for tracking mutex states.
+//
+//===--===//
+
+#include "MutexModeling/MutexModelingAPI.h"
+#include "MutexModeling/MutexModelingDomain.h"
+#include "MutexModeling/MutexRegionExtractor.h"
+
+#include "clang/StaticAnalyzer/Core/Checker.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerHelpers.h"
+#include "clang/StaticAnalyzer/Frontend/CheckerRegistry.h"
+#include 
+
+using namespace clang;
+using namespace ento;
+using namespace mutex_modeling;
+
+namespace {
+
+// When a lock is destroyed, in some semantics(like PthreadSemantics) we are 
not
+// sure if the destroy call has succeeded or failed, and the lock enters one of
+// the 'possibly destroyed' state. There is a short time frame for the
+// programmer to check the return value to see if the lock was successfully
+// destroyed. Before we model the next operation over that lock, we call this
+// function to see if the return value was checked by now and set the lock 
state
+// - either to destroyed state or back to its previous state.
+
+// In PthreadSemantics, pthread_mutex_destroy() returns zero if the lock is
+// successfully destroyed and it returns a non-zero value otherwise.
+ProgramStateRef resolvePossiblyDestroyedMutex(ProgramStateRef State,
+  const MemRegion *LockReg,
+  const SymbolRef *LockReturnSym) {
+  const LockStateKind *LockState = State->get(LockReg);
+  // Existence in DestroyRetVal ensures existence in LockMap.
+  // Existence in Destroyed also ensures that the lock state for lockR is 
either
+  // UntouchedAndPossiblyDestroyed or UnlockedAndPossiblyDestroyed.
+  assert(LockState);
+  assert(*LockState == LockStateKind::UntouchedAndPossiblyDestroyed ||
+ *LockState == LockStateKind::UnlockedAndPossiblyDestroyed);
+
+  ConstraintManager &CMgr = State->getConstraintManager();
+  ConditionTruthVal RetZero = CMgr.isNull(State, *LockReturnSym);
+  if (RetZero.isConstrainedFalse()) {
+switch (*LockState) {
+case LockStateKind::UntouchedAndPossiblyDestroyed: {
+  State = State->remove(LockReg);
+  break;
+}
+case LockStateKind::UnlockedAndPossiblyDestroyed: {
+  State = State->set(LockReg, LockStateKind::Unlocked);
+  break;
+}
+default:
+  llvm_unreachable("Unknown lock state for a lock inside DestroyRetVal");
+}
+  } else {
+State = State->set(LockReg, LockStateKind::Destroyed);
+  }
+
+  // Removing the map entry (LockReg, sym) from DestroyRetVal as the lock
+  // state is now resolved.
+  return State->remove(LockReg);
+}
+
+ProgramStateRef doResolvePossiblyDestroyedMutex(ProgramStateRef State,
+const MemRegion *MTX) {
+  assert(MTX && "should only be called with a mutex region");
+
+  if (const SymbolRef *Sym = State->get(MTX))
+return resolvePossiblyDestroyedMutex(State, MTX, Sym);
+  return State;
+}
+
+class MutexModeling : public Checker {
+public:
+  void checkPostCall(const CallEvent &Call, CheckerContext &C) const;
+
+  void checkDeadSymbols(SymbolReaper &SymReaper, CheckerContext &C) const;
+
+  ProgramStateRef
+  checkRegionChanges(ProgramStateRef State, const InvalidatedSymbols *Symbols,
+ ArrayRef ExplicitRegions,
+ ArrayRef Regions,
+ const LocationContext *LCtx, const CallEvent *Call) const;
+
+private:
+  mutable std::unique_ptr BT_initlock;
+
+  // When handling events, NoteTags can be placed on ProgramPoints. This struct
+  // supports returning both the resulting ProgramState and a NoteTag.
+  struct ModelingResult {
+ProgramStateRef State;
+const NoteTag *Note = nullptr;
+  };
+
+  ModelingResult handleInit(const EventDescriptor &Event, const MemRegion *MTX,
+const CallEvent &Call, ProgramStateRef State,
+CheckerContext &C) const;
+  ModelingResult onSuccessfulAcquire(const MemRegion *MTX,
+ const CallEvent &Call,
+ ProgramStateRef State,
+ CheckerContext &C) const;
+  ModelingResult markCritSection(ModelingResult InputState,
+ const MemRegion *MTX, const CallEvent &Call,
+ CheckerContext &C) const;
+  ModelingResult handleAcquire(const EventDes

[clang] [clang][analyzer] Introduce MutexModeling checker (PR #111381)

2024-10-08 Thread Donát Nagy via cfe-commits


@@ -0,0 +1,169 @@
+//===--- MutexModelingGDM.h - Modeling of mutexes in GDM 
--===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// Defines the GDM definitions for tracking mutex states.
+//
+//===--===//
+
+#ifndef LLVM_CLANG_LIB_STATICANALYZER_CHECKERS_MUTEXMODELINGGDM_H
+#define LLVM_CLANG_LIB_STATICANALYZER_CHECKERS_MUTEXMODELINGGDM_H
+
+#include "MutexModelingDomain.h"
+
+#include "clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/ProgramStateTrait.h"
+#include "llvm/ADT/FoldingSet.h"
+
+// GDM-related handle-types for tracking mutex states.
+namespace clang {
+namespace ento {
+namespace mutex_modeling {
+
+// Raw data of the mutex events.
+class MutexEvents {};
+using MutexEventsTy = llvm::ImmutableList;
+
+// Aggregated data structures for tracking critical sections.
+class CritSections {};
+using CritSectionsTy = llvm::ImmutableList;
+
+// Aggregated data structures for tracking mutex states.
+class LockStates {};
+using LockStatesTy = llvm::ImmutableMap;
+
+// Tracks return values of destroy operations for potentially destroyed mutexes
+class DestroyedRetVals {};
+using DestroyedRetValsTy = llvm::ImmutableMap;
+
+} // namespace mutex_modeling
+} // namespace ento
+} // namespace clang
+
+// Enable usage of mutex modeling data structures in llvm::FoldingSet.
+namespace llvm {
+// Specialization for EventMarker to allow its use in FoldingSet
+template <> struct FoldingSetTrait {
+  static void Profile(const clang::ento::mutex_modeling::EventMarker &EM,
+  llvm::FoldingSetNodeID &ID) {
+ID.Add(EM.Kind);
+ID.Add(EM.Library);
+ID.Add(EM.Semantics);
+ID.Add(EM.EventII);
+ID.Add(EM.EventExpr);
+ID.Add(EM.MutexRegion);
+  }
+};
+
+// Specialization for CritSectionMarker to allow its use in FoldingSet
+template <>
+struct FoldingSetTrait {
+  static void Profile(const clang::ento::mutex_modeling::CritSectionMarker 
&CSM,
+  llvm::FoldingSetNodeID &ID) {
+ID.Add(CSM.BeginExpr);
+ID.Add(CSM.MutexRegion);
+  }
+};
+} // namespace llvm
+
+// Iterator traits for ImmutableList and ImmutableMap data structures
+// that enable the use of STL algorithms.
+namespace std {
+// These specializations allow the use of STL algorithms with our custom data
+// structures
+// TODO: Move these to llvm::ImmutableList when overhauling immutable data
+// structures for proper iterator concept support.
+template <>
+struct iterator_traits::iterator> {
+  using iterator_category = std::forward_iterator_tag;
+  using value_type = clang::ento::mutex_modeling::EventMarker;
+  using difference_type = std::ptrdiff_t;
+  using reference = clang::ento::mutex_modeling::EventMarker &;
+  using pointer = clang::ento::mutex_modeling::EventMarker *;
+};
+template <>
+struct iterator_traits::iterator> {
+  using iterator_category = std::forward_iterator_tag;
+  using value_type = clang::ento::mutex_modeling::CritSectionMarker;
+  using difference_type = std::ptrdiff_t;
+  using reference = clang::ento::mutex_modeling::CritSectionMarker &;
+  using pointer = clang::ento::mutex_modeling::CritSectionMarker *;
+};
+template <>
+struct iterator_traits::iterator> {
+  using iterator_category = std::forward_iterator_tag;
+  using value_type = std::pair;
+  using difference_type = std::ptrdiff_t;
+  using reference = std::pair &;
+  using pointer = std::pair *;
+};
+template <>
+struct iterator_traits::iterator> {
+  using iterator_category = std::forward_iterator_tag;
+  using value_type =
+  std::pair;
+  using difference_type = std::ptrdiff_t;
+  using reference =
+  std::pair &;
+  using pointer =
+  std::pair *;
+};
+} // namespace std
+
+// NOTE: ProgramState macros are not used here, because the visibility of these
+// GDM entries must span multiple translation units (multiple checkers).
+// TODO: check if this is still true after finishing the implementation.

NagyDonat wrote:

Another alternative would be placing these in a single translation unit 
(`MutexModeling.cpp`) and introducing some externally visible boilerplate 
functions that wrap their operations that need to be seen from outside. I don't 
know if this would be better or worse than your current solution where you 
expose and duplicate the internals of the state trait macros.

https://github.com/llvm/llvm-project/pull/111381
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][analyzer] Introduce MutexModeling checker (PR #111381)

2024-10-08 Thread Donát Nagy via cfe-commits


@@ -0,0 +1,126 @@
+//===--- MutexModelingDomain.h - Common vocabulary for modeling mutexes 
---===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// Defines common types and related functions used in the mutex modeling 
domain.
+//
+//===--===//
+
+#ifndef LLVM_CLANG_LIB_STATICANALYZER_CHECKERS_MUTEXMODELINGDOMAIN_H
+#define LLVM_CLANG_LIB_STATICANALYZER_CHECKERS_MUTEXMODELINGDOMAIN_H
+
+#include "MutexRegionExtractor.h"
+
+// Forward declarations.
+namespace clang {
+class Expr;
+class IdentifierInfo;
+namespace ento {
+class MemRegion;
+} // namespace ento
+} // namespace clang
+
+namespace clang::ento::mutex_modeling {
+
+// Represents different kinds of mutex-related events
+enum class EventKind { Init, Acquire, TryAcquire, Release, Destroy };
+
+// TODO: Ideally the modeling should not know about which checkers consume the
+// modeling information. This enum is here to make a correspondence between the
+// checked mutex event the library that event came from. In order to keep the
+// external API of multiple distinct checkers (PthreadLockChecker,
+// FuchsiaLockChecker and C11LockChecker), this mapping is done here, but if
+// more consumers of this modeling arise, adding all of them here may not be
+// feasible and we may need to make this modeling more flexible.
+enum class LibraryKind { NotApplicable = 0, Pthread, Fuchsia, C11 };
+
+// Represents different mutex operation semantics
+enum class SemanticsKind { NotApplicable = 0, PthreadSemantics, XNUSemantics };
+
+// Represents different states a mutex can be in, including error states
+enum class LockStateKind {
+  Unlocked,
+  Locked,
+  Destroyed,
+  UntouchedAndPossiblyDestroyed,
+  UnlockedAndPossiblyDestroyed,
+  Error_DoubleInit,// Mutex initialized twice
+  Error_DoubleInitWhileLocked, // Mutex initialized while already locked
+  Error_DoubleLock,// Mutex locked twice without unlocking
+  Error_LockDestroyed, // Attempt to lock a destroyed mutex
+  Error_DoubleUnlock,  // Mutex unlocked twice without locking
+  Error_UnlockDestroyed,   // Attempt to unlock a destroyed mutex
+  Error_LockReversal,  // Locks acquired in incorrect order
+  Error_DestroyLocked, // Attempt to destroy a locked mutex
+  Error_DoubleDestroy  // Mutex destroyed twice
+};
+
+/// This class is intended for describing the list of events to detect.
+/// This list of events is the configuration of the MutexModeling checker.
+struct EventDescriptor {
+  MutexRegionExtractor Trigger;
+  EventKind Kind{};
+  LibraryKind Library{};
+  SemanticsKind Semantics{};
+
+  // TODO: Modernize to spaceship when C++20 is available.
+  [[nodiscard]] bool operator!=(const EventDescriptor &Other) const noexcept {
+return !(Trigger == Other.Trigger) || Library != Other.Library ||

NagyDonat wrote:

I think you can simply use `Trigger != Other.Trigger` because `std::variant` 
has a suitable `operator!=` in C++17 or later.

https://github.com/llvm/llvm-project/pull/111381
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][analyzer] Introduce MutexModeling checker (PR #111381)

2024-10-08 Thread Donát Nagy via cfe-commits


@@ -0,0 +1,773 @@
+//===--- MutexModeling.cpp - Modeling of mutexes 
--===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// Defines modeling checker for tracking mutex states.
+//
+//===--===//
+
+#include "MutexModeling/MutexModelingAPI.h"
+#include "MutexModeling/MutexModelingDomain.h"
+#include "MutexModeling/MutexRegionExtractor.h"
+
+#include "clang/StaticAnalyzer/Core/Checker.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerHelpers.h"
+#include "clang/StaticAnalyzer/Frontend/CheckerRegistry.h"
+#include 
+
+using namespace clang;
+using namespace ento;
+using namespace mutex_modeling;
+
+namespace {
+
+// When a lock is destroyed, in some semantics(like PthreadSemantics) we are 
not
+// sure if the destroy call has succeeded or failed, and the lock enters one of
+// the 'possibly destroyed' state. There is a short time frame for the
+// programmer to check the return value to see if the lock was successfully
+// destroyed. Before we model the next operation over that lock, we call this
+// function to see if the return value was checked by now and set the lock 
state
+// - either to destroyed state or back to its previous state.
+
+// In PthreadSemantics, pthread_mutex_destroy() returns zero if the lock is
+// successfully destroyed and it returns a non-zero value otherwise.
+ProgramStateRef resolvePossiblyDestroyedMutex(ProgramStateRef State,
+  const MemRegion *LockReg,
+  const SymbolRef *LockReturnSym) {
+  const LockStateKind *LockState = State->get(LockReg);
+  // Existence in DestroyRetVal ensures existence in LockMap.
+  // Existence in Destroyed also ensures that the lock state for lockR is 
either
+  // UntouchedAndPossiblyDestroyed or UnlockedAndPossiblyDestroyed.
+  assert(LockState);
+  assert(*LockState == LockStateKind::UntouchedAndPossiblyDestroyed ||
+ *LockState == LockStateKind::UnlockedAndPossiblyDestroyed);
+
+  ConstraintManager &CMgr = State->getConstraintManager();
+  ConditionTruthVal RetZero = CMgr.isNull(State, *LockReturnSym);
+  if (RetZero.isConstrainedFalse()) {
+switch (*LockState) {
+case LockStateKind::UntouchedAndPossiblyDestroyed: {
+  State = State->remove(LockReg);
+  break;
+}
+case LockStateKind::UnlockedAndPossiblyDestroyed: {
+  State = State->set(LockReg, LockStateKind::Unlocked);
+  break;
+}
+default:
+  llvm_unreachable("Unknown lock state for a lock inside DestroyRetVal");
+}
+  } else {
+State = State->set(LockReg, LockStateKind::Destroyed);
+  }
+
+  // Removing the map entry (LockReg, sym) from DestroyRetVal as the lock
+  // state is now resolved.
+  return State->remove(LockReg);
+}
+
+ProgramStateRef doResolvePossiblyDestroyedMutex(ProgramStateRef State,
+const MemRegion *MTX) {
+  assert(MTX && "should only be called with a mutex region");
+
+  if (const SymbolRef *Sym = State->get(MTX))
+return resolvePossiblyDestroyedMutex(State, MTX, Sym);
+  return State;
+}
+
+class MutexModeling : public Checker {
+public:
+  void checkPostCall(const CallEvent &Call, CheckerContext &C) const;
+
+  void checkDeadSymbols(SymbolReaper &SymReaper, CheckerContext &C) const;
+
+  ProgramStateRef
+  checkRegionChanges(ProgramStateRef State, const InvalidatedSymbols *Symbols,
+ ArrayRef ExplicitRegions,
+ ArrayRef Regions,
+ const LocationContext *LCtx, const CallEvent *Call) const;
+
+private:
+  mutable std::unique_ptr BT_initlock;
+
+  // When handling events, NoteTags can be placed on ProgramPoints. This struct
+  // supports returning both the resulting ProgramState and a NoteTag.
+  struct ModelingResult {
+ProgramStateRef State;
+const NoteTag *Note = nullptr;
+  };

NagyDonat wrote:

This is a nice little struct, I like it :)

https://github.com/llvm/llvm-project/pull/111381
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][analyzer] Introduce MutexModeling checker (PR #111381)

2024-10-08 Thread Donát Nagy via cfe-commits


@@ -0,0 +1,773 @@
+//===--- MutexModeling.cpp - Modeling of mutexes 
--===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// Defines modeling checker for tracking mutex states.
+//
+//===--===//
+
+#include "MutexModeling/MutexModelingAPI.h"
+#include "MutexModeling/MutexModelingDomain.h"
+#include "MutexModeling/MutexRegionExtractor.h"
+
+#include "clang/StaticAnalyzer/Core/Checker.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerHelpers.h"
+#include "clang/StaticAnalyzer/Frontend/CheckerRegistry.h"
+#include 
+
+using namespace clang;
+using namespace ento;
+using namespace mutex_modeling;
+
+namespace {
+
+// When a lock is destroyed, in some semantics(like PthreadSemantics) we are 
not
+// sure if the destroy call has succeeded or failed, and the lock enters one of
+// the 'possibly destroyed' state. There is a short time frame for the
+// programmer to check the return value to see if the lock was successfully
+// destroyed. Before we model the next operation over that lock, we call this
+// function to see if the return value was checked by now and set the lock 
state
+// - either to destroyed state or back to its previous state.
+
+// In PthreadSemantics, pthread_mutex_destroy() returns zero if the lock is
+// successfully destroyed and it returns a non-zero value otherwise.
+ProgramStateRef resolvePossiblyDestroyedMutex(ProgramStateRef State,
+  const MemRegion *LockReg,
+  const SymbolRef *LockReturnSym) {
+  const LockStateKind *LockState = State->get(LockReg);
+  // Existence in DestroyRetVal ensures existence in LockMap.
+  // Existence in Destroyed also ensures that the lock state for lockR is 
either
+  // UntouchedAndPossiblyDestroyed or UnlockedAndPossiblyDestroyed.
+  assert(LockState);
+  assert(*LockState == LockStateKind::UntouchedAndPossiblyDestroyed ||
+ *LockState == LockStateKind::UnlockedAndPossiblyDestroyed);
+
+  ConstraintManager &CMgr = State->getConstraintManager();
+  ConditionTruthVal RetZero = CMgr.isNull(State, *LockReturnSym);
+  if (RetZero.isConstrainedFalse()) {
+switch (*LockState) {
+case LockStateKind::UntouchedAndPossiblyDestroyed: {
+  State = State->remove(LockReg);
+  break;
+}
+case LockStateKind::UnlockedAndPossiblyDestroyed: {
+  State = State->set(LockReg, LockStateKind::Unlocked);
+  break;
+}
+default:
+  llvm_unreachable("Unknown lock state for a lock inside DestroyRetVal");
+}
+  } else {
+State = State->set(LockReg, LockStateKind::Destroyed);
+  }
+
+  // Removing the map entry (LockReg, sym) from DestroyRetVal as the lock
+  // state is now resolved.
+  return State->remove(LockReg);
+}
+
+ProgramStateRef doResolvePossiblyDestroyedMutex(ProgramStateRef State,

NagyDonat wrote:

I think a name like `resolveMutexIfPossiblyDestroyed` would be more 
descriptive. Also, in this patch this is called from only one location; if you 
don't expect further use, then I think you can simply inline this at that point.

https://github.com/llvm/llvm-project/pull/111381
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][analyzer] Introduce MutexModeling checker (PR #111381)

2024-10-08 Thread Donát Nagy via cfe-commits


@@ -0,0 +1,139 @@
+//===--- MutexRegionExtractor.h - Modeling of mutexes 
-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// Defines modeling checker for tracking mutex states.
+//
+//===--===//
+
+#ifndef 
LLVM_CLANG_LIB_STATICANALYZER_CHECKERS_MUTEXMODELING_MUTEXREGIONEXTRACTOR_H
+#define 
LLVM_CLANG_LIB_STATICANALYZER_CHECKERS_MUTEXMODELING_MUTEXREGIONEXTRACTOR_H
+
+#include "clang/StaticAnalyzer/Core/PathSensitive/CallDescription.h"
+#include 
+
+namespace clang::ento::mutex_modeling {
+
+// Extracts the mutex region from the first argument of a function call
+class FirstArgMutexExtractor {
+  CallDescription CD;
+
+public:
+  template 
+  FirstArgMutexExtractor(T &&CD) : CD(std::forward(CD)) {}
+
+  [[nodiscard]] bool matches(const CallEvent &Call) const {
+return CD.matches(Call);
+  }
+
+  [[nodiscard]] const MemRegion *getRegion(const CallEvent &Call) const {
+return Call.getArgSVal(0).getAsRegion();
+  }
+};
+
+// Extracts the mutex region from the 'this' pointer of a member function call
+class MemberMutexExtractor {
+  CallDescription CD;
+
+public:
+  template 
+  MemberMutexExtractor(T &&CD) : CD(std::forward(CD)) {}
+
+  [[nodiscard]] bool matches(const CallEvent &Call) const {
+return CD.matches(Call);
+  }
+
+  [[nodiscard]] const MemRegion *getRegion(const CallEvent &Call) const {
+return llvm::cast(Call).getCXXThisVal().getAsRegion();
+  }
+};
+
+// Template class for extracting mutex regions from RAII-style lock/unlock
+// operations
+template  class RAIIMutexExtractor {
+  mutable const clang::IdentifierInfo *Guard{};
+  mutable bool IdentifierInfoInitialized{};
+  mutable llvm::SmallString<32> GuardName{};
+
+  void initIdentifierInfo(const CallEvent &Call) const {
+if (!IdentifierInfoInitialized) {
+  // In case of checking C code, or when the corresponding headers are not
+  // included, we might end up query the identifier table every time when
+  // this function is called instead of early returning it. To avoid this,
+  // a bool variable (IdentifierInfoInitialized) is used and the function
+  // will be run only once.
+  const auto &ASTCtx = Call.getState()->getStateManager().getContext();
+  Guard = &ASTCtx.Idents.get(GuardName);
+}
+  }
+
+  template  bool matchesImpl(const CallEvent &Call) const {
+const T *C = llvm::dyn_cast(&Call);
+if (!C)
+  return false;
+const clang::IdentifierInfo *II =
+llvm::cast(C->getDecl()->getParent())
+->getIdentifier();
+return II == Guard;
+  }
+
+public:
+  RAIIMutexExtractor(llvm::StringRef GuardName) : GuardName(GuardName) {}
+  [[nodiscard]] bool matches(const CallEvent &Call) const {
+initIdentifierInfo(Call);
+if constexpr (IsLock) {
+  return matchesImpl(Call);
+} else {
+  return matchesImpl(Call);
+}
+  }
+  [[nodiscard]] const MemRegion *getRegion(const CallEvent &Call) const {
+const MemRegion *MutexRegion = nullptr;
+if constexpr (IsLock) {
+  if (std::optional Object = Call.getReturnValueUnderConstruction()) 
{
+MutexRegion = Object->getAsRegion();
+  }
+} else {
+  MutexRegion =
+  llvm::cast(Call).getCXXThisVal().getAsRegion();
+}
+return MutexRegion;
+  }
+};
+
+// Specializations for RAII-style lock and release operations
+using RAIILockExtractor = RAIIMutexExtractor;
+using RAIIReleaseExtractor = RAIIMutexExtractor;
+
+// Variant type that can hold any of the mutex region extractor types
+using MutexRegionExtractor =
+std::variant;
+
+// Helper functions for working with MutexRegionExtractor variant
+inline const MemRegion *getRegion(const MutexRegionExtractor &Extractor,
+  const CallEvent &Call) {
+  return std::visit(
+  [&Call](auto &&Descriptor) { return Descriptor.getRegion(Call); },
+  Extractor);
+}
+
+inline bool operator==(const MutexRegionExtractor &LHS,
+   const MutexRegionExtractor &RHS) {
+  return std::visit([](auto &&LHS, auto &&RHS) { return LHS == RHS; }, LHS,
+RHS);
+}

NagyDonat wrote:

Do you need to define this explicitly? It seems that in C++17 or later, 
`std::variant` has `operator==` and `operator!=` with the natural semantics 
that we want here: 
https://en.cppreference.com/w/cpp/utility/variant/operator_cmp .

https://github.com/llvm/llvm-project/pull/111381
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][analyzer] Introduce MutexModeling checker (PR #111381)

2024-10-08 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat commented:

Nice well-designed code; I added several remarks, but they are all minor.

https://github.com/llvm/llvm-project/pull/111381
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][analyzer] Introduce MutexModeling checker (PR #111381)

2024-10-08 Thread Donát Nagy via cfe-commits


@@ -0,0 +1,139 @@
+//===--- MutexRegionExtractor.h - Modeling of mutexes 
-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// Defines modeling checker for tracking mutex states.

NagyDonat wrote:

This seems to be copy-pasted from `MutexModeling.cpp`.

https://github.com/llvm/llvm-project/pull/111381
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][analyzer] Introduce MutexModeling checker (PR #111381)

2024-10-08 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat edited 
https://github.com/llvm/llvm-project/pull/111381
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix wrong `builtin_*_overflow` return type (PR #111253)

2024-10-07 Thread Donát Nagy via cfe-commits

NagyDonat wrote:

@pskrgag Thanks for fixing this issue quickly! :smile: 

https://github.com/llvm/llvm-project/pull/111253
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][analyzer] Check initialization and argument passing in FixedAddressChecker (PR #110977)

2024-10-04 Thread Donát Nagy via cfe-commits

NagyDonat wrote:

> I was thinking about using `check::Location` in this checker. The real 
> problem is when the fixed address is used (to store or load), not if it is 
> assigned to a pointer. (Or a fixed address becomes escaped.)

I agree that a checker that activates when a fixed address is _dereferenced_ 
would be more useful overall. In fact before you started to work on this 
checker, I misread its code and thought that it already operates that way (like 
`core.NullDereference`)

By the way, be careful with `check::Location` because I vaguely recall that 
there are a few corner cases where it does not cover some sort of location 
access: 
https://discourse.llvm.org/t/checklocation-vs-checkbind-when-isload-false/72728/6
 . (Ask @steakhal if you need more details. At that point examination of this 
problem was interrupted because I converted `ArrayBoundV2` to using `PreStmt` 
callbacks (for unrelated reasons) -- but if you want to introduce 
`check::Location` here then it might be useful to revisit the issue.)

The current behavior of this checker is not entirely useless (I can imagine 
that perhaps somebody wants to enable it to enforce a design rule), but I don't 
think that it's relevant enough to preserve.

https://github.com/llvm/llvm-project/pull/110977
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][analyzer] Check initialization and argument passing in FixedAddressChecker (PR #110977)

2024-10-03 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat commented:

Nice improvement, thanks for the patch! I read the code but I don't have 
anything to mention which isn't covered by the review of @steakhal .

https://github.com/llvm/llvm-project/pull/110977
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][analyzer] Check initialization and argument passing in FixedAddressChecker (PR #110977)

2024-10-03 Thread Donát Nagy via cfe-commits


@@ -63,11 +62,39 @@ void FixedAddressChecker::checkPreStmt(const BinaryOperator 
*B,
 "Using a fixed address is not portable because that address will "
 "probably not be valid in all environments or platforms.";
 auto R = std::make_unique(BT, Msg, N);
-R->addRange(B->getRHS()->getSourceRange());
+R->addRange(SrcExpr->getSourceRange());
 C.emitReport(std::move(R));
   }
 }
 
+void FixedAddressChecker::checkPreStmt(const BinaryOperator *B,
+   CheckerContext &C) const {
+  if (B->getOpcode() != BO_Assign)
+return;
+
+  checkUseOfFixedAddress(B->getType(), B->getRHS(), C);
+}
+
+void FixedAddressChecker::checkPreStmt(const DeclStmt *D,
+   CheckerContext &C) const {
+  for (const auto *D1 : D->decls()) {

NagyDonat wrote:

I agree that "`DS`" / "`D`" would be a more traditional naming choice, but 
actually there's something poetic in abbreviating  "one declaration from `D`" 
as  "`D1`".

https://github.com/llvm/llvm-project/pull/110977
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-10-02 Thread Donát Nagy via cfe-commits

NagyDonat wrote:

@isuckatcs If I am not mistaken, I reacted to every comment in your first 
review.

I'm sorry for the salty initial reactions -- after thinking about this topic 
for months, I ended up feeling that everything is _so obvious_, while in fact 
it's a really complicated topic and perhaps my original explanations weren't 
clear enough.

I understand your point of view that it would be _nice_ if the programmers 
marked their knowledge with assertions (and we could reason based on the lack 
of assertions), and e.g. half a year ago I might've agreed with you, but I 
don't think that it's the way forward. This is an application where the user is 
always right and we must avoid producing annoying/unwanted reports even if they 
are technically correct or they would become helpful after some additional work 
performed by the user.

https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-10-02 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat edited 
https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-10-02 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat edited 
https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-10-02 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat updated 
https://github.com/llvm/llvm-project/pull/109804

From 23b27377e556085054621f27d97059618b416695 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= 
Date: Mon, 23 Sep 2024 15:42:20 +0200
Subject: [PATCH 1/7] [analyzer] Suppress out of bounds reports after weak loop
 assumptions

The checker alpha.security.ArrayBoundV2 produced lots of false positives
in situations where loop modeling of the engine fed it with unfounded
assumptions.

This commit introduces a heuristic that discards ArrayBoundV2 reports
when the execution path introduces an assumption that is questionable.
More precisely, two kinds of assumptions are categorized as "weak":
(1) When the analyzer assumes that the first evaluation of the loop
condition returns false and the loop body is completely skipped.
(2) When the analyzer assumes that the loop condition is true in a
situation where it already executed (at least) two iterations.
For examples and more explanation, see the new tests.

The actual implementation uses some approximations (it uses the
BlockCount instead of the iteration count) because that seems to be
"good enough" for this heuristical suppression.

Note that I used minor state updates instead of bug reporter visitors
because the number of finished iterations is not visible in the visitor
which "walks backwards in time".

As a very minor unrelated change, this commit removes the "Bin" part
from the method name "evalEagerlyAssumeBinOpBifurcation" because this
method is also used for the unary logical not operator.
---
 .../Core/PathSensitive/ExprEngine.h   |  43 ++--
 .../Checkers/ArrayBoundCheckerV2.cpp  |   5 +
 clang/lib/StaticAnalyzer/Core/CoreEngine.cpp  |  25 -
 clang/lib/StaticAnalyzer/Core/ExprEngine.cpp  |  83 +++---
 clang/test/Analysis/loop-unrolling.cpp|   6 +-
 clang/test/Analysis/out-of-bounds.c   | 101 ++
 6 files changed, 233 insertions(+), 30 deletions(-)

diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h 
b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
index 04eacd1df7ffe2..03e4fec2b49015 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
@@ -121,6 +121,25 @@ struct EvalCallOptions {
   EvalCallOptions() {}
 };
 
+/// Simple control flow statements like `if` only produce a single state split,
+/// so the fact that they are included in the source code implies that both
+/// branches are possible (at least under some conditions) and the analyzer can
+/// freely assume either of them. (This is not entirely true, because there may
+/// be unmarked logical correlations between `if` statements, but is a good
+/// enough heuristic and the analyzer strongly relies on it.)
+/// On the other hand, in a loop the state may be split repeatedly at each
+/// evaluation of the loop condition, and this can lead to following "weak"
+/// assumptions even though the code does not imply that they're valid and the
+/// programmer intended to cover them.
+/// This function is called to mark the `State` when the engine makes an
+/// assumption which is weak. Checkers may use this heuristical mark to discard
+/// result and reduce the amount of false positives.
+ProgramStateRef recordWeakLoopAssumption(ProgramStateRef State);
+
+/// Returns true if `recordWeakLoopAssumption()` was called on the execution
+/// path which produced `State`.
+bool seenWeakLoopAssumption(ProgramStateRef State);
+
 class ExprEngine {
   void anchor();
 
@@ -323,12 +342,13 @@ class ExprEngine {
 
   /// ProcessBranch - Called by CoreEngine.  Used to generate successor
   ///  nodes by processing the 'effects' of a branch condition.
-  void processBranch(const Stmt *Condition,
- NodeBuilderContext& BuilderCtx,
- ExplodedNode *Pred,
- ExplodedNodeSet &Dst,
- const CFGBlock *DstT,
- const CFGBlock *DstF);
+  /// If the branch condition is a loop condition, IterationsFinishedInLoop is
+  /// the number of already finished iterations (0, 1, 2...); otherwise it's
+  /// std::nullopt.
+  void processBranch(const Stmt *Condition, NodeBuilderContext &BuilderCtx,
+ ExplodedNode *Pred, ExplodedNodeSet &Dst,
+ const CFGBlock *DstT, const CFGBlock *DstF,
+ std::optional IterationsFinishedInLoop);
 
   /// Called by CoreEngine.
   /// Used to generate successor nodes for temporary destructors depending
@@ -583,11 +603,12 @@ class ExprEngine {
 ExplodedNode *Pred,
 ExplodedNodeSet &Dst);
 
-  /// evalEagerlyAssumeBinOpBifurcation - Given the nodes in 'Src', eagerly 
assume symbolic
-  ///  expressions of the form 'x != 0' and generate new nodes (stored in Dst)
-  ///  with those assumpti

[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-10-02 Thread Donát Nagy via cfe-commits


@@ -2808,27 +2825,63 @@ void ExprEngine::processBranch(const Stmt *Condition,
   std::tie(StTrue, StFalse) = *KnownCondValueAssumption;
 else {
   assert(!isa(Condition));
+  // TODO: instead of this shortcut perhaps it would be better to "rejoin"
+  // the common execution path with
+  // StTrue = StFalse = PrevState;
   builder.generateNode(PrevState, true, PredN);
   builder.generateNode(PrevState, false, PredN);
   continue;
 }
 if (StTrue && StFalse)
   assert(!isa(Condition));
 
+const Expr *EagerlyAssumeExpr =
+PrevState->get();
+const Expr *ConditionExpr = dyn_cast(Condition);
+if (ConditionExpr)
+  ConditionExpr = ConditionExpr->IgnoreParenCasts();

NagyDonat wrote:

I summarized my thoughts in the inline comment 
https://github.com/llvm/llvm-project/pull/109804#discussion_r1784801535 (which 
is attached to the test which corresponds to this part of the code).

https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-10-02 Thread Donát Nagy via cfe-commits


@@ -194,3 +199,99 @@ char test_comparison_with_extent_symbol(struct incomplete 
*p) {
   return ((char *)p)[-1]; // no-warning
 }
 
+// WeakLoopAssumption suppression
+///
+
+int GlobalArray[100];
+int loop_suppress_after_zero_iterations(unsigned len) {
+  for (unsigned i = 0; i < len; i++)
+if (GlobalArray[i] > 0)
+  return GlobalArray[i];
+  // Previously this would have produced an overflow warning because splitting
+  // the state on the loop condition introduced an execution path where the
+  // analyzer thinks that len == 0.
+  // There are very many situations where the programmer knows that an argument
+  // is positive, but this is not indicated in the source code, so we must
+  // avoid reporting errors (especially out of bounds errors) on these
+  // branches, because otherwise we'd get prohibitively many false positives.
+  return GlobalArray[len - 1]; // no-warning
+}
+
+void loop_report_in_second_iteration(int len) {
+  int buf[1] = {0};
+  for (int i = 0; i < len; i++) {
+// When a programmer writes a loop, we may assume that they intended at
+// least two iterations.
+buf[i] = 1; // expected-warning{{Out of bound access to memory}}
+  }
+}
+
+void loop_suppress_in_third_iteration(int len) {
+  int buf[2] = {0};
+  for (int i = 0; i < len; i++) {
+// We should suppress array bounds errors on the third and later iterations
+// of loops, because sometimes programmers write a loop in sitiuations
+// where they know that there will be at most two iterations.
+buf[i] = 1; // no-warning
+  }
+}
+
+void loop_suppress_in_third_iteration_cast(int len) {
+  int buf[2] = {0};
+  for (int i = 0; (unsigned)(i < len); i++) {

NagyDonat wrote:

I thought about this question, and I realized that in fact my 
`IgnoreParenCasts()` call cannot cause trouble, because its result is only used 
in a very limited way: the only reference to it is that it's compared to 
`EagerlyAssumeExpr`. (The `assumeCondition` call happens above this.)

The only effect of ignoring the cast is that this way I can recognize that the 
loop condition expression `(unsigned)(i < len)` is _essentially_ the same as `i 
< len` i.e. the conditional expression where the eager assumption happens.

Now that I think about it, I feel that highlighting this "cast around the 
conditional expression" case is a bit arbitrary because obviously there _will_ 
be some differences between the `eagerly-assume=true` and 
`eagerly-assume=false` analysis modes. For example, IIUC in the similarly 
awkward `for (int i=0; !(i >= len); i++)` the suppression would activate under 
`eagerly-assume=false` (because then the loop condition is ambiguous) but it 
wouldn't activate under `eagerly-assume=true` (because the eager assumption is 
tied to `i >= len` and not the full loop condition which is `!(i >= len)`).

Based on this I think the best solution would be just deleting both this 
`IgnoreParenCasts()` call and the testcase with `(unsigned)(i >= len)`.

https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Remove dangling method declaration from ErrnoChecker (PR #110820)

2024-10-02 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat closed 
https://github.com/llvm/llvm-project/pull/110820
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Remove dangling method declaration from ErrnoChecker (PR #110820)

2024-10-02 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat created 
https://github.com/llvm/llvm-project/pull/110820

Remove the declaration of `ErrnoChecker::checkBranchCondition()` because this 
method is not defined or used anywhere. (It's probably a leftover from some old 
refactoring.)

From 72bc17b87a7955a58210dfe42d7d73b6c0c6803c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= 
Date: Wed, 2 Oct 2024 12:24:44 +0200
Subject: [PATCH] [analyzer][NFC] Remove dangling method declaration from
 ErrnoChecker

Remove the declaration of `ErrnoChecker::checkBranchCondition()` because
this method is not defined or used anywhere. (It's probably a leftover
from some old refactoring.)
---
 clang/lib/StaticAnalyzer/Checkers/ErrnoChecker.cpp | 1 -
 1 file changed, 1 deletion(-)

diff --git a/clang/lib/StaticAnalyzer/Checkers/ErrnoChecker.cpp 
b/clang/lib/StaticAnalyzer/Checkers/ErrnoChecker.cpp
index 72fd6781a75615..beb3c8574b5fd4 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ErrnoChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ErrnoChecker.cpp
@@ -42,7 +42,6 @@ class ErrnoChecker
  ArrayRef ExplicitRegions,
  ArrayRef Regions,
  const LocationContext *LCtx, const CallEvent *Call) const;
-  void checkBranchCondition(const Stmt *Condition, CheckerContext &Ctx) const;
 
   /// Indicates if a read (load) of \c errno is allowed in a non-condition part
   /// of \c if, \c switch, loop and conditional statements when the errno

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][analyzer] Less redundant warnings from FixedAddressChecker (PR #110458)

2024-10-02 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat approved this pull request.

Thanks for the update, I think you can merge this now.

https://github.com/llvm/llvm-project/pull/110458
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-10-01 Thread Donát Nagy via cfe-commits

NagyDonat wrote:

> We can always have a map, mapping ForStmts to finished iterations. Basically, 
> from the ErrorNode, we could walk backwards (as usual in a visitor like 
> that), and check if the current ProgramPoint is PostStmt. We would then grab 
> the wrapped Stmt to see if it's the terminator Stmt of a For-like statement. 
> If so, them bump the counter in the map of that loop. Once we finished the 
> visitation, in the `finalizeVisitor` we could see if any of the loops had a 
> wrong counter, and mark the report invalid.
> 
> If we want to see if we did a state-split when we left the loop we would have 
> 2 options:
> 
> 1. Take the State and the condition expression and query by using 
> `assume` as usual. We should get the same answer as during the egraph 
> construction.
> 
> 2. Have a look at the **original** exploded graph and check if we have 
> only have a single successor node.

That's a clever solution, but I'm still not convinced that overall it'd be 
easier to understand than what I wrote.

Note that in addition to checking for state split when the analyzer _leaves_ 
the loop, the heuristics also need to check for state split when the analyzer 
_remains_ in the loop, so for each loop we would need to manage a list like 
"left the loop with[out] assumption, before that remained in the loop with[out] 
assumption, before that remained in the loop with[out] assumption, ..."

This already awkward code would become even more complex if we determined that 
we need accurate iteration counts in cases when a loop is reached several times 
(from different function calls or from different iterations of an outer loop). 
Implementing this would be already difficult in my eager model (see the last 
few paragraphs of this inline comment: 
https://github.com/llvm/llvm-project/pull/109804#discussion_r1782588587 ), and 
the backward-iteration logic would make this even more complicated.

> Note that the exploded graph that the VisitNode traverses is stripped, thus 
> it's a single path of nodes, without ambiguities of successors or predecessor 
> nodes.

Thanks for alerting me :sweat_smile: I didn't have the opportunity to run into 
this trap (yet), and now I'll try to remember it.

> > In general, I feel that `BugReporterVisitor` is a complicated piece of 
> > artillery [...]
> 
> Yes, but I also wish we could achieve this without making the internals more 
> complicated. Having a separated component doing this has a lot of value.
>
> > Moreover, if this suppression works well, then perhaps we could generalize 
> > it and add a config option which enables it for all checkers. If we want to 
> > do this, it will be nice to have this logic in `processBranch()` where we 
> > can immediately discard the "weak" transitions (instead of just marking 
> > them).
> 
> A BugReportVisitor is just as generic to me.

I see that there is value in keeping the internals as simple as possible, but I 
also feel that hiding pieces of the engine logic in scattered visitors also 
makes the code harder to understand. (Especially since the visitor will 
activate at a very different stage of the analysis, so its influence is 
completely invisible at the engine logic where it "belongs".)

I think this is a "simple is better than complex; complex is better than 
complicated" case (to quote the zen of python): we should try to keep the 
engine as simple as possible, but we should not hide parts of the necessary 
complexity with complicated tricks 

If we end up with generalizing this and the "weak" transitions will be 
recognized by many checkers (or we discard them altogether), then it would be 
important to have this logic in its "natural" place in the engine (instead of 
being hidden away in a visitor).

On the other hand, if this heuristic remains limited to ArrayBoundV2, then I 
agree that architecturally it would be better to place this logic in a visitor 
in ArrayBoundCheckerV2.cpp -- but even in this case I feel that this concern is 
counterbalanced by the fact that the visitor code would be significantly more 
awkward and complex than the direct eager implementation.

Moreover, if we end up with enabling this heuristic for _all_ the checkers then 
we can outright discard the weak transitions, which could provide a significant 
performance improvement compared to both the visitor-based alternative 
implementation and even the status quo.

If we eagerly prune the exploded graph by not following the weak loop 
assumptions, then we could avoid visiting lots of execution paths, while
- the visitor-based approach would just discard results from these paths after 
wasting time on analyzing them;
- the status quo displays results found on these paths, but I suspect that the 
results coming after assuming 0 or 3+ loop iterations are redundant and very 
similar to the results after 1 or 2 (with a few differences where I'd prefer to 
see the findings from 1 or 2 iterations).

 On complex-real-world code I think it's fair t

[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-10-01 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat updated 
https://github.com/llvm/llvm-project/pull/109804

From 23b27377e556085054621f27d97059618b416695 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= 
Date: Mon, 23 Sep 2024 15:42:20 +0200
Subject: [PATCH 1/6] [analyzer] Suppress out of bounds reports after weak loop
 assumptions

The checker alpha.security.ArrayBoundV2 produced lots of false positives
in situations where loop modeling of the engine fed it with unfounded
assumptions.

This commit introduces a heuristic that discards ArrayBoundV2 reports
when the execution path introduces an assumption that is questionable.
More precisely, two kinds of assumptions are categorized as "weak":
(1) When the analyzer assumes that the first evaluation of the loop
condition returns false and the loop body is completely skipped.
(2) When the analyzer assumes that the loop condition is true in a
situation where it already executed (at least) two iterations.
For examples and more explanation, see the new tests.

The actual implementation uses some approximations (it uses the
BlockCount instead of the iteration count) because that seems to be
"good enough" for this heuristical suppression.

Note that I used minor state updates instead of bug reporter visitors
because the number of finished iterations is not visible in the visitor
which "walks backwards in time".

As a very minor unrelated change, this commit removes the "Bin" part
from the method name "evalEagerlyAssumeBinOpBifurcation" because this
method is also used for the unary logical not operator.
---
 .../Core/PathSensitive/ExprEngine.h   |  43 ++--
 .../Checkers/ArrayBoundCheckerV2.cpp  |   5 +
 clang/lib/StaticAnalyzer/Core/CoreEngine.cpp  |  25 -
 clang/lib/StaticAnalyzer/Core/ExprEngine.cpp  |  83 +++---
 clang/test/Analysis/loop-unrolling.cpp|   6 +-
 clang/test/Analysis/out-of-bounds.c   | 101 ++
 6 files changed, 233 insertions(+), 30 deletions(-)

diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h 
b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
index 04eacd1df7ffe2..03e4fec2b49015 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
@@ -121,6 +121,25 @@ struct EvalCallOptions {
   EvalCallOptions() {}
 };
 
+/// Simple control flow statements like `if` only produce a single state split,
+/// so the fact that they are included in the source code implies that both
+/// branches are possible (at least under some conditions) and the analyzer can
+/// freely assume either of them. (This is not entirely true, because there may
+/// be unmarked logical correlations between `if` statements, but is a good
+/// enough heuristic and the analyzer strongly relies on it.)
+/// On the other hand, in a loop the state may be split repeatedly at each
+/// evaluation of the loop condition, and this can lead to following "weak"
+/// assumptions even though the code does not imply that they're valid and the
+/// programmer intended to cover them.
+/// This function is called to mark the `State` when the engine makes an
+/// assumption which is weak. Checkers may use this heuristical mark to discard
+/// result and reduce the amount of false positives.
+ProgramStateRef recordWeakLoopAssumption(ProgramStateRef State);
+
+/// Returns true if `recordWeakLoopAssumption()` was called on the execution
+/// path which produced `State`.
+bool seenWeakLoopAssumption(ProgramStateRef State);
+
 class ExprEngine {
   void anchor();
 
@@ -323,12 +342,13 @@ class ExprEngine {
 
   /// ProcessBranch - Called by CoreEngine.  Used to generate successor
   ///  nodes by processing the 'effects' of a branch condition.
-  void processBranch(const Stmt *Condition,
- NodeBuilderContext& BuilderCtx,
- ExplodedNode *Pred,
- ExplodedNodeSet &Dst,
- const CFGBlock *DstT,
- const CFGBlock *DstF);
+  /// If the branch condition is a loop condition, IterationsFinishedInLoop is
+  /// the number of already finished iterations (0, 1, 2...); otherwise it's
+  /// std::nullopt.
+  void processBranch(const Stmt *Condition, NodeBuilderContext &BuilderCtx,
+ ExplodedNode *Pred, ExplodedNodeSet &Dst,
+ const CFGBlock *DstT, const CFGBlock *DstF,
+ std::optional IterationsFinishedInLoop);
 
   /// Called by CoreEngine.
   /// Used to generate successor nodes for temporary destructors depending
@@ -583,11 +603,12 @@ class ExprEngine {
 ExplodedNode *Pred,
 ExplodedNodeSet &Dst);
 
-  /// evalEagerlyAssumeBinOpBifurcation - Given the nodes in 'Src', eagerly 
assume symbolic
-  ///  expressions of the form 'x != 0' and generate new nodes (stored in Dst)
-  ///  with those assumpti

[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-10-01 Thread Donát Nagy via cfe-commits


@@ -194,3 +199,99 @@ char test_comparison_with_extent_symbol(struct incomplete 
*p) {
   return ((char *)p)[-1]; // no-warning
 }
 
+// WeakLoopAssumption suppression
+///
+
+int GlobalArray[100];
+int loop_suppress_after_zero_iterations(unsigned len) {
+  for (unsigned i = 0; i < len; i++)
+if (GlobalArray[i] > 0)
+  return GlobalArray[i];
+  // Previously this would have produced an overflow warning because splitting
+  // the state on the loop condition introduced an execution path where the
+  // analyzer thinks that len == 0.
+  // There are very many situations where the programmer knows that an argument
+  // is positive, but this is not indicated in the source code, so we must
+  // avoid reporting errors (especially out of bounds errors) on these
+  // branches, because otherwise we'd get prohibitively many false positives.
+  return GlobalArray[len - 1]; // no-warning
+}
+
+void loop_report_in_second_iteration(int len) {
+  int buf[1] = {0};
+  for (int i = 0; i < len; i++) {
+// When a programmer writes a loop, we may assume that they intended at
+// least two iterations.
+buf[i] = 1; // expected-warning{{Out of bound access to memory}}
+  }
+}
+
+void loop_suppress_in_third_iteration(int len) {
+  int buf[2] = {0};
+  for (int i = 0; i < len; i++) {
+// We should suppress array bounds errors on the third and later iterations
+// of loops, because sometimes programmers write a loop in sitiuations
+// where they know that there will be at most two iterations.
+buf[i] = 1; // no-warning
+  }
+}
+
+void loop_suppress_in_third_iteration_cast(int len) {
+  int buf[2] = {0};
+  for (int i = 0; (unsigned)(i < len); i++) {
+// Check that a (somewhat arbitrary) cast does not hinder the recognition
+// of the condition expression.
+buf[i] = 1; // no-warning
+  }
+}
+
+void loop_suppress_in_third_iteration_logical_and(int len, int flag) {
+  int buf[2] = {0};
+  for (int i = 0; i < len && flag; i++) {
+// FIXME: In this case the checker should suppress the warning the same way
+// as it's suppressed in loop_suppress_in_third_iteration, but the
+// suppression is not activated because the terminator statement associated
+// with the loop is just the expression 'flag', while 'i < len' is a
+// separate terminator statement that's associated with the
+// short-circuiting operator '&&'.
+// I have seen a real-world FP that looks like this, but it is much rarer
+// than the basic setup.
+buf[i] = 1; // expected-warning{{Out of bound access to memory}}
+  }
+}
+
+void loop_suppress_in_third_iteration_logical_and_2(int len, int flag) {
+  int buf[2] = {0};
+  for (int i = 0; flag && i < len; i++) {
+// If the two operands of '&&' are flipped, the suppression works.
+buf[i] = 1; // no-warning
+  }
+}
+
+int coinflip(void);
+int do_while_report_after_one_iteration(void) {
+  int i = 0;
+  do {
+i++;
+  } while (coinflip());
+  // Unlike `loop_suppress_after_zero_iterations`, running just one iteration
+  // in a do-while is not a corner case that would produce too many false
+  // positives, so don't suppress bounds errors in these situations.
+  return GlobalArray[i-2]; // expected-warning{{Out of bound access to memory}}
+}
+
+void do_while_report_in_second_iteration(int len) {
+  int buf[1] = {0};
+  int i = 0;
+  do {
+buf[i] = 1; // expected-warning{{Out of bound access to memory}}
+  } while (i++ < len);
+}
+
+void do_while_suppress_in_third_iteration(int len) {
+  int buf[2] = {0};
+  int i = 0;
+  do {
+buf[i] = 1; // no-warning
+  } while (i++ < len);
+}

NagyDonat wrote:

I added these testcases in 
https://github.com/llvm/llvm-project/pull/109804/commits/cbb46e58e819c864e4e24f5764d8464528d0b806

https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-09-30 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat edited 
https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-09-30 Thread Donát Nagy via cfe-commits


@@ -2808,27 +2825,63 @@ void ExprEngine::processBranch(const Stmt *Condition,
   std::tie(StTrue, StFalse) = *KnownCondValueAssumption;
 else {
   assert(!isa(Condition));
+  // TODO: instead of this shortcut perhaps it would be better to "rejoin"
+  // the common execution path with
+  // StTrue = StFalse = PrevState;
   builder.generateNode(PrevState, true, PredN);
   builder.generateNode(PrevState, false, PredN);
   continue;

NagyDonat wrote:

I pushed a slightly tweaked variant of this and it does not break anything in 
the basic test suite (check-clang-analysis). Thanks for the suggestion!

https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-09-30 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat updated 
https://github.com/llvm/llvm-project/pull/109804

From 23b27377e556085054621f27d97059618b416695 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= 
Date: Mon, 23 Sep 2024 15:42:20 +0200
Subject: [PATCH 1/4] [analyzer] Suppress out of bounds reports after weak loop
 assumptions

The checker alpha.security.ArrayBoundV2 produced lots of false positives
in situations where loop modeling of the engine fed it with unfounded
assumptions.

This commit introduces a heuristic that discards ArrayBoundV2 reports
when the execution path introduces an assumption that is questionable.
More precisely, two kinds of assumptions are categorized as "weak":
(1) When the analyzer assumes that the first evaluation of the loop
condition returns false and the loop body is completely skipped.
(2) When the analyzer assumes that the loop condition is true in a
situation where it already executed (at least) two iterations.
For examples and more explanation, see the new tests.

The actual implementation uses some approximations (it uses the
BlockCount instead of the iteration count) because that seems to be
"good enough" for this heuristical suppression.

Note that I used minor state updates instead of bug reporter visitors
because the number of finished iterations is not visible in the visitor
which "walks backwards in time".

As a very minor unrelated change, this commit removes the "Bin" part
from the method name "evalEagerlyAssumeBinOpBifurcation" because this
method is also used for the unary logical not operator.
---
 .../Core/PathSensitive/ExprEngine.h   |  43 ++--
 .../Checkers/ArrayBoundCheckerV2.cpp  |   5 +
 clang/lib/StaticAnalyzer/Core/CoreEngine.cpp  |  25 -
 clang/lib/StaticAnalyzer/Core/ExprEngine.cpp  |  83 +++---
 clang/test/Analysis/loop-unrolling.cpp|   6 +-
 clang/test/Analysis/out-of-bounds.c   | 101 ++
 6 files changed, 233 insertions(+), 30 deletions(-)

diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h 
b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
index 04eacd1df7ffe2..03e4fec2b49015 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
@@ -121,6 +121,25 @@ struct EvalCallOptions {
   EvalCallOptions() {}
 };
 
+/// Simple control flow statements like `if` only produce a single state split,
+/// so the fact that they are included in the source code implies that both
+/// branches are possible (at least under some conditions) and the analyzer can
+/// freely assume either of them. (This is not entirely true, because there may
+/// be unmarked logical correlations between `if` statements, but is a good
+/// enough heuristic and the analyzer strongly relies on it.)
+/// On the other hand, in a loop the state may be split repeatedly at each
+/// evaluation of the loop condition, and this can lead to following "weak"
+/// assumptions even though the code does not imply that they're valid and the
+/// programmer intended to cover them.
+/// This function is called to mark the `State` when the engine makes an
+/// assumption which is weak. Checkers may use this heuristical mark to discard
+/// result and reduce the amount of false positives.
+ProgramStateRef recordWeakLoopAssumption(ProgramStateRef State);
+
+/// Returns true if `recordWeakLoopAssumption()` was called on the execution
+/// path which produced `State`.
+bool seenWeakLoopAssumption(ProgramStateRef State);
+
 class ExprEngine {
   void anchor();
 
@@ -323,12 +342,13 @@ class ExprEngine {
 
   /// ProcessBranch - Called by CoreEngine.  Used to generate successor
   ///  nodes by processing the 'effects' of a branch condition.
-  void processBranch(const Stmt *Condition,
- NodeBuilderContext& BuilderCtx,
- ExplodedNode *Pred,
- ExplodedNodeSet &Dst,
- const CFGBlock *DstT,
- const CFGBlock *DstF);
+  /// If the branch condition is a loop condition, IterationsFinishedInLoop is
+  /// the number of already finished iterations (0, 1, 2...); otherwise it's
+  /// std::nullopt.
+  void processBranch(const Stmt *Condition, NodeBuilderContext &BuilderCtx,
+ ExplodedNode *Pred, ExplodedNodeSet &Dst,
+ const CFGBlock *DstT, const CFGBlock *DstF,
+ std::optional IterationsFinishedInLoop);
 
   /// Called by CoreEngine.
   /// Used to generate successor nodes for temporary destructors depending
@@ -583,11 +603,12 @@ class ExprEngine {
 ExplodedNode *Pred,
 ExplodedNodeSet &Dst);
 
-  /// evalEagerlyAssumeBinOpBifurcation - Given the nodes in 'Src', eagerly 
assume symbolic
-  ///  expressions of the form 'x != 0' and generate new nodes (stored in Dst)
-  ///  with those assumpti

[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-09-30 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat updated 
https://github.com/llvm/llvm-project/pull/109804

From 23b27377e556085054621f27d97059618b416695 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= 
Date: Mon, 23 Sep 2024 15:42:20 +0200
Subject: [PATCH 1/3] [analyzer] Suppress out of bounds reports after weak loop
 assumptions

The checker alpha.security.ArrayBoundV2 produced lots of false positives
in situations where loop modeling of the engine fed it with unfounded
assumptions.

This commit introduces a heuristic that discards ArrayBoundV2 reports
when the execution path introduces an assumption that is questionable.
More precisely, two kinds of assumptions are categorized as "weak":
(1) When the analyzer assumes that the first evaluation of the loop
condition returns false and the loop body is completely skipped.
(2) When the analyzer assumes that the loop condition is true in a
situation where it already executed (at least) two iterations.
For examples and more explanation, see the new tests.

The actual implementation uses some approximations (it uses the
BlockCount instead of the iteration count) because that seems to be
"good enough" for this heuristical suppression.

Note that I used minor state updates instead of bug reporter visitors
because the number of finished iterations is not visible in the visitor
which "walks backwards in time".

As a very minor unrelated change, this commit removes the "Bin" part
from the method name "evalEagerlyAssumeBinOpBifurcation" because this
method is also used for the unary logical not operator.
---
 .../Core/PathSensitive/ExprEngine.h   |  43 ++--
 .../Checkers/ArrayBoundCheckerV2.cpp  |   5 +
 clang/lib/StaticAnalyzer/Core/CoreEngine.cpp  |  25 -
 clang/lib/StaticAnalyzer/Core/ExprEngine.cpp  |  83 +++---
 clang/test/Analysis/loop-unrolling.cpp|   6 +-
 clang/test/Analysis/out-of-bounds.c   | 101 ++
 6 files changed, 233 insertions(+), 30 deletions(-)

diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h 
b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
index 04eacd1df7ffe2..03e4fec2b49015 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
@@ -121,6 +121,25 @@ struct EvalCallOptions {
   EvalCallOptions() {}
 };
 
+/// Simple control flow statements like `if` only produce a single state split,
+/// so the fact that they are included in the source code implies that both
+/// branches are possible (at least under some conditions) and the analyzer can
+/// freely assume either of them. (This is not entirely true, because there may
+/// be unmarked logical correlations between `if` statements, but is a good
+/// enough heuristic and the analyzer strongly relies on it.)
+/// On the other hand, in a loop the state may be split repeatedly at each
+/// evaluation of the loop condition, and this can lead to following "weak"
+/// assumptions even though the code does not imply that they're valid and the
+/// programmer intended to cover them.
+/// This function is called to mark the `State` when the engine makes an
+/// assumption which is weak. Checkers may use this heuristical mark to discard
+/// result and reduce the amount of false positives.
+ProgramStateRef recordWeakLoopAssumption(ProgramStateRef State);
+
+/// Returns true if `recordWeakLoopAssumption()` was called on the execution
+/// path which produced `State`.
+bool seenWeakLoopAssumption(ProgramStateRef State);
+
 class ExprEngine {
   void anchor();
 
@@ -323,12 +342,13 @@ class ExprEngine {
 
   /// ProcessBranch - Called by CoreEngine.  Used to generate successor
   ///  nodes by processing the 'effects' of a branch condition.
-  void processBranch(const Stmt *Condition,
- NodeBuilderContext& BuilderCtx,
- ExplodedNode *Pred,
- ExplodedNodeSet &Dst,
- const CFGBlock *DstT,
- const CFGBlock *DstF);
+  /// If the branch condition is a loop condition, IterationsFinishedInLoop is
+  /// the number of already finished iterations (0, 1, 2...); otherwise it's
+  /// std::nullopt.
+  void processBranch(const Stmt *Condition, NodeBuilderContext &BuilderCtx,
+ ExplodedNode *Pred, ExplodedNodeSet &Dst,
+ const CFGBlock *DstT, const CFGBlock *DstF,
+ std::optional IterationsFinishedInLoop);
 
   /// Called by CoreEngine.
   /// Used to generate successor nodes for temporary destructors depending
@@ -583,11 +603,12 @@ class ExprEngine {
 ExplodedNode *Pred,
 ExplodedNodeSet &Dst);
 
-  /// evalEagerlyAssumeBinOpBifurcation - Given the nodes in 'Src', eagerly 
assume symbolic
-  ///  expressions of the form 'x != 0' and generate new nodes (stored in Dst)
-  ///  with those assumpti

[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-09-30 Thread Donát Nagy via cfe-commits


@@ -441,10 +441,33 @@ void CoreEngine::HandleCallEnter(const CallEnter &CE, 
ExplodedNode *Pred) {
 void CoreEngine::HandleBranch(const Stmt *Cond, const Stmt *Term,
 const CFGBlock * B, ExplodedNode *Pred) {
   assert(B->succ_size() == 2);
+
+  const LocationContext *LC = Pred->getLocationContext();
+  BlockCounter Counter = WList->getBlockCounter();
+  unsigned BlockCount =
+  Counter.getNumVisited(LC->getStackFrame(), B->getBlockID());
+  std::optional IterationsFinishedInLoop = std::nullopt;
+  if (isa(Term)) {
+// FIXME: This code approximates the number of finished iteration based on
+// the block count, i.e. the number of evaluations of the terminator block
+// on the current execution path (which includes the current evaluation, so
+// is always at least 1). This is probably acceptable for the
+// checker-specific false positive suppression that currently uses this
+// value, but it would be better to calcuate an accurate count of
+// iterations.

NagyDonat wrote:

It *was* wrapped to 79 columns (the default of the vim 'gw' command), but 
"checker-specific" and "iterations" are both very long words and they wouldn't 
fit even within 80 columns.

However I abbreviated "at least" to ">=" and rewrapped the text to make it 
visually nicer.

https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang-tools-extra] RFC: [clang-tidy] [analyzer] Nondeterministic pointer usage improvements (PR #110471)

2024-09-30 Thread Donát Nagy via cfe-commits


@@ -0,0 +1,1450 @@
+// Like the compiler, the static analyzer treats some functions differently if

NagyDonat wrote:

Quick nit: this comment refers to the "static analyzer", you should probably 
change it to a reference to "Clang Tidy".

https://github.com/llvm/llvm-project/pull/110471
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Model overflow builtins (PR #102602)

2024-09-30 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat approved this pull request.

Also LGTM, sorry for not responding earlier. I agree with the minor suggestions 
of @steakhal .

https://github.com/llvm/llvm-project/pull/102602
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Model overflow builtins (PR #102602)

2024-09-30 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat edited 
https://github.com/llvm/llvm-project/pull/102602
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Model overflow builtins (PR #102602)

2024-09-30 Thread Donát Nagy via cfe-commits


@@ -16,21 +16,93 @@
 
 #include "clang/Basic/Builtins.h"
 #include "clang/StaticAnalyzer/Checkers/BuiltinCheckerRegistration.h"
+#include "clang/StaticAnalyzer/Checkers/Taint.h"
 #include "clang/StaticAnalyzer/Core/Checker.h"
 #include "clang/StaticAnalyzer/Core/CheckerManager.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/CallDescription.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerHelpers.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/DynamicExtent.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/SVals.h"
 
 using namespace clang;
 using namespace ento;
+using namespace taint;
 
 namespace {
 
+QualType getSufficientTypeForOverflowOp(CheckerContext &C, const QualType &T) {
+  // Calling a builtin with a non-integer type result produces compiler error.
+  assert(T->isIntegerType());
+
+  ASTContext &ACtx = C.getASTContext();
+
+  unsigned BitWidth = ACtx.getIntWidth(T);
+  return ACtx.getIntTypeForBitwidth(BitWidth * 2, T->isSignedIntegerType());

NagyDonat wrote:

That's good enough for practical purposes. I guess that perhaps we could get an 
incorrect result in some weird corner case, but I don't think that it'll appear 
in real-world code.

https://github.com/llvm/llvm-project/pull/102602
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Model overflow builtins (PR #102602)

2024-09-30 Thread Donát Nagy via cfe-commits


@@ -0,0 +1,30 @@
+// RUN: %clang_analyze_cc1 -analyzer-checker=core -analyzer-output text \
+// RUN:   -verify %s
+
+void test_no_overflow_note(int a, int b)
+{
+   int res;
+
+   if (__builtin_add_overflow(a, b, &res)) // expected-note {{Assuming 
overflow does not happen}}

NagyDonat wrote:

I agree that the phrasing suggested by @steakhal would be better, but I think 
that the original is also acceptable.

https://github.com/llvm/llvm-project/pull/102602
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Add optin.taint.TaintedDiv checker (PR #106389)

2024-09-30 Thread Donát Nagy via cfe-commits


@@ -25,16 +25,20 @@ using namespace ento;
 using namespace taint;
 
 namespace {
-class DivZeroChecker : public Checker< check::PreStmt > {
-  const BugType BT{this, "Division by zero"};
-  const BugType TaintBT{this, "Division by zero", categories::TaintedData};
+class DivZeroChecker : public Checker> {
   void reportBug(StringRef Msg, ProgramStateRef StateZero,
  CheckerContext &C) const;
   void reportTaintBug(StringRef Msg, ProgramStateRef StateZero,
   CheckerContext &C,
   llvm::ArrayRef TaintedSyms) const;
 
 public:
+  /// This checker class implements multiple user facing checker

NagyDonat wrote:

```suggestion
  /// This checker class implements several user facing checkers
```

https://github.com/llvm/llvm-project/pull/106389
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Add optin.taint.TaintedDiv checker (PR #106389)

2024-09-30 Thread Donát Nagy via cfe-commits


@@ -1,4 +1,4 @@
-// RUN: %clang_cc1 -analyze 
-analyzer-checker=optin.taint,core,alpha.security.ArrayBoundV2,optin.taint.TaintedAlloc
 -analyzer-output=text -verify %s
+// RUN: %clang_cc1 -analyze 
-analyzer-checker=optin.taint,core,alpha.security.ArrayBoundV2,optin.taint.TaintedAlloc,optin.taint.TaintedDiv
 -analyzer-output=text -verify %s

NagyDonat wrote:

This run line is heavily redundant: you first enable the group `optin.taint`, 
then you also enable two checkers from it individually.

https://github.com/llvm/llvm-project/pull/106389
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Add optin.taint.TaintedDiv checker (PR #106389)

2024-09-30 Thread Donát Nagy via cfe-commits


@@ -20,6 +22,7 @@
 // RUN: not %clang_analyze_cc1 -Wno-pointer-to-int-cast \
 // RUN:   -Wno-incompatible-library-redeclaration -verify %s \
 // RUN:   -analyzer-checker=optin.taint.GenericTaint  \
+// RUN:   -analyzer-checker=optin.taint.TaintedDiv \

NagyDonat wrote:

```suggestion
```
I strongly suspect that there is no reason to add TaintedDiv here -- this not a 
normal test run, but an intentionally wrong invocation that passes an invalid 
config file name and verifies the error output with FileCheck.

In fact, I also suspect that you could remove the ExprInspection line below 
this without changing the meaning of this test. (You should perhaps do it for 
the sake of clarity.)

https://github.com/llvm/llvm-project/pull/106389
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Add optin.taint.TaintedDiv checker (PR #106389)

2024-09-30 Thread Donát Nagy via cfe-commits


@@ -1703,6 +1703,12 @@ def TaintedAllocChecker: Checker<"TaintedAlloc">,
   Dependencies<[DynamicMemoryModeling, TaintPropagationChecker]>,
   Documentation;
 
+def TaintedDivChecker: Checker<"TaintedDiv">,
+  HelpText<"Check for divisions, where the denominator "
+   "might be 0 as it is a tainted (attacker controlled) value.">,

NagyDonat wrote:

```suggestion
  HelpText<"Check for divisions where the denominator is tainted "
   "(attacker controlled) and might be 0.">,
```
Again, swapping the order clearly clarifies that taintedness is not covered by 
the "might". (Your wording was also understandable, but more complicated.)

https://github.com/llvm/llvm-project/pull/106389
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Add optin.taint.TaintedDiv checker (PR #106389)

2024-09-30 Thread Donát Nagy via cfe-commits


@@ -1288,6 +1288,34 @@ by explicitly marking the ``size`` parameter as 
sanitized. See the
 delete[] ptr;
   }
 
+.. _optin-taint-TaintedDiv:
+
+optin.taint.TaintedDiv (C, C++, ObjC)
+"
+This checker warns when the denominator in a division
+operation is a tainted (potentially attacker controlled) value.
+If the attacker can set the denominator to 0, a runtime error can
+be triggered. The checker warns if the analyzer cannot prove
+that the denominator is not 0 and it is a tainted value.
+This warning is more pessimistic than the :ref:`core-DivideZero` checker
+which warns only when it can prove that the denominator is 0.
+
+.. code-block:: c
+
+  int vulnerable(int n) {
+size_t size = 0;
+scanf("%zu", &size);
+return n/size; // warn: Division by a tainted value, possibly zero

NagyDonat wrote:

```suggestion
return n / size; // warn: Division by a tainted value, possibly zero
```
AFAIK coding style guides almost always suggest spaces around binary operators.

https://github.com/llvm/llvm-project/pull/106389
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Add optin.taint.TaintedDiv checker (PR #106389)

2024-09-30 Thread Donát Nagy via cfe-commits


@@ -1288,6 +1288,34 @@ by explicitly marking the ``size`` parameter as 
sanitized. See the
 delete[] ptr;
   }
 
+.. _optin-taint-TaintedDiv:
+
+optin.taint.TaintedDiv (C, C++, ObjC)
+"
+This checker warns when the denominator in a division
+operation is a tainted (potentially attacker controlled) value.
+If the attacker can set the denominator to 0, a runtime error can
+be triggered. The checker warns if the analyzer cannot prove
+that the denominator is not 0 and it is a tainted value.
+This warning is more pessimistic than the :ref:`core-DivideZero` checker
+which warns only when it can prove that the denominator is 0.

NagyDonat wrote:

Re-wrap this text to a consistent line length of 80 characters.

https://github.com/llvm/llvm-project/pull/106389
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Add optin.taint.TaintedDiv checker (PR #106389)

2024-09-30 Thread Donát Nagy via cfe-commits


@@ -1288,6 +1288,34 @@ by explicitly marking the ``size`` parameter as 
sanitized. See the
 delete[] ptr;
   }
 
+.. _optin-taint-TaintedDiv:
+
+optin.taint.TaintedDiv (C, C++, ObjC)
+"
+This checker warns when the denominator in a division
+operation is a tainted (potentially attacker controlled) value.
+If the attacker can set the denominator to 0, a runtime error can
+be triggered. The checker warns if the analyzer cannot prove
+that the denominator is not 0 and it is a tainted value.

NagyDonat wrote:

```suggestion
be triggered. The checker warns when the denominator is a tainted value and the
analyzer cannot prove that it is not 0.
```
Swap these two conditions to clarify that  "is a tainted value" is not within 
the "the analyzer cannot prove that" block.

https://github.com/llvm/llvm-project/pull/106389
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Add optin.taint.TaintedDiv checker (PR #106389)

2024-09-30 Thread Donát Nagy via cfe-commits


@@ -1288,6 +1288,34 @@ by explicitly marking the ``size`` parameter as 
sanitized. See the
 delete[] ptr;
   }
 
+.. _optin-taint-TaintedDiv:
+
+optin.taint.TaintedDiv (C, C++, ObjC)
+"
+This checker warns when the denominator in a division
+operation is a tainted (potentially attacker controlled) value.
+If the attacker can set the denominator to 0, a runtime error can
+be triggered. The checker warns if the analyzer cannot prove
+that the denominator is not 0 and it is a tainted value.
+This warning is more pessimistic than the :ref:`core-DivideZero` checker
+which warns only when it can prove that the denominator is 0.
+
+.. code-block:: c
+
+  int vulnerable(int n) {
+size_t size = 0;
+scanf("%zu", &size);
+return n/size; // warn: Division by a tainted value, possibly zero
+  }
+
+  int not_vulnerable(int n) {
+size_t size = 0;
+scanf("%zu", &size);
+if (!size)
+  return 0;
+return n/size; // no warning

NagyDonat wrote:

```suggestion
return n / size; // no warning
```
As above.

https://github.com/llvm/llvm-project/pull/106389
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Add optin.taint.TaintedDiv checker (PR #106389)

2024-09-30 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat edited 
https://github.com/llvm/llvm-project/pull/106389
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Add optin.taint.TaintedDiv checker (PR #106389)

2024-09-30 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat commented:

Looks good overall, I added several suggestions but they mostly tweak the 
documentation and the comments.

Moreover, I don't see any tests where the separation between  
`optin.taint.TaintedDiv` and `core.DivideZero` is tested by enabling one of 
them and disabling the other. You should probably add a new short test file for 
this where:
 - taint modeling is active;
 - there are two RUN lines (with different `-verify=...` tags): one that 
enables `optin.taint.TaintedDiv` and one that enables `core.DivideZero`;
 - there are testcases that perform division by:
   - a value that comes from a tainted source and is known to be zero at the 
point of division (I know that e.g. in an `if (x == 0)` block `getSVal()` will 
return a `ConcreteInt` as the value of `x` instead of the (potentially tainted) 
symbol that was used to represent it earlier -- but it would be still nice to 
see this situation.)
   - a value that comes from a tainted source and may be either zero or nonzero;
   - a value that comes from a tainted source and is known to be nonzero.

https://github.com/llvm/llvm-project/pull/106389
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Add optin.taint.TaintedDiv checker (PR #106389)

2024-09-30 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat edited 
https://github.com/llvm/llvm-project/pull/106389
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-09-30 Thread Donát Nagy via cfe-commits


@@ -2808,27 +2825,63 @@ void ExprEngine::processBranch(const Stmt *Condition,
   std::tie(StTrue, StFalse) = *KnownCondValueAssumption;
 else {
   assert(!isa(Condition));
+  // TODO: instead of this shortcut perhaps it would be better to "rejoin"
+  // the common execution path with
+  // StTrue = StFalse = PrevState;

NagyDonat wrote:

You're right that this comment is confusing, I'll probably eliminate it by 
implementing the (relatively modest) change that I described -- or at least 
tried to describe ;) -- in it.

https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][analyzer] Less redundant warnings from FixedAddressChecker (PR #110458)

2024-09-30 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat approved this pull request.

Looks good to me, nice little improvement :)

I reworded a comment in an inline suggestion because I felt that it was a bit 
difficult to understand; but feel free to change it further if you'd prefer.

https://github.com/llvm/llvm-project/pull/110458
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][analyzer] Less redundant warnings from FixedAddressChecker (PR #110458)

2024-09-30 Thread Donát Nagy via cfe-commits


@@ -43,6 +43,12 @@ void FixedAddressChecker::checkPreStmt(const BinaryOperator 
*B,
   if (!T->isPointerType())
 return;
 
+  // Omit warning if the RHS has already pointer type.
+  // The value may come from a variable and is candidate for a previous warning
+  // from the checker.

NagyDonat wrote:

```suggestion
  // Omit warning if the RHS has already pointer type. Without this passing
  // around one fixed value in several pointer variables would produce several
  // redundant warnings.
```

https://github.com/llvm/llvm-project/pull/110458
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][analyzer] Less redundant warnings from FixedAddressChecker (PR #110458)

2024-09-30 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat edited 
https://github.com/llvm/llvm-project/pull/110458
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] [MallocChecker] Assume functions with `ownership_returns` return unknown memory (PR #110115)

2024-09-26 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat approved this pull request.

Good catch, thanks for the fix.

https://github.com/llvm/llvm-project/pull/110115
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-09-25 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat edited 
https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-09-25 Thread Donát Nagy via cfe-commits


@@ -194,3 +199,99 @@ char test_comparison_with_extent_symbol(struct incomplete 
*p) {
   return ((char *)p)[-1]; // no-warning
 }
 
+// WeakLoopAssumption suppression
+///
+
+int GlobalArray[100];
+int loop_suppress_after_zero_iterations(unsigned len) {
+  for (unsigned i = 0; i < len; i++)
+if (GlobalArray[i] > 0)
+  return GlobalArray[i];
+  // Previously this would have produced an overflow warning because splitting
+  // the state on the loop condition introduced an execution path where the
+  // analyzer thinks that len == 0.
+  // There are very many situations where the programmer knows that an argument
+  // is positive, but this is not indicated in the source code, so we must
+  // avoid reporting errors (especially out of bounds errors) on these
+  // branches, because otherwise we'd get prohibitively many false positives.
+  return GlobalArray[len - 1]; // no-warning
+}
+
+void loop_report_in_second_iteration(int len) {
+  int buf[1] = {0};
+  for (int i = 0; i < len; i++) {
+// When a programmer writes a loop, we may assume that they intended at
+// least two iterations.
+buf[i] = 1; // expected-warning{{Out of bound access to memory}}
+  }
+}
+
+void loop_suppress_in_third_iteration(int len) {
+  int buf[2] = {0};
+  for (int i = 0; i < len; i++) {
+// We should suppress array bounds errors on the third and later iterations
+// of loops, because sometimes programmers write a loop in sitiuations
+// where they know that there will be at most two iterations.
+buf[i] = 1; // no-warning
+  }
+}
+
+void loop_suppress_in_third_iteration_cast(int len) {
+  int buf[2] = {0};
+  for (int i = 0; (unsigned)(i < len); i++) {
+// Check that a (somewhat arbitrary) cast does not hinder the recognition
+// of the condition expression.
+buf[i] = 1; // no-warning
+  }
+}
+
+void loop_suppress_in_third_iteration_logical_and(int len, int flag) {
+  int buf[2] = {0};
+  for (int i = 0; i < len && flag; i++) {
+// FIXME: In this case the checker should suppress the warning the same way
+// as it's suppressed in loop_suppress_in_third_iteration, but the
+// suppression is not activated because the terminator statement associated
+// with the loop is just the expression 'flag', while 'i < len' is a
+// separate terminator statement that's associated with the
+// short-circuiting operator '&&'.
+// I have seen a real-world FP that looks like this, but it is much rarer
+// than the basic setup.
+buf[i] = 1; // expected-warning{{Out of bound access to memory}}
+  }
+}
+
+void loop_suppress_in_third_iteration_logical_and_2(int len, int flag) {
+  int buf[2] = {0};
+  for (int i = 0; flag && i < len; i++) {
+// If the two operands of '&&' are flipped, the suppression works.
+buf[i] = 1; // no-warning
+  }
+}
+
+int coinflip(void);
+int do_while_report_after_one_iteration(void) {
+  int i = 0;
+  do {
+i++;
+  } while (coinflip());
+  // Unlike `loop_suppress_after_zero_iterations`, running just one iteration
+  // in a do-while is not a corner case that would produce too many false
+  // positives, so don't suppress bounds errors in these situations.
+  return GlobalArray[i-2]; // expected-warning{{Out of bound access to memory}}
+}
+
+void do_while_report_in_second_iteration(int len) {
+  int buf[1] = {0};
+  int i = 0;
+  do {
+buf[i] = 1; // expected-warning{{Out of bound access to memory}}
+  } while (i++ < len);
+}
+
+void do_while_suppress_in_third_iteration(int len) {
+  int buf[2] = {0};
+  int i = 0;
+  do {
+buf[i] = 1; // no-warning

NagyDonat wrote:

No, this should not be a warning, see comments on other similar cases.

https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-09-25 Thread Donát Nagy via cfe-commits


@@ -194,3 +199,99 @@ char test_comparison_with_extent_symbol(struct incomplete 
*p) {
   return ((char *)p)[-1]; // no-warning
 }
 
+// WeakLoopAssumption suppression
+///
+
+int GlobalArray[100];
+int loop_suppress_after_zero_iterations(unsigned len) {
+  for (unsigned i = 0; i < len; i++)
+if (GlobalArray[i] > 0)
+  return GlobalArray[i];
+  // Previously this would have produced an overflow warning because splitting
+  // the state on the loop condition introduced an execution path where the
+  // analyzer thinks that len == 0.
+  // There are very many situations where the programmer knows that an argument
+  // is positive, but this is not indicated in the source code, so we must
+  // avoid reporting errors (especially out of bounds errors) on these
+  // branches, because otherwise we'd get prohibitively many false positives.
+  return GlobalArray[len - 1]; // no-warning
+}
+
+void loop_report_in_second_iteration(int len) {
+  int buf[1] = {0};
+  for (int i = 0; i < len; i++) {
+// When a programmer writes a loop, we may assume that they intended at
+// least two iterations.
+buf[i] = 1; // expected-warning{{Out of bound access to memory}}
+  }
+}
+
+void loop_suppress_in_third_iteration(int len) {
+  int buf[2] = {0};
+  for (int i = 0; i < len; i++) {
+// We should suppress array bounds errors on the third and later iterations
+// of loops, because sometimes programmers write a loop in sitiuations
+// where they know that there will be at most two iterations.
+buf[i] = 1; // no-warning
+  }
+}
+
+void loop_suppress_in_third_iteration_cast(int len) {
+  int buf[2] = {0};
+  for (int i = 0; (unsigned)(i < len); i++) {

NagyDonat wrote:

I'll check it, but I'm fairly sure that `(unsigned)0` wouldn't cause any 
problems -- after all in C we don't even have a `bool` type and `0` is used as 
the logical false value everywhere.

However, there are other mostly theoretical cases like `(char)256` where 
stripping the cast might indeed change the value -- I'll perhaps investigate 
them, but I'm not sure that it's worth to bother with writing code that covers 
rare corner cases like this.

https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-09-25 Thread Donát Nagy via cfe-commits


@@ -194,3 +199,99 @@ char test_comparison_with_extent_symbol(struct incomplete 
*p) {
   return ((char *)p)[-1]; // no-warning
 }
 
+// WeakLoopAssumption suppression
+///
+
+int GlobalArray[100];
+int loop_suppress_after_zero_iterations(unsigned len) {
+  for (unsigned i = 0; i < len; i++)
+if (GlobalArray[i] > 0)
+  return GlobalArray[i];
+  // Previously this would have produced an overflow warning because splitting
+  // the state on the loop condition introduced an execution path where the
+  // analyzer thinks that len == 0.
+  // There are very many situations where the programmer knows that an argument
+  // is positive, but this is not indicated in the source code, so we must
+  // avoid reporting errors (especially out of bounds errors) on these
+  // branches, because otherwise we'd get prohibitively many false positives.
+  return GlobalArray[len - 1]; // no-warning
+}
+
+void loop_report_in_second_iteration(int len) {
+  int buf[1] = {0};
+  for (int i = 0; i < len; i++) {
+// When a programmer writes a loop, we may assume that they intended at
+// least two iterations.
+buf[i] = 1; // expected-warning{{Out of bound access to memory}}
+  }
+}
+
+void loop_suppress_in_third_iteration(int len) {
+  int buf[2] = {0};
+  for (int i = 0; i < len; i++) {
+// We should suppress array bounds errors on the third and later iterations
+// of loops, because sometimes programmers write a loop in sitiuations
+// where they know that there will be at most two iterations.
+buf[i] = 1; // no-warning

NagyDonat wrote:

In this particular example, they could write an assert, but there are also 
loops where the condition looks like `opaque_function(i)` and even if the 
programmer _can_ write an assert, you are not in a position to say that they 
_must_ write one -- the code is completely valid and normal without it.

https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-09-25 Thread Donát Nagy via cfe-commits


@@ -194,3 +199,99 @@ char test_comparison_with_extent_symbol(struct incomplete 
*p) {
   return ((char *)p)[-1]; // no-warning
 }
 
+// WeakLoopAssumption suppression
+///
+
+int GlobalArray[100];
+int loop_suppress_after_zero_iterations(unsigned len) {
+  for (unsigned i = 0; i < len; i++)
+if (GlobalArray[i] > 0)
+  return GlobalArray[i];
+  // Previously this would have produced an overflow warning because splitting
+  // the state on the loop condition introduced an execution path where the
+  // analyzer thinks that len == 0.
+  // There are very many situations where the programmer knows that an argument
+  // is positive, but this is not indicated in the source code, so we must
+  // avoid reporting errors (especially out of bounds errors) on these
+  // branches, because otherwise we'd get prohibitively many false positives.
+  return GlobalArray[len - 1]; // no-warning
+}

NagyDonat wrote:

> If someone DID forget that an overflow can happen here and the analyzer 
> doesn't report an error, we will have a false negative, right?

Yes, in that case we'd have a false negative.

However, false negatives are acceptable, while too many false positives make 
the checker practically useless, so it's better to err on the side of not 
reporting in an ambiguous situation like this.

> If the developer knows that `len` is never 0, the proper way to communicate 
> it is `assert(len != 0)`.

This is just your preference, which may be shared by many programmers, but is 
still very far from being accepted universally. (I'd say that it's often a good 
idea to add an assertion like this, but I wouldn't want to use a tool that 
enforces its dogmatic use.)

See also my related remarks in the top-level post.

https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-09-25 Thread Donát Nagy via cfe-commits


@@ -3776,6 +3829,11 @@ void 
ExprEngine::evalEagerlyAssumeBinOpBifurcation(ExplodedNodeSet &Dst,
   ProgramStateRef StateTrue, StateFalse;
   std::tie(StateTrue, StateFalse) = state->assume(*SEV);
 
+  if (StateTrue && StateFalse) {
+StateTrue = StateTrue->set(Ex);
+StateFalse = StateFalse->set(Ex);
+  }

NagyDonat wrote:

This is recorded here because this is where I clearly see that `EagerlyAssume` 
splits the state.

Later, where `processBranch` calls `assumeCondition`, we are already on one of 
these two separate branches, and we'd need to search within the earlier 
`ExplodedNode`s to find the state split caused by `EagerlyAssume`. I strongly 
suspect that this might involve the traversal of several nodes (e.g. if some 
checker callback or rare C++ lifetime magic activates) and I think that a 
solution like this would be too slow and fragile.

Also I don't really believe that "polluting states" is a serious issue, because 
I strongly suspect that merging states and joining execution paths only happens 
in very specific trivial circumstances. (Also note that when we search 
backwards on the execution path, we usually follow the first predecessor, 
discounting the possibility of having more than one predecessor.)

https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-09-25 Thread Donát Nagy via cfe-commits


@@ -2808,27 +2825,63 @@ void ExprEngine::processBranch(const Stmt *Condition,
   std::tie(StTrue, StFalse) = *KnownCondValueAssumption;
 else {
   assert(!isa(Condition));
+  // TODO: instead of this shortcut perhaps it would be better to "rejoin"
+  // the common execution path with
+  // StTrue = StFalse = PrevState;
   builder.generateNode(PrevState, true, PredN);
   builder.generateNode(PrevState, false, PredN);
   continue;
 }
 if (StTrue && StFalse)
   assert(!isa(Condition));
 
+const Expr *EagerlyAssumeExpr =
+PrevState->get();
+const Expr *ConditionExpr = dyn_cast(Condition);
+if (ConditionExpr)
+  ConditionExpr = ConditionExpr->IgnoreParenCasts();
+bool DidEagerlyAssume = EagerlyAssumeExpr == ConditionExpr;
+bool BothFeasible = (DidEagerlyAssume || (StTrue && StFalse)) &&
+builder.isFeasible(true) && builder.isFeasible(false);
+
 // Process the true branch.
 if (builder.isFeasible(true)) {
-  if (StTrue)
+  if (StTrue) {
+if (BothFeasible && IterationsFinishedInLoop &&
+*IterationsFinishedInLoop >= 2) {
+  // When programmers write a loop, they imply that at least two
+  // iterations are possible (otherwise they would just write an `if`),
+  // but the third iteration is not implied: there are situations where
+  // the programmer knows that there won't be a third iteration (e.g.
+  // they iterate over a structure that has <= 2 elements) but this is
+  // not marked in the source code.
+  // Checkers may use this heuristic mark to discard results found on
+  // branches that contain this "weak" assumption.
+  StTrue = recordWeakLoopAssumption(StTrue);
+}
 builder.generateNode(StTrue, true, PredN);
-  else
+  } else {
 builder.markInfeasible(true);
+  }
 }
 
 // Process the false branch.
 if (builder.isFeasible(false)) {
-  if (StFalse)
+  if (StFalse) {
+if (BothFeasible && IterationsFinishedInLoop &&
+*IterationsFinishedInLoop == 0) {

NagyDonat wrote:

I thought about this `value_or` hack, but I think my more verbose approach is 
easier to understand. (By the way, why would you write `1` in the `value_or`? 
Why not `42` or `0xdeadbeef`?)

https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-09-25 Thread Donát Nagy via cfe-commits


@@ -2808,27 +2825,63 @@ void ExprEngine::processBranch(const Stmt *Condition,
   std::tie(StTrue, StFalse) = *KnownCondValueAssumption;
 else {
   assert(!isa(Condition));
+  // TODO: instead of this shortcut perhaps it would be better to "rejoin"
+  // the common execution path with
+  // StTrue = StFalse = PrevState;
   builder.generateNode(PrevState, true, PredN);
   builder.generateNode(PrevState, false, PredN);
   continue;
 }
 if (StTrue && StFalse)
   assert(!isa(Condition));
 
+const Expr *EagerlyAssumeExpr =
+PrevState->get();
+const Expr *ConditionExpr = dyn_cast(Condition);
+if (ConditionExpr)
+  ConditionExpr = ConditionExpr->IgnoreParenCasts();
+bool DidEagerlyAssume = EagerlyAssumeExpr == ConditionExpr;
+bool BothFeasible = (DidEagerlyAssume || (StTrue && StFalse)) &&
+builder.isFeasible(true) && builder.isFeasible(false);
+
 // Process the true branch.
 if (builder.isFeasible(true)) {
-  if (StTrue)
+  if (StTrue) {
+if (BothFeasible && IterationsFinishedInLoop &&
+*IterationsFinishedInLoop >= 2) {
+  // When programmers write a loop, they imply that at least two
+  // iterations are possible (otherwise they would just write an `if`),
+  // but the third iteration is not implied: there are situations where
+  // the programmer knows that there won't be a third iteration (e.g.
+  // they iterate over a structure that has <= 2 elements) but this is
+  // not marked in the source code.

NagyDonat wrote:

I'm speaking about a situation when they have an array with a static size 2 (or 
3) but the loop condition look like `i < some_struct.num_channels` where the 
upper bound is an opaque value (which is presumably not more than the static 
array size, but this is not checked).

I know that this sounds like a niche situation, but analyzing FFMPEG floods the 
user with dozens of issues like this.

https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-09-25 Thread Donát Nagy via cfe-commits


@@ -2808,27 +2825,63 @@ void ExprEngine::processBranch(const Stmt *Condition,
   std::tie(StTrue, StFalse) = *KnownCondValueAssumption;
 else {
   assert(!isa(Condition));
+  // TODO: instead of this shortcut perhaps it would be better to "rejoin"
+  // the common execution path with
+  // StTrue = StFalse = PrevState;
   builder.generateNode(PrevState, true, PredN);
   builder.generateNode(PrevState, false, PredN);
   continue;
 }
 if (StTrue && StFalse)
   assert(!isa(Condition));
 
+const Expr *EagerlyAssumeExpr =
+PrevState->get();
+const Expr *ConditionExpr = dyn_cast(Condition);
+if (ConditionExpr)
+  ConditionExpr = ConditionExpr->IgnoreParenCasts();
+bool DidEagerlyAssume = EagerlyAssumeExpr == ConditionExpr;
+bool BothFeasible = (DidEagerlyAssume || (StTrue && StFalse)) &&
+builder.isFeasible(true) && builder.isFeasible(false);
+
 // Process the true branch.
 if (builder.isFeasible(true)) {
-  if (StTrue)
+  if (StTrue) {
+if (BothFeasible && IterationsFinishedInLoop &&
+*IterationsFinishedInLoop >= 2) {
+  // When programmers write a loop, they imply that at least two
+  // iterations are possible (otherwise they would just write an `if`),

NagyDonat wrote:

YES, I know about those macros, but they are not relevant here, because there 
the analyzer can see that the condition is not ambiguous, so the heuristic 
implemented in this PR won't be triggered.

This comment is already too long, I don't want to extend it even further for 
irrelevant pedantry like this.

https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-09-25 Thread Donát Nagy via cfe-commits


@@ -2808,27 +2825,63 @@ void ExprEngine::processBranch(const Stmt *Condition,
   std::tie(StTrue, StFalse) = *KnownCondValueAssumption;
 else {
   assert(!isa(Condition));
+  // TODO: instead of this shortcut perhaps it would be better to "rejoin"
+  // the common execution path with
+  // StTrue = StFalse = PrevState;
   builder.generateNode(PrevState, true, PredN);
   builder.generateNode(PrevState, false, PredN);
   continue;
 }
 if (StTrue && StFalse)
   assert(!isa(Condition));
 
+const Expr *EagerlyAssumeExpr =
+PrevState->get();
+const Expr *ConditionExpr = dyn_cast(Condition);
+if (ConditionExpr)
+  ConditionExpr = ConditionExpr->IgnoreParenCasts();
+bool DidEagerlyAssume = EagerlyAssumeExpr == ConditionExpr;
+bool BothFeasible = (DidEagerlyAssume || (StTrue && StFalse)) &&
+builder.isFeasible(true) && builder.isFeasible(false);
+
 // Process the true branch.
 if (builder.isFeasible(true)) {
-  if (StTrue)
+  if (StTrue) {
+if (BothFeasible && IterationsFinishedInLoop &&
+*IterationsFinishedInLoop >= 2) {

NagyDonat wrote:

Because that's the natural value here; see my top-level comment.

https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-09-25 Thread Donát Nagy via cfe-commits


@@ -2808,27 +2825,63 @@ void ExprEngine::processBranch(const Stmt *Condition,
   std::tie(StTrue, StFalse) = *KnownCondValueAssumption;
 else {
   assert(!isa(Condition));
+  // TODO: instead of this shortcut perhaps it would be better to "rejoin"
+  // the common execution path with
+  // StTrue = StFalse = PrevState;
   builder.generateNode(PrevState, true, PredN);
   builder.generateNode(PrevState, false, PredN);
   continue;
 }
 if (StTrue && StFalse)
   assert(!isa(Condition));
 
+const Expr *EagerlyAssumeExpr =
+PrevState->get();
+const Expr *ConditionExpr = dyn_cast(Condition);
+if (ConditionExpr)
+  ConditionExpr = ConditionExpr->IgnoreParenCasts();
+bool DidEagerlyAssume = EagerlyAssumeExpr == ConditionExpr;
+bool BothFeasible = (DidEagerlyAssume || (StTrue && StFalse)) &&
+builder.isFeasible(true) && builder.isFeasible(false);

NagyDonat wrote:

The constructor of `BranchNodeBuilder` can also initialize the data members 
`InFeasibleTrue`/`InFeasibleFalse` to `true` (i.e. mark branches as infeasible) 
when respective `CFGBlock*`s `DstT` and `DstF` are null.

I didn't know exactly when does this happen, so I added the `isFeasible()` 
checks for the sake of safety.  

https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-09-25 Thread Donát Nagy via cfe-commits


@@ -2808,27 +2825,63 @@ void ExprEngine::processBranch(const Stmt *Condition,
   std::tie(StTrue, StFalse) = *KnownCondValueAssumption;
 else {
   assert(!isa(Condition));
+  // TODO: instead of this shortcut perhaps it would be better to "rejoin"
+  // the common execution path with
+  // StTrue = StFalse = PrevState;
   builder.generateNode(PrevState, true, PredN);
   builder.generateNode(PrevState, false, PredN);
   continue;
 }
 if (StTrue && StFalse)
   assert(!isa(Condition));
 
+const Expr *EagerlyAssumeExpr =
+PrevState->get();
+const Expr *ConditionExpr = dyn_cast(Condition);
+if (ConditionExpr)
+  ConditionExpr = ConditionExpr->IgnoreParenCasts();

NagyDonat wrote:

I'm skipping casts to ensure consistent behavior between the 
`eagerly-assume=true` (default) and the `eagerly-assume=false` case.

Perhaps this question warrants more investigation, I'll think about it and 
maybe add a few new testcases.



https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-09-25 Thread Donát Nagy via cfe-commits


@@ -2808,27 +2825,63 @@ void ExprEngine::processBranch(const Stmt *Condition,
   std::tie(StTrue, StFalse) = *KnownCondValueAssumption;
 else {
   assert(!isa(Condition));
+  // TODO: instead of this shortcut perhaps it would be better to "rejoin"
+  // the common execution path with
+  // StTrue = StFalse = PrevState;
   builder.generateNode(PrevState, true, PredN);
   builder.generateNode(PrevState, false, PredN);
   continue;

NagyDonat wrote:

Good idea, I'll test and use something like this.

https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-09-25 Thread Donát Nagy via cfe-commits


@@ -212,6 +212,25 @@ typedef llvm::ImmutableMap
 REGISTER_TRAIT_WITH_PROGRAMSTATE(PendingArrayDestruction,
  PendingArrayDestructionMap)
 
+// This trait is used to heuristically filter out results produced from
+// execution paths that took "weak" assumptions within a loop.
+REGISTER_TRAIT_WITH_PROGRAMSTATE(SeenWeakLoopAssumption, bool)
+
+ProgramStateRef clang::ento::recordWeakLoopAssumption(ProgramStateRef State) {
+  return State->set(true);
+}
+
+bool clang::ento::seenWeakLoopAssumption(ProgramStateRef State) {
+  return State->get();
+}
+
+// This trait points to the last expression (logical operator) where an eager
+// assumption introduced a state split (i.e. both cases were feasible). This is
+// used by the WeakLoopAssumption heuristic to find situations where the an
+// eager assumption introduces a state split within the evaluation of a loop
+// condition.
+REGISTER_TRAIT_WITH_PROGRAMSTATE(LastEagerlyAssumeAssumptionAt, const Expr *)

NagyDonat wrote:

I tried to emphasize that I'm only marking the `EagerlyAssume` checks that _did 
assume something new_ (i.e. they introduce a state split). I'll probably rename 
this to `LastEagerAssumptionStateSplitAt` or something similar.

https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-09-25 Thread Donát Nagy via cfe-commits


@@ -697,6 +697,11 @@ void ArrayBoundCheckerV2::reportOOB(CheckerContext &C,
 ProgramStateRef ErrorState, Messages Msgs,
 NonLoc Offset, std::optional 
Extent,
 bool IsTaintBug /*=false*/) const {
+  // Suppress results found through execution paths where in some loop the
+  // analyzer arbitrarily assumed either that the loop is skipped (0 
iterations)
+  // or that 3 or more iterations are executed.

NagyDonat wrote:

No, the "3" comes from the fact that it's justified to assume that a loop will 
have at least two iterations, but it's not justified to assume that it will 
have at least three iterations (see my top-level comment for explanation and 
details).

https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-09-25 Thread Donát Nagy via cfe-commits


@@ -583,11 +603,11 @@ class ExprEngine {
 ExplodedNode *Pred,
 ExplodedNodeSet &Dst);
 
-  /// evalEagerlyAssumeBinOpBifurcation - Given the nodes in 'Src', eagerly 
assume symbolic
-  ///  expressions of the form 'x != 0' and generate new nodes (stored in Dst)
-  ///  with those assumptions.
-  void evalEagerlyAssumeBinOpBifurcation(ExplodedNodeSet &Dst, ExplodedNodeSet 
&Src,
- const Expr *Ex);
+  /// evalEagerlyAssumeOpBifurcation - Given the nodes in 'Src', eagerly assume
+  /// symbolic expressions of the form 'x != 0' or '!x' and generate new nodes

NagyDonat wrote:

The name of this method is incorrect and misleading, which wasted my time while 
I was debugging some aspect of my patch, so I want to rename it. I'm not 
interested in renaming other methods, and I think it's a big waste of time to 
create a separate commit for four lines of completely NFC changes. (Also, 
creating a separate patch for every small change is polluting the diff log with 
small inconsequential commits.)

If you have time for a systemic cleanup of this and perhaps other misnamed 
methods, then feel free to create a separate commit for them; otherwise I want 
to do this here.

https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-09-25 Thread Donát Nagy via cfe-commits


@@ -323,12 +342,13 @@ class ExprEngine {
 
   /// ProcessBranch - Called by CoreEngine.  Used to generate successor
   ///  nodes by processing the 'effects' of a branch condition.
-  void processBranch(const Stmt *Condition,
- NodeBuilderContext& BuilderCtx,
- ExplodedNode *Pred,
- ExplodedNodeSet &Dst,
- const CFGBlock *DstT,
- const CFGBlock *DstF);
+  /// If the branch condition is a loop condition, IterationsFinishedInLoop is
+  /// the number of already finished iterations (0, 1, 2...); otherwise it's
+  /// std::nullopt.

NagyDonat wrote:

> Can't IterationsFinishedInLoop be a state trait? 

We could introduce it, but we'd need to maintain a nested stack of loops (1 
iteration in that `while`, then 2 iterations in the `for`, then 1 iteration in 
this other `for` in the function called there...) to correctly monitor the 
number of iterations.

This would make the heuristic introduced in this commit slightly more accurate, 
but I think that for practical purposes the current code is already good 
enough, so I didn't invest time into implementing this loop iteration count 
stack. (However it would be a nice follow-up commit if I have time.)

https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-09-25 Thread Donát Nagy via cfe-commits


@@ -212,6 +212,25 @@ typedef llvm::ImmutableMap
 REGISTER_TRAIT_WITH_PROGRAMSTATE(PendingArrayDestruction,
  PendingArrayDestructionMap)
 
+// This trait is used to heuristically filter out results produced from
+// execution paths that took "weak" assumptions within a loop.
+REGISTER_TRAIT_WITH_PROGRAMSTATE(SeenWeakLoopAssumption, bool)
+
+ProgramStateRef clang::ento::recordWeakLoopAssumption(ProgramStateRef State) {
+  return State->set(true);
+}
+
+bool clang::ento::seenWeakLoopAssumption(ProgramStateRef State) {
+  return State->get();
+}

NagyDonat wrote:

This bit doesn't have any "no longer needed" point -- if we made some shaky 
assumption for a loop condition, I want to suppress _all_ the ArrayBoundV2 
reports that are reached through a path which takes that particular branch.

(By the way, each `ExplodedNode` has an associated `State`, which is 
essentially a big heap of traits that are introduced by various parts of the 
engine and various checkers; so yes, traits are contained in the egraph as 
well.)

https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-09-25 Thread Donát Nagy via cfe-commits


@@ -121,6 +121,25 @@ struct EvalCallOptions {
   EvalCallOptions() {}
 };
 
+/// Simple control flow statements like `if` only produce a single state split,
+/// so the fact that they are included in the source code implies that both
+/// branches are possible (at least under some conditions) and the analyzer can
+/// freely assume either of them. (This is not entirely true, because there may
+/// be unmarked logical correlations between `if` statements, but is a good
+/// enough heuristic and the analyzer strongly relies on it.)
+/// On the other hand, in a loop the state may be split repeatedly at each
+/// evaluation of the loop condition, and this can lead to following "weak"
+/// assumptions even though the code does not imply that they're valid and the
+/// programmer intended to cover them.

NagyDonat wrote:

See my top-level comment for a detailed explanation.

https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-09-25 Thread Donát Nagy via cfe-commits


@@ -121,6 +121,25 @@ struct EvalCallOptions {
   EvalCallOptions() {}
 };
 
+/// Simple control flow statements like `if` only produce a single state split,
+/// so the fact that they are included in the source code implies that both
+/// branches are possible (at least under some conditions) and the analyzer can

NagyDonat wrote:

I'm speaking about the typical usage of `if`, but this is just a side remark, 
so I didn't want to highlight this (the comment is already too long as it is 
now).

What I mean is that when the analyzer encounters an `if` __and it cannot 
determine whether the condition is true or false__, then it may safely assume 
that both branches are possible. (Which is also not entirely correct, see the 
`` block about "correlated if" problem in my top level comment -- but 
that's not relevant for us here.)

https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-09-25 Thread Donát Nagy via cfe-commits


@@ -194,3 +199,99 @@ char test_comparison_with_extent_symbol(struct incomplete 
*p) {
   return ((char *)p)[-1]; // no-warning
 }
 
+// WeakLoopAssumption suppression
+///
+
+int GlobalArray[100];
+int loop_suppress_after_zero_iterations(unsigned len) {
+  for (unsigned i = 0; i < len; i++)
+if (GlobalArray[i] > 0)
+  return GlobalArray[i];
+  // Previously this would have produced an overflow warning because splitting
+  // the state on the loop condition introduced an execution path where the
+  // analyzer thinks that len == 0.
+  // There are very many situations where the programmer knows that an argument
+  // is positive, but this is not indicated in the source code, so we must
+  // avoid reporting errors (especially out of bounds errors) on these
+  // branches, because otherwise we'd get prohibitively many false positives.
+  return GlobalArray[len - 1]; // no-warning
+}
+
+void loop_report_in_second_iteration(int len) {
+  int buf[1] = {0};
+  for (int i = 0; i < len; i++) {
+// When a programmer writes a loop, we may assume that they intended at
+// least two iterations.
+buf[i] = 1; // expected-warning{{Out of bound access to memory}}
+  }
+}
+
+void loop_suppress_in_third_iteration(int len) {
+  int buf[2] = {0};
+  for (int i = 0; i < len; i++) {
+// We should suppress array bounds errors on the third and later iterations
+// of loops, because sometimes programmers write a loop in sitiuations
+// where they know that there will be at most two iterations.
+buf[i] = 1; // no-warning
+  }
+}
+
+void loop_suppress_in_third_iteration_cast(int len) {
+  int buf[2] = {0};
+  for (int i = 0; (unsigned)(i < len); i++) {
+// Check that a (somewhat arbitrary) cast does not hinder the recognition
+// of the condition expression.
+buf[i] = 1; // no-warning
+  }
+}
+
+void loop_suppress_in_third_iteration_logical_and(int len, int flag) {
+  int buf[2] = {0};
+  for (int i = 0; i < len && flag; i++) {
+// FIXME: In this case the checker should suppress the warning the same way
+// as it's suppressed in loop_suppress_in_third_iteration, but the
+// suppression is not activated because the terminator statement associated
+// with the loop is just the expression 'flag', while 'i < len' is a
+// separate terminator statement that's associated with the
+// short-circuiting operator '&&'.
+// I have seen a real-world FP that looks like this, but it is much rarer
+// than the basic setup.
+buf[i] = 1; // expected-warning{{Out of bound access to memory}}
+  }
+}
+
+void loop_suppress_in_third_iteration_logical_and_2(int len, int flag) {
+  int buf[2] = {0};
+  for (int i = 0; flag && i < len; i++) {
+// If the two operands of '&&' are flipped, the suppression works.
+buf[i] = 1; // no-warning
+  }
+}
+
+int coinflip(void);
+int do_while_report_after_one_iteration(void) {
+  int i = 0;
+  do {
+i++;
+  } while (coinflip());
+  // Unlike `loop_suppress_after_zero_iterations`, running just one iteration
+  // in a do-while is not a corner case that would produce too many false
+  // positives, so don't suppress bounds errors in these situations.
+  return GlobalArray[i-2]; // expected-warning{{Out of bound access to memory}}
+}
+
+void do_while_report_in_second_iteration(int len) {
+  int buf[1] = {0};
+  int i = 0;
+  do {
+buf[i] = 1; // expected-warning{{Out of bound access to memory}}
+  } while (i++ < len);
+}
+
+void do_while_suppress_in_third_iteration(int len) {
+  int buf[2] = {0};
+  int i = 0;
+  do {
+buf[i] = 1; // no-warning
+  } while (i++ < len);
+}

NagyDonat wrote:

In this case the report is not suppressed, because the heuristic is limited to 
cases where the loop condition is ambiguous. In your example `i < len` 
unambiguously evaluates to true for `i == 0`, `i == 1` and `i == 2` and 
therefore we don't activate the `seenWeakLoopAssumption` bit in the state. 
(I'll probably add a few testcases like this to document and highlight this 
part of the behavior.)

https://github.com/llvm/llvm-project/pull/109804
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-09-25 Thread Donát Nagy via cfe-commits

NagyDonat wrote:

> If I understand it correctly, we want to use (very likely) incorrect 
> assumptions to suppress false positives produced by an alpha checker, which 
> receives otherwise wrong assumptions. [...] I think the correct solution to 
> this problem is to investigate why the specific checker receives wrong 
> existing assumptions and make those assumptions correct.

No, the situation is significantly more complex.

The "wrong existing assumptions" are caused by a fundamental limitation of the 
path-sensitive analysis approach, namely that we need to follow one concrete 
execution path, which in a loop means modeling one particular iteration count. 
(See below for more detailed explanation.)

To "make those assumptions correct" we would need to implement a complex 
invalidation/widening procedure -- I wasted several months on it and didn't 
manage to make it work. If you write an accurate loop widening logic for the 
analyzer, then after that I'd happily remove the heuristic workaround added by 
this commit, but I don't want to wait years with bringing this checker out of 
the alpha state.

**Justified and wrong of assumptions in the analyzer:**

The analyzer makes _lots_ of assumptions: at each conditional expression, 
short-circuiting operator, loop etc. it must pick one of the multiple branches 
(by the way, this is implemented in `ExprEngine::processBranch()` and the 
related functions).

If the condition of the `if` statement is ambiguous (it can be both true and 
false in the current `State`), then **both choices are justified** because we 
can say that "the programmer wrote an `if` here, so he thinks that both cases 
are possible".

`` This "justification" is not entirely correct because there are 
paranoid defensive checks (and we try to filter them out with heuristics) and 
there are many know false positives that look like
```cpp
void do_something(/* some arguments */, bool flag1, bool flag2) {
  if (flag1) {
// BLOCK1
  }
  // ...
  if (flag2) {
// BLOCK2
  }
}
```
where it's semantically very clear for the programmer that `flag1` and `flag2` 
won't be used together, and the nonsense combination of BLOCK1 + BLOCK2 
triggers some error. (Here the presence of "`if (flag2)`" implies that the 
programmer thinks that `flag2` can be true _in some situations_, but the 
analyzer is wrong when it assumes that `flag2` can be true _on the branch where 
BLOCK1 was executed_.) However this "correlated if" problem is a separate 
unrelated mostly-unsolvable limitation of the analyzer, so we shouldn't discuss 
it here. ``

If the condition of a loop is ambiguous (and e.g remains ambiguous each time 
it's evaluated), then that's a more complex issue: when programmers write a 
loop, then **it's justified to assume that "in some situations there can be at 
least two iterations"** (because otherwise the loop is an overkill and they 
should've just written an `if`) -- but **more concrete assumptions are all 
unjustified**, because the presence of the loop (with the ambiguous condition) 
doesn't imply anything like "there won't be any iterations", "there will be 1 
iteration" or "there will be at least 3 iterations" etc.

Note: your inline comments suggest that the analyzer should assume and analyze 
e.g. the "there won't be any iterations" case unless something (e.g. an 
assertion) rules it out explicitly, but I strongly disagree with this 
philosophy because:
- There are situations where there is no natural place for the `assert()`: e.g. 
if the loop looks like `while (we_have_time() && !algorithm_finished(foo, bar, 
baz)) {...}` where would you put the assertion?
- The analyzer must aim for accepting all normal code, because it's not 
reasonable to demand that when a project starts to use static analyzer, they 
immediately introduce hundreds of assertions to fulfill the demands of our 
tyrannical tool.
Requiring assertions in situations where (e.g.) the zero-iteration case is 
impossible may be an useful design rule, but it's far from being universally 
accepted so we shouldn't enforce it in a general-purpose checker. (If you want 
to implement it as an off-by-default flag, then that would be OK, but I'm not 
interested in covering it.)

So, to summarize these, when we encounter a loop and the evaluations of the 
condition are ambiguous:
- The analyzer architecture can only follow concrete code paths like "skip the 
loop", "perform 1 iteration then leave", "perform 2 iterations then leave", 
..., "perform 6457 iterations then leave" etc.
- It's (mostly - see the offtopic block near the beginning) justified to stay 
in the loop for 2 iterations, because if the programmer expected just one 
iteration, then they would've written an `if` instead of a loop.
- It's not justified to stay in the loop for more than two iterations, because 
the normal way to implement "loop that can execute $\le 2$ iterations" is the 
same as "loop that can execute many iterations" (and no, you cannot demand that 
short

[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-09-24 Thread Donát Nagy via cfe-commits

NagyDonat wrote:

Instead of the loop widening plans that I discussed earlier (e.g. on discourse) 
I ended up implementing this suppression heuristic, which is currently in a 
"minimal stable product" state: it is working and ready to be merged IMO, but 
I'm also open to suggestions about technical improvements and generalizations.

This change significantly reduces the amount of ArrayBoundV2 false positives, 
for example on FFMPEG (the project where I've seen the most results) **the 
number of ArrayBoundV2 results is reduced from 316 to 80**.

I hope that after this change ArrayBoundV2 can be brought out of the alpha 
state, because the new result count is comparable to stable checkers:  e.g. on 
ffmpeg where ArrayBoundV2 produces 80 results, there are 166 
core.NullDereference results, 135 core.UndefinedBinaryOperatorResult results 
and 120 core.uninitialized.Assign results.

My first impression is that the remaining ArrayBoundV2 results are still mostly 
false positives (which is not surprising because these are stable open source 
projects), but I didn't find any "typical issue" among the first 20 results (on 
ffmpeg) that I investigated.

The following diff shows the effect of enabling ArrayBoundV2 (in addition to 
the core, cplusplus, nullability, unix and valist checkers, which are enabled 
on both sides of the diff).
| Project | New Reports | Resolved Reports |
|-|-|--|
| memcached | [2 new 
reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=memcached_1.6.8_ericsson-weak-loop-assumptions_8890e0a&newcheck=memcached_1.6.8_ericsson-weak-loop-assumptions_742b07a&diff-type=New)
 | [0 resolved 
reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=memcached_1.6.8_ericsson-weak-loop-assumptions_8890e0a&newcheck=memcached_1.6.8_ericsson-weak-loop-assumptions_742b07a&diff-type=Resolved)
 
| tmux | [1 new 
reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=tmux_2.6_ericsson-weak-loop-assumptions_8890e0a&newcheck=tmux_2.6_ericsson-weak-loop-assumptions_742b07a&diff-type=New)
 | [0 resolved 
reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=tmux_2.6_ericsson-weak-loop-assumptions_8890e0a&newcheck=tmux_2.6_ericsson-weak-loop-assumptions_742b07a&diff-type=Resolved)
 
| curl | [4 new 
reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=curl_curl-7_66_0_ericsson-weak-loop-assumptions_8890e0a&newcheck=curl_curl-7_66_0_ericsson-weak-loop-assumptions_742b07a&diff-type=New)
 | [1 resolved 
reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=curl_curl-7_66_0_ericsson-weak-loop-assumptions_8890e0a&newcheck=curl_curl-7_66_0_ericsson-weak-loop-assumptions_742b07a&diff-type=Resolved)
 
| twin | [12 new 
reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=twin_v0.8.1_ericsson-weak-loop-assumptions_8890e0a&newcheck=twin_v0.8.1_ericsson-weak-loop-assumptions_742b07a&diff-type=New)
 | [1 resolved 
reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=twin_v0.8.1_ericsson-weak-loop-assumptions_8890e0a&newcheck=twin_v0.8.1_ericsson-weak-loop-assumptions_742b07a&diff-type=Resolved)
 
| vim | [38 new 
reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=vim_v8.2.1920_ericsson-weak-loop-assumptions_8890e0a&newcheck=vim_v8.2.1920_ericsson-weak-loop-assumptions_742b07a&diff-type=New)
 | [1 resolved 
reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=vim_v8.2.1920_ericsson-weak-loop-assumptions_8890e0a&newcheck=vim_v8.2.1920_ericsson-weak-loop-assumptions_742b07a&diff-type=Resolved)
 
| openssl | [18 new 
reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=openssl_openssl-3.0.0-alpha7_ericsson-weak-loop-assumptions_8890e0a&newcheck=openssl_openssl-3.0.0-alpha7_ericsson-weak-loop-assumptions_742b07a&diff-type=New)
 | [0 resolved 
reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=openssl_openssl-3.0.0-alpha7_ericsson-weak-loop-assumptions_8890e0a&newcheck=openssl_openssl-3.0.0-alpha7_ericsson-weak-loop-assumptions_742b07a&diff-type=Resolved)
 
| sqlite | [5 new 
reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=sqlite_version-3.33.0_ericsson-weak-loop-assumptions_8890e0a&newcheck=sqlite_version-3.33.0_ericsson-weak-loop-assumptions_742b07a&diff-type=New)
 | [1 resolved 
reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=sqlite_version-3.33.0_ericsson-weak-loop-assumptions_8890e0a&newcheck=sqlite_version-3.33.0_ericsson-weak-loop-assumptions_742b07a&diff-type=Resolved)
 
| ffmpeg | [80 new 
reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=ffmpeg_n4.3.1_ericsson-weak-loop-assumptions_8890e0a&newcheck=ffmpeg_n4.3.1_ericsson-weak-loop-assumptions_742b07a&diff-type=New)
 | [16 resolved

[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

2024-09-24 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat created 
https://github.com/llvm/llvm-project/pull/109804

The checker alpha.security.ArrayBoundV2 produced lots of false positives in 
situations where loop modeling of the engine fed it with unfounded assumptions.

This commit introduces a heuristic that discards ArrayBoundV2 reports when the 
execution path introduces an assumption that is questionable. More precisely, 
two kinds of assumptions are categorized as "weak":
1. When the analyzer assumes that the first evaluation of the loop condition 
returns false and the loop body is completely skipped.
2. When the analyzer assumes that the loop condition is true in a situation 
where it already executed (at least) two iterations.

For examples and more explanation, see the new tests.

The actual implementation uses some approximations (it uses the BlockCount 
instead of the iteration count) because that seems to be "good enough" for this 
heuristical suppression.

Note that I used minor state updates instead of bug reporter visitors because 
the number of finished iterations is not visible in the visitor which "walks 
backwards in time".

As a very minor unrelated change, this commit removes the "Bin" part from the 
method name "evalEagerlyAssumeBinOpBifurcation" because this method is also 
used for the unary logical not operator.

From 23b27377e556085054621f27d97059618b416695 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= 
Date: Mon, 23 Sep 2024 15:42:20 +0200
Subject: [PATCH 1/2] [analyzer] Suppress out of bounds reports after weak loop
 assumptions

The checker alpha.security.ArrayBoundV2 produced lots of false positives
in situations where loop modeling of the engine fed it with unfounded
assumptions.

This commit introduces a heuristic that discards ArrayBoundV2 reports
when the execution path introduces an assumption that is questionable.
More precisely, two kinds of assumptions are categorized as "weak":
(1) When the analyzer assumes that the first evaluation of the loop
condition returns false and the loop body is completely skipped.
(2) When the analyzer assumes that the loop condition is true in a
situation where it already executed (at least) two iterations.
For examples and more explanation, see the new tests.

The actual implementation uses some approximations (it uses the
BlockCount instead of the iteration count) because that seems to be
"good enough" for this heuristical suppression.

Note that I used minor state updates instead of bug reporter visitors
because the number of finished iterations is not visible in the visitor
which "walks backwards in time".

As a very minor unrelated change, this commit removes the "Bin" part
from the method name "evalEagerlyAssumeBinOpBifurcation" because this
method is also used for the unary logical not operator.
---
 .../Core/PathSensitive/ExprEngine.h   |  43 ++--
 .../Checkers/ArrayBoundCheckerV2.cpp  |   5 +
 clang/lib/StaticAnalyzer/Core/CoreEngine.cpp  |  25 -
 clang/lib/StaticAnalyzer/Core/ExprEngine.cpp  |  83 +++---
 clang/test/Analysis/loop-unrolling.cpp|   6 +-
 clang/test/Analysis/out-of-bounds.c   | 101 ++
 6 files changed, 233 insertions(+), 30 deletions(-)

diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h 
b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
index 04eacd1df7ffe2..03e4fec2b49015 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
@@ -121,6 +121,25 @@ struct EvalCallOptions {
   EvalCallOptions() {}
 };
 
+/// Simple control flow statements like `if` only produce a single state split,
+/// so the fact that they are included in the source code implies that both
+/// branches are possible (at least under some conditions) and the analyzer can
+/// freely assume either of them. (This is not entirely true, because there may
+/// be unmarked logical correlations between `if` statements, but is a good
+/// enough heuristic and the analyzer strongly relies on it.)
+/// On the other hand, in a loop the state may be split repeatedly at each
+/// evaluation of the loop condition, and this can lead to following "weak"
+/// assumptions even though the code does not imply that they're valid and the
+/// programmer intended to cover them.
+/// This function is called to mark the `State` when the engine makes an
+/// assumption which is weak. Checkers may use this heuristical mark to discard
+/// result and reduce the amount of false positives.
+ProgramStateRef recordWeakLoopAssumption(ProgramStateRef State);
+
+/// Returns true if `recordWeakLoopAssumption()` was called on the execution
+/// path which produced `State`.
+bool seenWeakLoopAssumption(ProgramStateRef State);
+
 class ExprEngine {
   void anchor();
 
@@ -323,12 +342,13 @@ class ExprEngine {
 
   /// ProcessBranch - Called by CoreEngine.  Used to generate successor
 

[clang] [analyzer][StackAddrEscapeChecker] Fix assert failure for alloca regions (PR #109655)

2024-09-23 Thread Donát Nagy via cfe-commits


@@ -1,4 +1,9 @@
-// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -verify 
%s -Wno-undefined-bool-conversion
+// RUN: %clang_analyze_cc1 \
+// RUN:   -analyzer-checker=core,debug.ExprInspection,unix.Malloc \
+// RUN:   -verify %s \
+// RUN:   -Wno-undefined-bool-conversion
+// unix.Malloc is necessary to model __builtin_alloca,
+// which could trigger an "unexpected region" bug in StackAddrEscapeChecker.

NagyDonat wrote:

Malloc / etc. modeling should not be a dependency of `StackAddrEscapeChecker` 
(and  in particular the `unix.Malloc` part of `MallocChecker` cannot be a 
strong dependency of anything, because it can generate reports, while only 
modeling checkers may be dependencies of other checkers). 

I don't think that we __need__ a separate testcase to see that 
`StackAddrEscapeChecker` does not crash on `__builtin_alloca` when 
`unix.Malloc` is not enabled -- the reason for this crash was that 
`unix.Malloc` introduced an unusual region type (`AllocaRegion`) and without 
that modeling the return value of `__builtin_alloca` is just a generic conjured 
region which should be already covered by the many other testcases. However, 
I'm not opposed to adding that testcase if you would like to see it.

https://github.com/llvm/llvm-project/pull/109655
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Use {} instead of std::nullopt to initialize empty ArrayRef (PR #109399)

2024-09-20 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat commented:

StaticAnalyzer changes LGTM.

https://github.com/llvm/llvm-project/pull/109399
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Indicate UnarySymExpr is not supported by Z3 (PR #108900)

2024-09-19 Thread Donát Nagy via cfe-commits


@@ -278,6 +278,13 @@ class SMTConstraintManager : public 
clang::ento::SimpleConstraintManager {
 if (const SymbolCast *SC = dyn_cast(Sym))
   return canReasonAbout(SVB.makeSymbolVal(SC->getOperand()));
 
+// If a UnarySymExpr is encountered, the Z3
+// wrapper does not support those. So indicate Z3 does not
+// support those and return.

NagyDonat wrote:

```suggestion
// UnarySymExpr support is not yet implemented in the Z3 wrapper.
```



https://github.com/llvm/llvm-project/pull/108900
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Indicate UnarySymExpr is not supported by Z3 (PR #108900)

2024-09-19 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat edited 
https://github.com/llvm/llvm-project/pull/108900
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Indicate UnarySymExpr is not supported by Z3 (PR #108900)

2024-09-19 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat approved this pull request.

LGTM with some additional bikeshedding in a comment that could be shorter IMO.

https://github.com/llvm/llvm-project/pull/108900
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Note last "fclose" call from "ensureStreamOpened" (PR #109112)

2024-09-18 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat edited 
https://github.com/llvm/llvm-project/pull/109112
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Note last "fclose" call from "ensureStreamOpened" (PR #109112)

2024-09-18 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat approved this pull request.

I like this improvement :smile:, thanks for implementing it!

The code that you added LGTM, my only concern is the awkward warning message 
which was inherited from the earlier state of the codebase. I think it would be 
nice to slip in an "attack of opportunity" that improves it, but I can also 
change that message in a follow-up commit if you don't want to bother with it. 

https://github.com/llvm/llvm-project/pull/109112
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Note last "fclose" call from "ensureStreamOpened" (PR #109112)

2024-09-18 Thread Donát Nagy via cfe-commits


@@ -1849,11 +1889,12 @@ ProgramStateRef StreamChecker::ensureStreamOpened(SVal 
StreamVal,
   if (SS->isClosed()) {
 // Using a stream pointer after 'fclose' causes undefined behavior
 // according to cppreference.com .
-ExplodedNode *N = C.generateErrorNode();
-if (N) {
-  C.emitReport(std::make_unique(
+if (ExplodedNode *N = C.generateErrorNode()) {
+  auto R = std::make_unique(
   BT_UseAfterClose,
-  "Stream might be already closed. Causes undefined behaviour.", N));
+  "Stream might be already closed. Causes undefined behaviour.", N);

NagyDonat wrote:

```suggestion
  "Use of a stream that might be already closed", N);
```
As a somewhat offtopic remark, I'd suggest that you should replace this warning 
message because it's awkward and not phrased like the "usual" messages from the 
majority of the checkers.

Feel free to tweak my suggestion -- it's probably possible to find a slightly 
better wording.

I kept that the message says "might", which is presumably there because on 
other execution paths the stream might be still open -- but there are many 
other checkers which don't bother with this ambiguity and would boldly report 
"Use of a stream that is already closed" at this point.

https://github.com/llvm/llvm-project/pull/109112
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Indicate UnarySymExpr is not supported by Z3 (PR #108900)

2024-09-17 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat edited 
https://github.com/llvm/llvm-project/pull/108900
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Indicate UnarySymExpr is not supported by Z3 (PR #108900)

2024-09-17 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat edited 
https://github.com/llvm/llvm-project/pull/108900
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


  1   2   3   4   5   6   7   >